Skip to main content
Ben Nadel at the New York Javascript Meetup (Aug. 2010) with: Rebecca Murphey and Adam Sontag
Ben Nadel at the New York Javascript Meetup (Aug. 2010) with: Rebecca Murphey ( @rmurphey ) Adam Sontag ( @ajpiano )

MySQL GROUP_CONCAT() Has Buggy Interaction With UNION Clauses

By on
Tags: ,

So, earlier this week, I blogged about the unfortunate way in which MySQL's GROUP_CONCAT() function will fail silently when you hit its size limit. Well, it turns out that GROUP_CONCAT() is even more unfortunate. After some testing, it looks like GROUP_CONCAT() fails even harder (and more mysteriously) when you use it in conjunction with a UNION or UNION ALL clause.

To see what I mean, I'm going to revamp my previous demo in which I am grouping transaction IDs by day of the week. This time, however, I'm going to run the report twice. The first time, I'm using the same code as earlier; the second time, I'm gathering each day of the week individually, as an aggregate, and then joining all the individual aggregates using UNION ALL clauses:

<!---
	First, let's populate the audit-log for testing. Here, we're doing to store the day
	of the week and the transaction ID that took place.
--->
<cfquery name="populate" datasource="testing">

	TRUNCATE TABLE audit_log;


	<cfset transactionIndex = 100000 />

	<!--- For each day of the week, add 1,000 transactions. --->
	<cfloop index="dayIndex" from="1" to="7" step="1">

		<cfloop index="i" from="1" to="1000" step="1">

			INSERT INTO audit_log
			(
				dayOfWeek,
				transactionID
			) VALUES (
				#dayIndex#,
				#( ++transactionIndex )#
			);

		</cfloop>

	</cfloop>

</cfquery>


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!---
	Now, let's report on the transactions that have taken place. Grouping the log by the
	day of the week, we're going to concatenate the transactionIDs into a grouped-list.
--->
<cfquery name="report" datasource="testing">
	SELECT
		dayOfWeek,

		<!--- Collapse the transaction IDs into a comma-delimited list. --->
		GROUP_CONCAT( transactionID ) AS transactionList
	FROM
		audit_log
	GROUP BY
		dayOfWeek
	ORDER BY
		dayOfWeek ASC
</cfquery>

<!--- Output report aggregates. In "theory", each day should have 1,000 items. --->
<cfoutput query="report">

	Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />

</cfoutput>


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<br />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!---
	This time, rather than grouping the entire record-set by day, we're going to select
	each aggregate individually by day of the week, then UNION these aggregates together.
	In theory, this should be the exact same result since we're still groupging all
	transactions by day of the week.
--->
<cfquery name="report" datasource="testing">
		(
			SELECT
				dayOfWeek,
				GROUP_CONCAT( transactionID ) AS transactionList
			FROM
				audit_log
			WHERE
				dayOfWeek = 1 <!--- Get only this day of the week. --->
			GROUP BY
				dayOfWeek
		)

	<!--- UNION the other 6 days of the week. --->
	<cfloop index="i" from="2" to="7" step="1">

		UNION ALL
			(
				SELECT
					dayOfWeek,
					GROUP_CONCAT( transactionID ) AS transactionList
				FROM
					audit_log
				WHERE
					dayOfWeek = #i# <!--- Get only this day of the week. --->
				GROUP BY
					dayOfWeek
			)

	</cfloop>

	ORDER BY
		dayOfWeek ASC
</cfquery>

<!--- Output report aggregates. --->
<cfoutput query="report">

	Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />

</cfoutput>


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->
<br />
<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!---
	As a final sanity check, let's look at just one of the groupings that we used in
	the previous exammple, to see how it behaves when we don't use a UNION clause.
--->
<cfquery name="report" datasource="testing">
	SELECT
		dayOfWeek,
		GROUP_CONCAT( transactionID ) AS transactionList
	FROM
		audit_log
	WHERE
		dayOfWeek = 1 <!--- Get only this day of the week. --->
	GROUP BY
		dayOfWeek
</cfquery>

<!--- Output report aggregates. --->
<cfoutput query="report">

	Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />

</cfoutput>

In theory, these two reports should show the exact same result. Yet, when we run the code, we get the following output:

Day Of Week (1): 147
Day Of Week (2): 147
Day Of Week (3): 147
Day Of Week (4): 147
Day Of Week (5): 147
Day Of Week (6): 147
Day Of Week (7): 147

Day Of Week (1): 49
Day Of Week (2): 49
Day Of Week (3): 49
Day Of Week (4): 49
Day Of Week (5): 49
Day Of Week (6): 49
Day Of Week (7): 49

Day Of Week (1): 147

As you can see, when we gather each day of the week individually, in the context of a UNION ALL, GROUP_CONCAT() is returning an even shorter list (49 items). Of course, if we do the same thing without a UNION ALL (the final sanity check), we get the expected 147 item truncation.

What the heck is going on here? I feel like I'm taking crazy pills! This makes no sense at all. Not only does GROUP_CONCAT() fail silently, it also doesn't seem to fail consistently. This pretty much eliminates it as a viable option for me.

NOTE: I am running MySQL 5.6.15.

Want to use code from this post? Check out the license.

Reader Comments

2 Comments

Question! And this is only I am too lazy right now to try it myself.... Instead of using 1000, would you mind trying 147 and 49, and posting the results of the UNION query?

15,640 Comments

@Tanyar,

If I do 147, then I still get 147 in the first group (since that's all there is). And, more importantly, I still get 49 in the UNION-based select. So, that's one's still problematic even if I am not truncating my result set.

8 Comments

Oddly I get 147 throughout guess its into mysql settings their though because my version is only a little off yours 5.6.14
Guess that just makes it worse :)

14 Comments

I'm getting 147 throughout using MariaDB 10.0.x (which matches MySQL 5.5/6). I use GROUP_CONCAT quite a bit, including with UNION statements, and it's always seemed pretty reliable.

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel