Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.
Meanwhile on Twitter
Loading latest tweet...
Ben Nadel at Scotch On The Rocks (SOTR) 2011 (Edinburgh) with: Cyril Hanquez and Hugo Sombreireiro and Reto Aeberli and Steven Peeters and Guust Nieuwenhuis and Aurélien Deleusière

MySQL GROUP_CONCAT() Has Buggy Interaction With UNION Clauses

By Ben Nadel on
Tags: ColdFusion, SQL

So, earlier this week, I blogged about the unfortunate way in which MySQL's GROUP_CONCAT() function will fail silently when you hit its size limit. Well, it turns out that GROUP_CONCAT() is even more unfortunate. After some testing, it looks like GROUP_CONCAT() fails even harder (and more mysteriously) when you use it in conjunction with a UNION or UNION ALL clause.

To see what I mean, I'm going to revamp my previous demo in which I am grouping transaction IDs by day of the week. This time, however, I'm going to run the report twice. The first time, I'm using the same code as earlier; the second time, I'm gathering each day of the week individually, as an aggregate, and then joining all the individual aggregates using UNION ALL clauses:

  • <!---
  • First, let's populate the audit-log for testing. Here, we're doing to store the day
  • of the week and the transaction ID that took place.
  • --->
  • <cfquery name="populate" datasource="testing">
  •  
  • TRUNCATE TABLE audit_log;
  •  
  •  
  • <cfset transactionIndex = 100000 />
  •  
  • <!--- For each day of the week, add 1,000 transactions. --->
  • <cfloop index="dayIndex" from="1" to="7" step="1">
  •  
  • <cfloop index="i" from="1" to="1000" step="1">
  •  
  • INSERT INTO audit_log
  • (
  • dayOfWeek,
  • transactionID
  • ) VALUES (
  • #dayIndex#,
  • #( ++transactionIndex )#
  • );
  •  
  • </cfloop>
  •  
  • </cfloop>
  •  
  • </cfquery>
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!---
  • Now, let's report on the transactions that have taken place. Grouping the log by the
  • day of the week, we're going to concatenate the transactionIDs into a grouped-list.
  • --->
  • <cfquery name="report" datasource="testing">
  • SELECT
  • dayOfWeek,
  •  
  • <!--- Collapse the transaction IDs into a comma-delimited list. --->
  • GROUP_CONCAT( transactionID ) AS transactionList
  • FROM
  • audit_log
  • GROUP BY
  • dayOfWeek
  • ORDER BY
  • dayOfWeek ASC
  • </cfquery>
  •  
  • <!--- Output report aggregates. In "theory", each day should have 1,000 items. --->
  • <cfoutput query="report">
  •  
  • Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />
  •  
  • </cfoutput>
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  • <br />
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!---
  • This time, rather than grouping the entire record-set by day, we're going to select
  • each aggregate individually by day of the week, then UNION these aggregates together.
  • In theory, this should be the exact same result since we're still groupging all
  • transactions by day of the week.
  • --->
  • <cfquery name="report" datasource="testing">
  • (
  • SELECT
  • dayOfWeek,
  • GROUP_CONCAT( transactionID ) AS transactionList
  • FROM
  • audit_log
  • WHERE
  • dayOfWeek = 1 <!--- Get only this day of the week. --->
  • GROUP BY
  • dayOfWeek
  • )
  •  
  • <!--- UNION the other 6 days of the week. --->
  • <cfloop index="i" from="2" to="7" step="1">
  •  
  • UNION ALL
  • (
  • SELECT
  • dayOfWeek,
  • GROUP_CONCAT( transactionID ) AS transactionList
  • FROM
  • audit_log
  • WHERE
  • dayOfWeek = #i# <!--- Get only this day of the week. --->
  • GROUP BY
  • dayOfWeek
  • )
  •  
  • </cfloop>
  •  
  • ORDER BY
  • dayOfWeek ASC
  • </cfquery>
  •  
  • <!--- Output report aggregates. --->
  • <cfoutput query="report">
  •  
  • Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />
  •  
  • </cfoutput>
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  • <br />
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!---
  • As a final sanity check, let's look at just one of the groupings that we used in
  • the previous exammple, to see how it behaves when we don't use a UNION clause.
  • --->
  • <cfquery name="report" datasource="testing">
  • SELECT
  • dayOfWeek,
  • GROUP_CONCAT( transactionID ) AS transactionList
  • FROM
  • audit_log
  • WHERE
  • dayOfWeek = 1 <!--- Get only this day of the week. --->
  • GROUP BY
  • dayOfWeek
  • </cfquery>
  •  
  • <!--- Output report aggregates. --->
  • <cfoutput query="report">
  •  
  • Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />
  •  
  • </cfoutput>

In theory, these two reports should show the exact same result. Yet, when we run the code, we get the following output:

Day Of Week (1): 147
Day Of Week (2): 147
Day Of Week (3): 147
Day Of Week (4): 147
Day Of Week (5): 147
Day Of Week (6): 147
Day Of Week (7): 147

Day Of Week (1): 49
Day Of Week (2): 49
Day Of Week (3): 49
Day Of Week (4): 49
Day Of Week (5): 49
Day Of Week (6): 49
Day Of Week (7): 49

Day Of Week (1): 147

As you can see, when we gather each day of the week individually, in the context of a UNION ALL, GROUP_CONCAT() is returning an even shorter list (49 items). Of course, if we do the same thing without a UNION ALL (the final sanity check), we get the expected 147 item truncation.

What the heck is going on here? I feel like I'm taking crazy pills! This makes no sense at all. Not only does GROUP_CONCAT() fail silently, it also doesn't seem to fail consistently. This pretty much eliminates it as a viable option for me.

NOTE: I am running MySQL 5.6.15.




Reader Comments

Question! And this is only I am too lazy right now to try it myself.... Instead of using 1000, would you mind trying 147 and 49, and posting the results of the UNION query?

Reply to this Comment

@Tanyar,

If I do 147, then I still get 147 in the first group (since that's all there is). And, more importantly, I still get 49 in the UNION-based select. So, that's one's still problematic even if I am not truncating my result set.

Reply to this Comment

Oddly I get 147 throughout guess its into mysql settings their though because my version is only a little off yours 5.6.14
Guess that just makes it worse :)

Reply to this Comment

I'm getting 147 throughout using MariaDB 10.0.x (which matches MySQL 5.5/6). I use GROUP_CONCAT quite a bit, including with UNION statements, and it's always seemed pretty reliable.

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Comments
Live in the Now
Oops!
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.