Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
Ben Nadel at CFUNITED 2010 (Landsdown, VA) with: Vicky Ryder
Ben Nadel at CFUNITED 2010 (Landsdown, VA) with: Vicky Ryder@fuzie )

MySQL GROUP_CONCAT() Has Buggy Interaction With UNION Clauses

By Ben Nadel on
Tags: ColdFusion, SQL

So, earlier this week, I blogged about the unfortunate way in which MySQL's GROUP_CONCAT() function will fail silently when you hit its size limit. Well, it turns out that GROUP_CONCAT() is even more unfortunate. After some testing, it looks like GROUP_CONCAT() fails even harder (and more mysteriously) when you use it in conjunction with a UNION or UNION ALL clause.

To see what I mean, I'm going to revamp my previous demo in which I am grouping transaction IDs by day of the week. This time, however, I'm going to run the report twice. The first time, I'm using the same code as earlier; the second time, I'm gathering each day of the week individually, as an aggregate, and then joining all the individual aggregates using UNION ALL clauses:

  • <!---
  • First, let's populate the audit-log for testing. Here, we're doing to store the day
  • of the week and the transaction ID that took place.
  • --->
  • <cfquery name="populate" datasource="testing">
  •  
  • TRUNCATE TABLE audit_log;
  •  
  •  
  • <cfset transactionIndex = 100000 />
  •  
  • <!--- For each day of the week, add 1,000 transactions. --->
  • <cfloop index="dayIndex" from="1" to="7" step="1">
  •  
  • <cfloop index="i" from="1" to="1000" step="1">
  •  
  • INSERT INTO audit_log
  • (
  • dayOfWeek,
  • transactionID
  • ) VALUES (
  • #dayIndex#,
  • #( ++transactionIndex )#
  • );
  •  
  • </cfloop>
  •  
  • </cfloop>
  •  
  • </cfquery>
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!---
  • Now, let's report on the transactions that have taken place. Grouping the log by the
  • day of the week, we're going to concatenate the transactionIDs into a grouped-list.
  • --->
  • <cfquery name="report" datasource="testing">
  • SELECT
  • dayOfWeek,
  •  
  • <!--- Collapse the transaction IDs into a comma-delimited list. --->
  • GROUP_CONCAT( transactionID ) AS transactionList
  • FROM
  • audit_log
  • GROUP BY
  • dayOfWeek
  • ORDER BY
  • dayOfWeek ASC
  • </cfquery>
  •  
  • <!--- Output report aggregates. In "theory", each day should have 1,000 items. --->
  • <cfoutput query="report">
  •  
  • Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />
  •  
  • </cfoutput>
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  • <br />
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!---
  • This time, rather than grouping the entire record-set by day, we're going to select
  • each aggregate individually by day of the week, then UNION these aggregates together.
  • In theory, this should be the exact same result since we're still groupging all
  • transactions by day of the week.
  • --->
  • <cfquery name="report" datasource="testing">
  • (
  • SELECT
  • dayOfWeek,
  • GROUP_CONCAT( transactionID ) AS transactionList
  • FROM
  • audit_log
  • WHERE
  • dayOfWeek = 1 <!--- Get only this day of the week. --->
  • GROUP BY
  • dayOfWeek
  • )
  •  
  • <!--- UNION the other 6 days of the week. --->
  • <cfloop index="i" from="2" to="7" step="1">
  •  
  • UNION ALL
  • (
  • SELECT
  • dayOfWeek,
  • GROUP_CONCAT( transactionID ) AS transactionList
  • FROM
  • audit_log
  • WHERE
  • dayOfWeek = #i# <!--- Get only this day of the week. --->
  • GROUP BY
  • dayOfWeek
  • )
  •  
  • </cfloop>
  •  
  • ORDER BY
  • dayOfWeek ASC
  • </cfquery>
  •  
  • <!--- Output report aggregates. --->
  • <cfoutput query="report">
  •  
  • Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />
  •  
  • </cfoutput>
  •  
  •  
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  • <br />
  • <!--- ----------------------------------------------------- --->
  • <!--- ----------------------------------------------------- --->
  •  
  •  
  • <!---
  • As a final sanity check, let's look at just one of the groupings that we used in
  • the previous exammple, to see how it behaves when we don't use a UNION clause.
  • --->
  • <cfquery name="report" datasource="testing">
  • SELECT
  • dayOfWeek,
  • GROUP_CONCAT( transactionID ) AS transactionList
  • FROM
  • audit_log
  • WHERE
  • dayOfWeek = 1 <!--- Get only this day of the week. --->
  • GROUP BY
  • dayOfWeek
  • </cfquery>
  •  
  • <!--- Output report aggregates. --->
  • <cfoutput query="report">
  •  
  • Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />
  •  
  • </cfoutput>

In theory, these two reports should show the exact same result. Yet, when we run the code, we get the following output:

Day Of Week (1): 147
Day Of Week (2): 147
Day Of Week (3): 147
Day Of Week (4): 147
Day Of Week (5): 147
Day Of Week (6): 147
Day Of Week (7): 147

Day Of Week (1): 49
Day Of Week (2): 49
Day Of Week (3): 49
Day Of Week (4): 49
Day Of Week (5): 49
Day Of Week (6): 49
Day Of Week (7): 49

Day Of Week (1): 147

As you can see, when we gather each day of the week individually, in the context of a UNION ALL, GROUP_CONCAT() is returning an even shorter list (49 items). Of course, if we do the same thing without a UNION ALL (the final sanity check), we get the expected 147 item truncation.

What the heck is going on here? I feel like I'm taking crazy pills! This makes no sense at all. Not only does GROUP_CONCAT() fail silently, it also doesn't seem to fail consistently. This pretty much eliminates it as a viable option for me.

NOTE: I am running MySQL 5.6.15.



Looking For A New Job?

100% of job board revenue is donated to Kiva. Loans that change livesFind out more »

Reader Comments

Question! And this is only I am too lazy right now to try it myself.... Instead of using 1000, would you mind trying 147 and 49, and posting the results of the UNION query?

@Tanyar,

If I do 147, then I still get 147 in the first group (since that's all there is). And, more importantly, I still get 49 in the UNION-based select. So, that's one's still problematic even if I am not truncating my result set.

Oddly I get 147 throughout guess its into mysql settings their though because my version is only a little off yours 5.6.14
Guess that just makes it worse :)

I'm getting 147 throughout using MariaDB 10.0.x (which matches MySQL 5.5/6). I use GROUP_CONCAT quite a bit, including with UNION statements, and it's always seemed pretty reliable.