Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.
Meanwhile on Twitter
Loading latest tweet...
Ben Nadel at cf.Objective() 2009 (Minneapolis, MN) with:

MySQL GROUP_CONCAT() Fails Silently When It Hits Its Size Limit

By Ben Nadel on
Tags: ColdFusion, SQL

The other day, we ran into interesting little problem. We were using MySQL's GROUP_CONCAT() function to turn an aggregate list into a comma-delimited list. But things weren't working. No errors - just confusing behavior. After logging a lot of output, we realized that the GROUP_CONCAT() function was truncating its output without raising an exception.

To see what I mean, take a look at this demo. Here, we're going to create a fake transaction log that associates a transaction ID with a day of the week (bear with me on this silly example). Then, we're going to report on those transactions by grouping the data by the day of the week and collapsing the relevant transaction IDs into a single list.

  • <!---
  • First, let's populate the audit-log for testing. Here, we're doing to store the day
  • of the week and the transaction ID that took place.
  •  
  • NOTE: This is a really trite example, but I need _something_ to put in the table.
  • --->
  • <cfquery name="populate" datasource="testing">
  •  
  • TRUNCATE TABLE audit_log;
  •  
  •  
  • <cfset transactionIndex = 100000 />
  •  
  • <!--- For each day of the week, add 1,000 transactions. --->
  • <cfloop index="dayIndex" from="1" to="7" step="1">
  •  
  • <cfloop index="i" from="1" to="1000" step="1">
  •  
  • INSERT INTO audit_log
  • (
  • dayOfWeek,
  • transactionID
  • ) VALUES (
  • #dayIndex#,
  • #( ++transactionIndex )#
  • );
  •  
  • </cfloop>
  •  
  • </cfloop>
  •  
  • </cfquery>
  •  
  • <!---
  • Now, let's report on the transactions that have taken place. Grouping the log by the
  • day of the week, we're going to concatenate the transactionIDs into a grouped-list.
  • --->
  • <cfquery name="report" datasource="testing">
  • SELECT
  • dayOfWeek,
  •  
  • <!--- Collapse the transaction IDs into a comma-delimited list. --->
  • GROUP_CONCAT( transactionID ) AS transactionList
  • FROM
  • audit_log
  • GROUP BY
  • dayOfWeek
  • ORDER BY
  • dayOfWeek ASC
  • </cfquery>
  •  
  • <!---
  • Output report aggregates.
  •  
  • NOTE: Since we added 1,000 transactions for each day (at the top of demo), each
  • listLen() should report 1,000.
  • --->
  • <cfoutput query="report">
  •  
  • Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br />
  •  
  • </cfoutput>

As you can see from the INSERT statements, each day of the week should have 1,000 transaction IDs associated with it. And yet, when we run the above code and output the report, we get the following:

Day Of Week (1): 147
Day Of Week (2): 147
Day Of Week (3): 147
Day Of Week (4): 147
Day Of Week (5): 147
Day Of Week (6): 147
Day Of Week (7): 147

Here, each day of the week is reporting only 147 transactions, not the full 1,000 that we inserted. The problem is that GROUP_CONCAT() has a native default limit of 1,024 characters (which is further constrained by the maximum size of the packet that MySQL can return). Ultimately, though, the real problem is not that GROUP_CONCAT() has a limit, it's that it failed silently. Instead of throwing a SQL error, it just truncated the value. And, as you can imagine, if the truncation happens mid-transactionID, we're not just dealing with missing data, we're now dealing with false data.

It's easy enough to refactor the algorithms so that they don't use GROUP_CONCAT(). But, I figured I'd share this since we were banging our heads against the wall for a while trying to figure out what was going on. In the long run, I'm not sure that I can think of a great use-case for GROUP_CONCAT(); and these limitations definitely strengthen that perception.




Reader Comments

Hi Ben,

I like group concat but people do go against it. I think because its not in the 'standard' sql.
Its quite neat/convenient in more complicated queries where you want to do a whole load of stuff and get somethings summarised. For example when I know there are only going to be a few records.

SELECT loads,of,fields, Group_concat(student.firstname) as students
from loads of tables
... more sql
loads,of,fields, 'Anne,Bob,Colin'

I know you could do that in different ways but group_concat is convenient and perhaps easier to read? for a small group.

Reply to this Comment

@Adam,

I think it is a cool function. And, when I first learned about it, I was really excited since so much of programming is related to gathering groups of IDs :D I guess my biggest gripe is simply that it didn't throw some sort of error - that is just silently truncated the results :(

Reply to this Comment

I think you're possibly blaming the tool rather than the workman here, Ben. As has been pointed out, it's documented as working the way it does (whether that is ideal or not is another thing). So it's not *failing*. It's doing exactly what it is spec'ed to do.

I see what you're saying here is a failure is *similar* to row-limiting a result set and then being vexed that an exception isn't being thrown if there are indeed more rows that would have been returned if the row-limit wasn't in place.

I would imagine the length-limit is in place because the operation is probably an expensive one, and it's less than idea l to it beyond a point (like limiting list sizes in a WHERE IN clause). It might be worth googling whether it's known to be a function to be avoided for this reason? (I have not done so).

Your follow-up blog article sounds more interesting though.

--
Adam

Reply to this Comment

@Adam,

Totally agree - the function is working exactly as it is documented. My gripe is simply that the implementation seems sub-optimal. Using this function was a poor choice, which we unfortunately didn't catch until it was in production. And, since I've never used it before, it didn't set off any red-flags when I was doing code-review.

Reply to this Comment

I precede all my uses of GROUP_CONCAT with

<cfquery datasource="test">
SET @@group_concat_max_len = 9999999;
</cfquery>

It's a great tool for de-normalising data, such as when populating a search index.

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Comments
Live in the Now
Oops!
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.