The other day, we ran into interesting little problem. We were using MySQL's GROUP_CONCAT() function to turn an aggregate list into a comma-delimited list. But things weren't working. No errors - just confusing behavior. After logging a lot of output, we realized that the GROUP_CONCAT() function was truncating its output without raising an exception.
To see what I mean, take a look at this demo. Here, we're going to create a fake transaction log that associates a transaction ID with a day of the week (bear with me on this silly example). Then, we're going to report on those transactions by grouping the data by the day of the week and collapsing the relevant transaction IDs into a single list.
<!--- First, let's populate the audit-log for testing. Here, we're doing to store the day of the week and the transaction ID that took place. NOTE: This is a really trite example, but I need _something_ to put in the table. ---> <cfquery name="populate" datasource="testing"> TRUNCATE TABLE audit_log; <cfset transactionIndex = 100000 /> <!--- For each day of the week, add 1,000 transactions. ---> <cfloop index="dayIndex" from="1" to="7" step="1"> <cfloop index="i" from="1" to="1000" step="1"> INSERT INTO audit_log ( dayOfWeek, transactionID ) VALUES ( #dayIndex#, #( ++transactionIndex )# ); </cfloop> </cfloop> </cfquery> <!--- Now, let's report on the transactions that have taken place. Grouping the log by the day of the week, we're going to concatenate the transactionIDs into a grouped-list. ---> <cfquery name="report" datasource="testing"> SELECT dayOfWeek, <!--- Collapse the transaction IDs into a comma-delimited list. ---> GROUP_CONCAT( transactionID ) AS transactionList FROM audit_log GROUP BY dayOfWeek ORDER BY dayOfWeek ASC </cfquery> <!--- Output report aggregates. NOTE: Since we added 1,000 transactions for each day (at the top of demo), each listLen() should report 1,000. ---> <cfoutput query="report"> Day Of Week (#report.dayOfWeek#): #listLen( report.transactionList )#<br /> </cfoutput>
As you can see from the INSERT statements, each day of the week should have 1,000 transaction IDs associated with it. And yet, when we run the above code and output the report, we get the following:
Day Of Week (1): 147
Day Of Week (2): 147
Day Of Week (3): 147
Day Of Week (4): 147
Day Of Week (5): 147
Day Of Week (6): 147
Day Of Week (7): 147
Here, each day of the week is reporting only 147 transactions, not the full 1,000 that we inserted. The problem is that GROUP_CONCAT() has a native default limit of 1,024 characters (which is further constrained by the maximum size of the packet that MySQL can return). Ultimately, though, the real problem is not that GROUP_CONCAT() has a limit, it's that it failed silently. Instead of throwing a SQL error, it just truncated the value. And, as you can imagine, if the truncation happens mid-transactionID, we're not just dealing with missing data, we're now dealing with false data.
It's easy enough to refactor the algorithms so that they don't use GROUP_CONCAT(). But, I figured I'd share this since we were banging our heads against the wall for a while trying to figure out what was going on. In the long run, I'm not sure that I can think of a great use-case for GROUP_CONCAT(); and these limitations definitely strengthen that perception.
Want to use code from this post? Check out the license.