Skip to main content
Ben Nadel at Scotch On The Rocks (SOTR) 2011 (Edinburgh) with: Tom Chiverton
Ben Nadel at Scotch On The Rocks (SOTR) 2011 (Edinburgh) with: Tom Chiverton ( @thefalken )

Goodbye GROUP_CONCAT(), Hello JSON_ARRAYAGG() And JSON_OBJECTAGG() In MySQL 5.7.32

By on
Tags:

The other day, when aggregating some SQL data for a Customer Success report, I went to look up the semantics for the GROUP_CONCAT() function in MySQL. I rarely use the GROUP_CONCAT() function since I've had several problems with it in the past, from failing silently to it bringing back incorrect data (which I believe was a bug in the version of MySQL I was using at the time). As such, I haven't committed the function signature to memory. That said, when I got to the documentation, I noticed two other methods documented right next to it: JSON_ARRAYAGG() and JSON_OBJECTAGG(). I had never seen these before; but, oh chickens, these MySQL functions seem awesome for group aggregation. I'll definitely be using these instead of GROUP_CONCAT() going forward in my MySQL 5.7.32 reporting.

A while back, I mentioned that we were upgrading to MySQL 5.7.32 at work. And, that I was excited to see that such an upgrade would introduce the JSON (JavaScript Object Notation) column-type and relevant manipulation functions. But somehow, when I was learning about the JSON support in MySQL, I missed the fact two new aggregation functions were added as well:

  • JSON_ARRAYAGG( column )
  • JSON_OBJECTAGG( key_column, value_column )

Just like MIN(), MAX(), and GROUP_CONCAT(), the JSON aggregation functions allow us to extract information from a set of rows that have been collapsed through row-grouping (ex, using GROUP BY). Only, they allow us to extract that information into higher-level data structures, not just delimited strings.

The JSON_ARRAYAGG() function gathers all the values in a given column and returns then in a single, aggregated JSON Array.

The JSON_OBJECTAGG() function allows us to gather data from two columns and returns them in a single, aggregated JSON Object. Within this JSON Object, one column defines the key and one column defines the value. Each row within the grouping generates a key-value pair to be added to the aggregate Object.

With SQL like this, the easiest way to understand it is to see an example. So, let's consider the database schema for my blog. I have "blog entries" and I have "tags" (ex, "ColdFusion", "SQL", "JavaScript"). There exists a many-to-many relationship between blog entries and tags that get codified using a "join table":

  • blog_entry
  • tag
  • blog_entry_tag_jn - imagine a two-column table that does nothing but hold the primary keys from the other two tables in the many-to-many relationship.

Given this high-level schema, let's look at how I might query for blog entries and - for each blog entry record - pull back information about the tags associated with that entry. I'm going both the JSON_ARRAYAGG() and JSON_OBJECTAGG() functions in the same query so that we can compare the two results:

SELECT
	e.id,
	e.name,

	-- Pull back all the tag names a single array. Each tag will be an item
	-- within the array.
	JSON_ARRAYAGG( t.name ) AS tagNames,

	-- Create an object with the schema { tag.id : tag.name }. Each tag ID will
	-- be a key within the object.
	JSON_OBJECTAGG( t.id, t.name ) AS tagIndex
FROM
	blog_entry e
INNER JOIN
	blog_entry_tag_jn jn -- Our many-to-many join table.
ON
	jn.blog_entry_id = e.id
INNER JOIN
	tag t
ON
	t.id = jn.tag_id

-- Since we're GROUPING on the blog entry records, all of the `INNER JOIN` tag
-- information is going to be collapsed. However, we can extract aggregation
-- information about the tags using our JSON functions above!
GROUP BY
	e.id
HAVING
	COUNT( * ) > 1 -- To make the grouping more exciting!
ORDER BY
	e.id DESC
LIMIT
	10

Since we're using a GROUP BY on the blog entries, all of the many-to-many tag information is getting collapsed down into a grouping. However, we can then extract that grouping information on a per-entry-row basis by using the JSON aggregation functions. And, when we run this SQL code, we get the following results:

As you can see, the JSON_ARRAYAGG() function gathers each tag name and pushes it onto a single array. And, the JSON_OBJECTAGG() function gathers each tag id and name and appends them to a single object (using the id column as the key and the name column as the value). How sweet is this!

Now that we have JSON (JavaScript Object Notation) aggregation functions in MySQL 5.7 (added in 5.7.22), I can't think of a reason that I'd want to use the GROUP_CONCAT() function anymore. It seems that having structured data is always better than having a delimited list. But, I did just stumble upon these functions, so maybe there are use-cases I haven't considered yet.

Epilogue on JSON Aggregation Limitations

Since the JSON aggregation functions roll-up many rows into a single value, I have to assume that there are limits to the amount of data that it can return (just as there is with the GROUP_CONCAT() function). But, I'm having trouble finding anything explicitly documented. I assume that the aggregate data is still bound by the max_allowed_packet setting. But, I'm not sure if there are any other settings which may further reduce the amount of data that can be returned in each aggregation. It's just something to keep in mind.

Want to use code from this post? Check out the license.

Reader Comments

205 Comments

Very interesting! I'm not sure I see a use case for the JSON_OBJECTAGG() function, but I could definitely see how JSON_ARRAYAGG() could replace the GROUP_CONCAT().

I use GROUP_CONCAT() to pull back a list of users in my app. I'd love to hyperlink each user such that clicking that user emails them or goes to their profile. I haven't spent much time trying to solve this problem. I've only found a way to return the name (so far). When I tried to return the email to hyperlink each name, I ran into trouble. I don't recall the trouble anymore. That was a while back.

15,674 Comments

@Chris,

I'm still getting my head wrapped around this stuff, but speaking to your user-email problem, I think you could so something like this:

JSON_OBJECTAGG( user.email, user.name )

This would return an object in which the email is the key and the name is the value. So, something like:

{
	"sarah@example.com": "Sarah Smith",
	"jon@example.com": "Jon Smyth",
}

This way, you can pull back both the name and the email, which is what is sounds like you need to link in your example. But, I don't really know your context; and, I only just learned about these functions - so your mileage may vary 😉

8 Comments

This is really exciting, I have been doing a few things with JSON in MySQL 5.7 but assumed most of the JSON functions were only in MySQL 8 so I have been doing everything in ColdFusion/javascript which is super limiting and slow. After looking at the docs, the number of JSON functions available after 5.7.23 are quite extensive.😀

15,674 Comments

@Scott,

Yeah, I was excited to see the JSON functions sneaking in there! I actually just upgraded the MySql engine for this site to 8.x. But, I don't even know what is new there yet - I gotta carve out some time to look at the changelog.

15,674 Comments

@Brad,

Very cool!! I have to say, you're single-handedly making query-of-queries fashionable again 😮 what with your speed improvements and aggregation ideas. I think people forget just how powerful Query objects are 💪

8 Comments

Literally just hit another issue in my code from using group_concat due to length truncation. Glad this post was written so I have an alternative solution. Thanks! 😀

15,674 Comments

@Scott,

I wish there was a :shakes-fist: emoji! We have one at work - Grandpa Simpson shaking fist at cloud (or something). I wonder what it would take to get some custom emoji working here ... eh, probably a bad idea.

But, that's neither here nor there - excited to see that you have a path forward now!

8 Comments

@Ben,

During exploring I got even more excited with using the JSON_OBJECT function that easily lets me generate grouped JSON output:

SELECT  
JSON_OBJECTAGG(id,JSON_OBJECT('name',name,"acres",round(acres,2)))
FROM ...

output serializeJSON:

{"5065": {"name": "335", "acres": 69.02}, "5066": {"name": "336", "acres": 79.19}}
15,674 Comments

@Scott,

That's something I need to play around with more. I know that they exist; but, I haven't really started to use them in any way. I wonder if that would be an interesting way to return data across multiple tables in a JOIN. So, instead of mapping sets of columns for each table, return a JSON object for each table. Meaning, instead of this:

SELECT
	( a.id ) AS a_id,
	( a.name ) AS a_name,
	( b.id ) AS b_id,
	( b.name ) AS b_name
FROM
	table a
INNER JOIN
	table b
ON

... you have something like this:

SELECT
	JSON_OBJECT(
		'id', a.id,
		'name', a.name
	) AS aProps,
	JSON_OBJECT(
		'id', b.id,
		'name', b.name
	) AS bProps
FROM
	table a
INNER JOIN
	table b
ON

... though, looking at it, the later is definitely more noisy. But, something to let soak in the back of my brain.

8 Comments

@Ben,
It depends on the use case, for me I need to use it in javascript so previously I was generating my structs in ColdFusion, now that the entire process is automatically created by the database engine, including typecasting and character escaping (I'm assuming), making the process faster and seamless.

The other wonderful thing in the more recent versions of 5.7 is the JSON column and its ability to query/filter by specific keys. Until then I have been storing things as longtext when needing to store JSON in the database and its a bit nerveracking 😳

15,674 Comments

@Mike,

I haven't personally done any performance comparisons. But, I also haven't used this technique on any volume of data where performance would become a problem.

8 Comments

@Mike,
I would assume GROUP_CONCAT is faster, but just be aware, there's a hard limit on how many results GROUP_CONCAT can return that is not the case in JSON_ARRAYAGG I don't believe

15,674 Comments

@Scott,

Oh man, I've been burned by that hard in the past! We had a SQL query where the GROUP_CONCAT() was just silently truncating values. We were then using that list of truncated values as input into another query ... and imagine that when the truncation happened, it was splitting some INT values in half ... 😱

8 Comments

@Ben,

Yikes 😱, it is definitely a limit that should either not exist or not fail silently. I have had a similar experience where I was sending out text notifications to a list of users and about 20% weren't receiving anything. It was really hard to track down.

1 Comments

Hi Ben,
thanks a lot for your awesome article about JSON on MySQL (and MariaDB)!
I spent many hours before find that informations.
👍👍👍

Post A Comment — I'd Love To Hear From You!

Post a Comment

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel