Skip to main content
Ben Nadel at cf.Objective() 2012 (Minneapolis, MN) with: Mark Drew
Ben Nadel at cf.Objective() 2012 (Minneapolis, MN) with: Mark Drew ( @markdrew )

Using LATERAL Joins To Get "Top N" Records From Each Group In MySQL 8.0.14

By on
Tags:

One type of SQL query that has been particularly hard, historically, is getting the "Top N" rows for each group in a MySQL aggregation. When MySQL added LATERAL joins in 8.0.14, however, they opened the door for some simple but rather powerful query techniques. A few months ago, I looked at using LATERAL joins to gather row-specific aggregates. In that post, I used traditional aggregate functions like COUNT() and MAX(). But, as I recently demonstrated, MySQL's JSON aggregates (5.7.22+) can be nested for some pretty exciting outcomes. In this post, I want to look at combining LATERAL joins with JSON aggregates to read the "Top N" rows from a GROUP BY query.

To demonstrate this, in the context of this blog, I'm going to craft a SQL query that gets the oldest 10 members; and, for each member, I want to get the newest 5 comments. Getting the oldest 10 members is simple and has been possible in SQL since the dawn of time:

SELECT
	m.id,
	m.name
FROM
	member m
WHERE
	m.id <= 10

In this case, I'm leaning on the fact that my id is an AUTO_INCREMENT column. Which means, the first 10 id values in the table represent the oldest members in the table.

ASIDE: In a simple query like this, I could have done ORDER BY m.id DSEC LIMIT 10 to truly get the oldest 10 members. But, using this technique blows up when you start joining to other tables.

Now, I'm going to add a LATERAL derived join table that will get the most recent 5 comments for each member. Since the LATERAL sub-query is executed for each row of the outer table, it means that our LATERAL SQL can reference columns in the outer row. In this case, we're going to match on m.id:

SELECT
	m.id,
	m.name
FROM
	member m

-- By using a LATERAL JOIN (MySQL 8+), we can create a derived table PER EACH ROW of the
-- outer table. This per-row derived table is made available within the SELECT clause.
INNER JOIN LATERAL
	(

		-- Since this lateral join derived table is being calculated per row, we can
		-- easily grab the "TOP N" corresponding records (using LIMIT) for each row in the
		-- outer table.
		SELECT
			c.id,
			c.createdAt
		FROM
			blog_comment c
		WHERE
			c.memberID = m.id -- Per-row join condition.
		ORDER BY
			c.id DESC
		LIMIT
			5

	) AS recent

WHERE
	m.id <= 10

As you can see here, the LATERAL join allows us to use a per-row join condition:

WHERE c.memberID = m.id

... which, in turn, allows us to gather unique blog_comment records for each of the member records.

Of course, this INNER JOIN LATERAL still works like any other INNER JOIN which means that the result of this SQL query is a cross product of the two tables. This gives us way more rows that we wanted:

MySQL recordset showing that the LATERAL join returns 5 rows of comments per 1 row of members, creating a cross-product between the two tables.

As you can see, we end up with "Ben Nadel" 5 times in this resultset since we get the cross product of each member row and the (at most) 5 comments that each member has left on the blog.

To fix this, we can now GROUP BY the outer row. In some SQL engines, your GROUP BY clause has to contain all the columns that you want to group. However, one of the very nice things in the MySQL-specific syntax is that you only need to include just one of the columns and MySQL figures it out. In this case, we we are going to group by the member id:

SELECT
	m.id,
	m.name
FROM
	member m

-- By using a LATERAL JOIN (MySQL 8+), we can create a derived table PER EACH ROW of the
-- outer table. This per-row derived table is made available within the SELECT clause.
INNER JOIN LATERAL
	(

		-- Since this lateral join derived table is being calculated per row, we can
		-- easily grab the "TOP N" corresponding records (using LIMIT) for each row in the
		-- outer table.
		SELECT
			c.id,
			c.createdAt
		FROM
			blog_comment c
		WHERE
			c.memberID = m.id -- Per-row join condition.
		ORDER BY
			c.id DESC
		LIMIT
			5

	) AS recent

WHERE
	m.id <= 10

-- To reduce the cross product of the member table and the blog_comment table, we can
-- group the results by the member.
GROUP BY
	m.id

Now, when we run this, with the GROUP BY m.id, we get the following MySQL results:

MySQL recordset showing that the added GROUP BY has collapsed the LATERAL join cross-product resulting in a single row per member.

As you can see, we're back down to a single record per member. The "Top N" comments that we generated in our LATERAL join have been collapsed. In order to extract the comment information, we're going to use MySQL's JSON aggregate functions. The JSON_ARRAYAGG() function allows us to compose grouped expressions into a JSON array. And, the JSON_OBJECT() function allows us to aggregate row columns as the expression being composed into the JSON array:

JSON_ARRAYAGG(
	-- Applied to each row in the LATERAL derived table.
	JSON_OBJECT(
		'id', recent.id,
		'createdAt', recent.createdAt
	)
) AS comments

Here, the JSON_OBJECT() function is being applied to each row in the grouping. Meaning, we're looking at each blog_comment row in the LATERAL derived table and we're constructing a JSON object. Then, the JSON_ARRAYAGG() function takes those JSON objects and composes them into a single array per member record.

The whole query looks like this:

SELECT
	m.id,
	m.name,

	-- Since we performed a GROUP BY on the outer table, our LATER JOIN derived table is
	-- now available for aggregation.
	COUNT( * ) AS commentCount,
	-- Using the JSON functions (MySQL 5.7.22+), we can collapse the "TOP N" rows for
	-- each outer row into a JSON payload (array of objects).
	JSON_ARRAYAGG(
		-- Applied to each row in the LATERAL derived table.
		JSON_OBJECT(
			'id', recent.id,
			'createdAt', recent.createdAt
		)
	) AS comments
FROM
	member m

-- By using a LATERAL JOIN (MySQL 8+), we can create a derived table PER EACH ROW of the
-- outer table. This per-row derived table is made available within the SELECT clause.
INNER JOIN LATERAL
	(

		-- Since this lateral join derived table is being calculated per row, we can
		-- easily grab the "TOP N" corresponding records (using LIMIT) for each row in the
		-- outer table.
		SELECT
			c.id,
			c.createdAt
		FROM
			blog_comment c
		WHERE
			c.memberID = m.id -- Per row join condition.
		ORDER BY
			c.id DESC
		LIMIT
			5

	) AS recent

WHERE
	m.id <= 10
GROUP BY
	m.id
;

And, when we run this in MySQL 8.0.14+, we get the following output:

MySQL recordset showing that the added JSON_ARRAYAGG() and JSON_OBJECT() functions have exposed the collapsed LATERAL join records as a JSON payload.

As you can see, the JSON_ARRAYAGG() and JSON_OBJECT() functions have taken the collapsed / grouped LATERAL join records and exposed them as a JSON payload in the MySQL recordset. This has allowed us to get the "Top N" comments for each member in the group. Of course, in your application code you'll have to deserialize the JSON payload to get the comments as an array; however, that should be a capability natively built into your application runtime.

Isn't SQL just thrilling! Every now and then, I'll hear someone diminish SQL as something that people "have" to use. Forget that! I love SQL. And, I love that the SQL engines are constantly adding new and groovy functionality.

Performance Considerations

You may be looking at this SQL and thinking to yourself, "That can't be fast!" And, it's likely not as fast as some less complex queries. But, if we run an EXPLAIN on the SQL query we can see that it is using the PRIMARY key index for the member table and the IX_by_member index on the blog_comment table. Yes, it has to do some filesort operations as well as something called Rematerialize (which I've never seen before); but, I think you'll find that this is actually quite optimized.

Of course, with all things, you just have to test it in your environment with your particular volume of data and index configurations.

Want to use code from this post? Check out the license.

Reader Comments

15,377 Comments

@Chris,

SQL got some pretty nifty stuff in it! I really love the idea of the LATERAL join; though, I haven't really used it in a production setting yet. Most of the stuff on this blog is pretty straightforward (under the hood).

10 Comments

As regards any 'speed' issues...

The alternative to doing it in one query (and the way most devs would approach it) would one db call to get the members and a 10x loop to get the 5 top messages. That's 11 times the number of (admittedly simpler) query parsing/optimisations and 11 lots of network latency. That one query is going to have to optimise pretty badly...

You're right about people diminishing SQL as something that people "have" to use (or treating a db as just a 'dumb bucket' for data). I don't think it's a coincidence that nearly every time I come across a non-performant application, it's sub-optimal db use that's the main culprit.

15,377 Comments

@Ian,

You touched a nerve with me ๐Ÿคฃ I think it was on the Shop Talk podcast, or maybe it was CodePen Radio - on one of them, they were talking about converting code from Ruby on Rails over to Golang. It was some migration code that they were changing. Anyway, months ago, when this first came up, they were saying how a process that ran in Ruby took like 3-days to complete. And then when they rewrote it in Golang, it took like 15-minutes or something. And, at the time, they used that as evidence to say how much better the language was.

Anyway, months later, in a different conversation, they talked about how after it was all said-and-done, they realize that the big change in the code conversion was a change to the way they were doing the database queries. That is was the DB interactions that were making the Ruby code so slow... not Ruby itself.

Of course, cut to me basically yelling into the void that of course it was the database!! ๐Ÿ˜‚

Post A Comment — I'd Love To Hear From You!

Oops!
NEW: Some basic markdown formatting is now supported: bold, italic, blockquotes, lists, fenced code-blocks. Read more about markdown syntax »
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.
I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel