Skip to main content
Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Generating SQL INSERT Statements From SQL SELECT Statements In MySQL 5.7.32

By Ben Nadel on
Tags: SQL

I've spent the last 2-days slicing-and-dicing data for one of InVision's enterprise clients; and, I'm as happy a pig in slop! As I've said before, data is my happy place. Few things give me as much joy as writing SQL queries - as bending relational database tables to my will. We have a client that accidentally evicted close to a thousand users out of their private-cloud instance. And, as a show of good faith, I've been working with them to programmatically re-create tens-of-thousands of records so that they can be confident their users are back online, complete with access to all the appropriate documents. As part of this collaboration, I've been using SQL SELECT statements to generate tens-of-thousands of SQL INSERT statements in MySQL 5.7.32. This approach worked like a charm; and, I thought it would make for a fun demo.

A few years ago, I looked at serializing SQL records as INSERT statements; but, that approach used Lucee CFML as the underlying facilitator which gave me a lot more programmatic flexibility. In this most recent case, I didn't have that luxury: I was granted temporary access to the client's private database for the purposes of this remediation; and could only use the tooling that the MySQL database offered.

And so, what I ended up doing was crafting a bunch of complex SELECT statements with various INNER JOIN and LEFT OUTER JOIN relationships that identified the missing data. And, once I was confident that I had the right data, I swapped out the columns of the SELECT statement with a single CONCAT() operation that generated an INSERT statement for each record in the original SELECT.

To get a sense of what I mean, let's imagine that we have a membership table that stores the transient relationship between a user and an organization. The membership table has both a startedAt and an endedAt column to denote the time period in which the user was part of the organization.

CAUTION: I'm making up these table names and structures for this blog post. Which means that I'm not actually running any of the SQL in the following code snippets. Please forgive any obvious syntax errors in my query structure.

In order to locate users that were accidentally evicted from their parent organization, I would need to find membership records that were terminated during a particular window. And, limit those membership records to the ones that do not correlate with an active, current membership (since some users were already added back to the system manually):

-- Define the WINDOW in which users were evicted. This will narrow down the set of
-- records that we need to work with. And, since the database runs in UTC time, I want to
-- convert from EST to UTC so that I can define my dates using my local timezone (EST).
-- This will help maintain my sanity (and align my SQL with the conversations that I've
-- been having with the client, thereby reducing the the chances that I make am mistake
-- during my data manipulation).
SET @startedAt = CONVERT_TZ( '2021-09-17 05:00:00', 'EST', 'UTC' );
SET @endedAt = CONVERT_TZ( '2021-09-17 06:00:00', 'EST', 'UTC' );

-- Adjust for daylight saving time.
SET @startedAt = DATE_ADD( @startedAt, INTERVAL -1 HOUR );
SET @endedAt = DATE_ADD( @endedAt, INTERVAL -1 HOUR );

-- Lock down to records own by the organization in question.
SET @organizationID = 12345;

-- Locate users that HAD a membership to the organization (lost during the given
-- time-frame); but, which do NOT CURRENTLY have a membership to the organization.
SELECT
	u.id,
	u.name,
	u.email,
	m.startedAt,
	m.endedAt
FROM
	membership m
INNER JOIN
	user u
ON
	-- Locate the membership records that ended during our window.
	(
			m.organizationID = @organizationID
		AND
			m.endedAt BETWEEN @startedAt AND @endedAt
		AND
			u.id = m.userID
	)
LEFT OUTER JOIN
	membership mActive
ON
	(
			mActive.organizationID = m.organizationID -- Same organization.
		AND
			mActive.userID = m.userID -- Same user.
		AND
			mActive.endedAt IS NULL -- ACTIVE membership to organization.
	)
WHERE
	mActive.id IS NULL -- Where there is no matching active membership.
ORDER BY
	u.name ASC,
	u.email ASC
;

You know SQL is exciting stuff when you can JOIN a table back to itself! In this case, we're joining the membership table back to the membership table in order to find correlation between ended records and active records. And, in this case, we're excluding results where there is an active record for the current user/organization relationship.

Now, once I was able to identify the right records and run some sanity checks, I then took the same query and updated the SELECT statement to generate an INSERT statement using the CONCAT() function. Essentially, I had to hand-craft an INSERT INTO ... SET statement. I love this MySQL-specific syntax because it collocates the column names right next to the column values:

-- Define the WINDOW in which users were evicted. This will narrow down the set of
-- records that we need to work with. And, since the database runs in UTC time, I want to
-- convert from EST to UTC so that I can define my dates using my local timezone (EST).
-- This will help maintain my sanity (and align my SQL with the conversations that I've
-- been having with the client, thereby reducing the the chances that I make am mistake
-- during my data manipulation).
SET @startedAt = CONVERT_TZ( '2021-09-17 05:00:00', 'EST', 'UTC' );
SET @endedAt = CONVERT_TZ( '2021-09-17 06:00:00', 'EST', 'UTC' );

-- Adjust for daylight saving time.
SET @startedAt = DATE_ADD( @startedAt, INTERVAL -1 HOUR );
SET @endedAt = DATE_ADD( @endedAt, INTERVAL -1 HOUR );

-- Lock down to records own by the organization in question.
SET @organizationID = 12345;

-- The date at which the new records should be "started".
SET @now = UTC_TIMESTAMP();

-- Generate INSERT INTO ... SET statements for each of the old, deactivated membership
-- records. Here, we essentially want to "copy" the old record, but set the endedAt
-- column to NULL in order to indicate an active membership.
SELECT
	CONCAT(
		'INSERT INTO ',
			'`membership` '
		'SET ',
			'`organizationID` = ', m.organizationID, ', ',
			'`userID` = ', m.userID, ', ',
			'`roleID` = ', m.roleID, ', ',
			'`startedAt` = ', QUOTE( @now ), ', ',
			'`endedAt` = NULL ',
		';'
	) AS insertStatement
FROM
	membership m
INNER JOIN
	user u
ON
	-- Locate the membership records that ended during our window.
	(
			m.organizationID = @organizationID
		AND
			m.endedAt BETWEEN @startedAt AND @endedAt
		AND
			u.id = m.userID
	)
LEFT OUTER JOIN
	membership mActive
ON
	(
			mActive.organizationID = m.organizationID -- Same organization.
		AND
			mActive.userID = m.userID -- Same user.
		AND
			mActive.endedAt IS NULL -- ACTIVE membership to organization.
	)
WHERE
	mActive.id IS NULL -- Where there is no matching active membership.
ORDER BY
	u.name ASC,
	u.email ASC

-- For my own sanity, I wanted to run these in blocks of 1,000 so that I could sanity
-- check the progress along the way. And, so that I didn't overload the writes on the
-- given database table.
LIMIT
	1000
;

Generating the INSERT statement using the CONCAT() function is a bit tedious and hard to read given the number of commas. But, hopefully you can make-out that I'm taking the column values from the old, ended membership record and copying them into the INESRT statement, using a new startedAt date.

Now, when I run this query, I get a record-set that has a single column that defines the INESRT statements that I need to run. In this case, I'm limiting the query to return 1,000 records at a time, which I was then copy-pasting into another query-window for execution. And, since the original SELECT statement excludes old records that have an associated active record, re-running this query automatically excludes the 1,000 records that I just inserted.

I mean come on, SQL is just about the most thrilling code you can write! How much fun is this? Ain't no party like a database party 'cause a database party don't limit the amount of cool stuff you can do! I'm sure there a myriad of other ways that I could have manipulated this data in MySQL and my database GUI (Graphical User Interface). But, for the volume of data and my particular set of skills, generating INSERT statements from a SELECT statement worked a treat.

Epilogue on INSERT INTO SELECT Statements

One of the query techniques that most databases support is an INSERT INTO SELECT statement wherein you can execute an INSERT based on the results of a SELECT all within a single query. This is a great technique. But, for the purposes of my data manipulation in a production application, it felt both too complicated and too opaque. What I really liked about my approach is that I was able to see - and keep a record of - all the INSERT statements that I executed against the client's database. This gave me peace of mind.



Reader Comments

I've been pasting the results of my SQL into Excel and then using the CONCAT formula to generate the SQL I need (be it INSERT, UPDATE, or sproc call).

Nice tip, Ben! I'll give it try the next time I need to create bulk SQL statements.

Reply to this Comment

@Danilo,

I often times do something very similar to that. I'll take the record set and then copy-paste it into SublimeText, where I can then use the multi-cursor wizardry:

www.bennadel.com/blog/3798-exploring-the-power-of-multiple-cursors-and-selections-in-sublime-text-3.htm

... to transform a series of tab-delimited column values into a set of other SQL statements.

Of course, that works best when the column values are uniformly formatted (ie, not free-form text). And, SublimeText can get really slow when you have like 20K cursors on the page at one time 🤪 that's where generating SQL with SQL really shines - because it scales so nicely.

Reply to this Comment

@Ben,

Looks like I'll need to check out a wider range of editors. My current IDE/Editor Visual Studio 2015 at work doesn't support multiple cursors. I can do multiple line edit with ALT+Mouse, and that's been useful quite a few times. It is limited compared to what I've seen multiple cursors appear to be able to do, but can help with the tab delimited editing like you're describing.

I believe that newer versions of Visual Studio are supposed to be able to do multiple cursor editing, so I should check with out IT to see what our upgrade options are.

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Blog
Live in the Now
Oops!
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.