Performing Online ALTER TABLE SQL Migrations Without Any Downtime

By Ben Nadel

Published 2015-07-14 in ColdFusion, SQL — Comments (4)

The other day, I tweeted about how much I love the INSERT IGNORE INTO syntax in MySQL. In response to that tweet, Ken Auenson asked me what kind of a use-case I had for ignoring duplicate records. Typically, I use IGNORE when migrating and / or transforming data. And, since I don't think I've ever really blogged about that before, I thought I might take a quick look at one use-case that has been very helpful to me: performing online ALTER TABLE statements without any downtime.

INSERT IGNORE INTO MySQL syntax is great for data transformations and migrations.

The IGNORE portion of the INSERT statement just says that if the INSERT causes a primary key violation, ignore the error (without inserting the record) and move on with the SQL statement. This is really handy in a migration or a transformation process where you might have to run the same script over and over again and don't want to worry about existing records.

When it comes to [how I execute] online ALTER TABLE statements without any downtime, the IGNORE comes into play because I set up TRIGGERs that automatically replicate rows across two tables. Then, in conjunction with said triggers, I start explicitly copying rows from one table to another. Due to the nature of the triggers and the timing and the possibility of failure in the script itself, there's a chance that I'll try to explicitly copy a record that has already been replicated. As such, I just tell MySQL to ignore the primary key violation and continue on with the migration.

Side Note: Outside of transformation and migration scripts, I don't really have a use-case for the IGNORE syntax. In the workflows of my applications, I typically want SQL to throw an error upon violations because it means that something else is going wrong in my business logic.

When performing the online migration, I have a four-step workflow:

Setup shadow table and triggers.
Migrate existing records.
Run some sanity checks.
Swap the live table and the shadow table.

NOTE: There are some tools that will do this for you from the command-line, like pt-online-schema-change from Percona. However, many of those require MySQL to be running on a *nix platform and I started doing this when I was still running everything on Windows.

Step 1: Setup Shadow Table And Triggers.

When the production table is very large, we cannot simply run an ALTER TABLE statement on it directly. It will lock up the table and, depending on what table it is, may bring your application to a grinding halt. As such, we have to build the desired table structure in parallel with the live table. Then, when two tables are in sync, swap them. The first step is setting up the new structure and configuring triggers to automatically pull new data into the shadow table.

/*
	As a first step, I get the existing structure of the table that I want to
	migrate. This is often just the `SHOW CREATE TABLE friend` executed on the
	production database. Notice, however, that the table is suffixed with "shadow".
*/
CREATE TABLE `friend_shadow` (
	`id` int(1) unsigned NOT NULL AUTO_INCREMENT,
	`name` varchar(30) NOT NULL,
	PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;

/*
	Next, I create the ALTER statement that I would like to be running on the
	production data table.
	--
	NOTE: While this could be rolled into the `CREATE TABLE` statement above, I
	like seeing it broken out into the unique set of changes that are going to
	be applied to the table. I feel like this is a great sanity check for both
	the online migration as well as a good basis for any `ALTER` statement that
	developers can run locally where table-locking is not an issue.
*/
ALTER TABLE `friend_shadow`
	ADD COLUMN `isBFF` bit(1) NOT NULL AFTER `name`,
	ADD COLUMN `updatedAt` datetime NOT NULL AFTER `isBFF`,
	ADD INDEX `IX_bestFriend` ( `isBFF` )
;

/*
	Now that we have our shadow table created, we want to set up triggers on
	the target table that we want to migrate. These triggers will make sure
	that any modifications made to the target table will be automatically
	mirrored in the shadow table while we are doing the migration.
	--
	CAUTION: I highly recommend that you remove any existing triggers on the
	target table. Ultimately, we're going to perform some table renames and
	if you have existing triggers, things will get funky like a chicken!
*/

/*
	Since triggers require embedded semi-colons, let's change the active
	delimiter so that the statement parser doesn't get confused by the embedded
	semi-colons.
*/
DELIMITER END_COMPOUND_STATEMENT

/*
	Setup the INSERT trigger. When new records are inserted into our target
	table, we want to automatically insert those records into our shadow table
	with the new columns set to appropriate default values.
*/
CREATE TRIGGER `insert_friend_shadow` AFTER INSERT ON `friend`
FOR EACH ROW
BEGIN

	INSERT INTO `friend_shadow`
	(
		`id`,
		`name`,

		/* Our new columns. */
		`isBFF`,
		`updatedAt`
	) VALUES (
		NEW.id,
		NEW.name,

		/* isBFF. */
		0,

		/* updateAt */
		UTC_TIMESTAMP()
	);

END
END_COMPOUND_STATEMENT /* Our temporary delimiter. */


/*
	Setup the UPDATE trigger. When existing records are updated in our target
	table, we want to automatically pull those updates into the shadow table.
	Don't worry if the rows don't exist yet - if not, they will be taken care
	of during the row-based migration.
*/
CREATE TRIGGER `update_friend_shadow` AFTER UPDATE ON `friend`
FOR EACH ROW
BEGIN

	UPDATE
		`friend_shadow`
	SET
		`name` = NEW.name,

		/* Our new columns (that matter for update). */
		`updatedAt` = UTC_TIMESTAMP()
	WHERE
		`id` = OLD.id
	;

END
END_COMPOUND_STATEMENT /* Our temporary delimiter. */


/*
	Setup the DELETE trigger. When existing records are deleted from our target
	table, we want to automatically remove those records from our shadow table.
*/
CREATE TRIGGER `delete_friend_shadow` AFTER DELETE ON `friend`
FOR EACH ROW
BEGIN

	DELETE FROM
		`friend_shadow`
	WHERE
		id = OLD.id
	;

END
END_COMPOUND_STATEMENT /* Our temporary delimiter. */

/* Restore the normal delimiter. */
DELIMITER ;
/*
	NOTE: The DELIMITER statement seems to fail if there is not some sort of
	whitespace after it, hence this comment.
*/

Step 2: Migrate Existing Records.

Once we have our shadow table setup and we have triggers moving over new modifications, we have to take care of migrating existing records. This is where the INSERT IGNORE INTO comes into play. When migrating existing records from the live table to the shadow table, I don't want to worry about primary key violations. This allows me to stop the script, tweak it, and restart it without having to reset the state of the system.

I'm running my migrations in ColdFusion, but there is nothing ColdFusion specific about this concept.

<!---
	Now that we have shadow table in place and our target table triggers
	implicitly copying new records, we need to start migrating existing
	rows into the shadow table. Due to the size of the target table, we can't
	just INSERT INTO ... SELECT the shadow table - it will lock up the target
	table. As such, we have to run the migration in reasonable blocks.
--->

<!--- Define the basis of the next block of records. --->
<cfparam name="url.nextBlockID" type="numeric" default="0" />

<!---
	This is the ID of the last record in the target table at the time the
	triggers were created. Since triggers will take care of all new rows, we
	know that we only have to migrate up to this record.
--->
<cfset maxBlockID = 123456789 />

<!---
	Determine how many records we can migrate in a single statement.
	--
	NOTE: Typically, I'll start with something reasonable and then live-tweak
	this value as the migration is running. At the same time, I'll watch the
	response times in something like Fusion Reactor to make sure that the table
	isn't locking up and that overall database performance doesn't seem to be
	negatively affected.
--->
<cfset blockSize = 1 />


<!--- Check to see if we are done migrating. --->
<cfif ( url.nextBlockID gt maxBlockID )>

	<strong>Done!</strong> Please proceed to the sanity check phase.
	<cfabort />

<!--- We are still migrating - report progress. --->
<cfelse>

	<cfoutput>
		#numberFormat( url.nextBlockID, "," )# of #numberFormat( maxBlockID, "," )#
		&mdash;
		#numberFormat( ( url.nextBlockID / maxBlockID * 100 ), ",.00" )#%
	</cfoutput>

</cfif>


<!---
	Migrate the next block of records.
	--
	NOTE: I am using the `IGNORE` aspect of the `INSERT` for two reasons:
	First, this will avoid any problems with migrating records that may have
	been automatically migrated by the triggers before this row-based process
	was initiated. And, second, if this script were to fail for some reason,
	I can safely restart it again without having to worry about migrating
	duplicate records.
--->
<cfquery name="migrateBlockToShadowTable" datasource="testing">
	INSERT IGNORE INTO `friend_shadow`
	(
		id,
		name,

		/* New columns. */
		isBFF,
		updatedAt
	) (

		SELECT
			f.id,
			f.name,

			/* isBFF. */
			0,

			/* updatedAt. */
			UTC_TIMESTAMP()
		FROM
			`friend` f

		/*
			We are going to be using the ID (primary key) of the table to quickly
			look up all records within the block. This block may not have an equal
			number of records (due to record deletion). But, it usually runs fast
			enough to not worry about the inefficiencies.
		*/
		WHERE
			f.id >= <cfqueryparam value="#url.nextBlockID#" cfsqltype="cf_sql_integer" />
		AND
			f.id < <cfqueryparam value="#( url.nextBlockID + blockSize )#" cfsqltype="cf_sql_integer" />

	);
</cfquery>


<!--- Move forward by a block-factor. --->
<cfset url.nextBlockID += blockSize />

<!---
	Recursively move onto the next block. I use a small delay here to make sure
	that I don't hammer the server or confuse the browser into thinking it's in
	the middle of a "too many redirects" loop.
	--
	NOTE: I usually start off with a delay of a few seconds so I can monitor the
	server to make sure nothing is adversely affected. Then, once everything seems
	to be fine, I usually live-tweak the delay, dropping it down to 50-milliseconds.
--->
<cfoutput>
<script type="text/javascript">

	setTimeout(
		function() {

			location.href = "#cgi.script_name#?nextBlockID=#url.nextBlockID#&cacheBust=#getTickCount()#";

		},
		5000
	);

</script>
</cfoutput>

Step 3: Run Some Sanity Checks.

Technically, this isn't necessary. If the script is good and the migration is true and the developer is brave, then nothing should have gone wrong. But, migrating tables is perhaps the most terrifying aspect of database administration. So, in order to not vomit on myself, I like to run a few sanity checks. The following sanity checks just help me believe that the shadow table accurately represents the live table plus the altered structure.

/*
	After the row-based migration is complete, I usually like to run a fw sanity
	checks on the shadow table. Really, this isn't necessary; but, this is how I
	sleep well at night.
*/

/*
	The first sanity check is simply making sure that both tables have the same
	number of rows and that their MIN and MAX id values are the same.
*/
SELECT
	/* Row count (the following two columns should match). */
	(

		SELECT
			COUNT( * )
		FROM
			`friend`

	) AS oldCount,
	(

		SELECT
			COUNT( * )
		FROM
			`friend_shadow`

	) AS newCount,


	/* Min ID value (the following two columns should match). */
	(

		SELECT
			MIN( id )
		FROM
			`friend`

	) AS oldMinID,
	(
		SELECT
			MIN( id )
		FROM
			`friend_shadow`

	) AS newMinID,


	/* Max ID value (the following two columns should match). */
	(

		SELECT
			MAX( id )
		FROM
			`friend`

	) AS oldMaxID,
	(
		SELECT
			MAX( id )
		FROM
			`friend_shadow`

	) AS newMaxID
;


/*
	The second sanity check is to get a block or records from each table and
	make sure they are exactly the same, value-wise. Since the table is too big
	to compare in a single statement, I'll usually just spot-check records based
	on a selection of IDs.
*/
SELECT
	ot.id,
	nt.id
FROM
	/* Limit the comparison to the following ID selection. */
	(

		SELECT
			f.id
		FROM
			`friend` f
		WHERE
			f.id > 0 /* Where to start looking. */
		ORDER BY
			f.id ASC
		LIMIT
			10 /* How many rows to compare in one shot. */
		OFFSET
			0 /* Which block to start at. */

	) AS idList
INNER JOIN
	`friend` ot /* ot: Old Table. */
ON
	ot.id = idList.id
LEFT OUTER JOIN
	`friend_shadow` nt /* nt: New Table. */
ON
	(
			nt.id = ot.id
		AND
			nt.name = ot.name

		/*
			NOTE: We are not including our new columns in this comparison since
			they are not relevant to the old table.
		*/
	)

/*
	Only return records where the old table has a record that is NOT reflected
	in the new table - ie, where the new table ID is null.
*/
WHERE
	nt.id IS NULL
;

Step 4: Swap The Live Table And The Shadow Table.

Once I am convinced that the shadow table is ready to go, it's time to swap the two tables and drop the triggers.

/*
	Once the row-based migration has been completed, we know that our shadow table
	has the correct structure and all the live, up-to-date data. At this point, we
	can swap the two tables and drop all the triggers.
	--
	CAUTION: I am not wrapping this in a transaction (BEGIN/COMMIT) since table and
	trigger alterations implicitly commit any existing transaction. That said, I
	have never run into a problem with this action.
	Documentation: https://dev.mysql.com/doc/refman/5.7/en/implicit-commit.html
*/
RENAME TABLE
	`friend` TO `__friend_pre_migration__`,
	`friend_shadow` TO `friend`
;

DROP TRIGGER IF EXISTS `insert_friend_shadow`;
DROP TRIGGER IF EXISTS `update_friend_shadow`;
DROP TRIGGER IF EXISTS `delete_friend_shadow`;

At this point, your live table now has the migrated table structure, all with no downtime.

I am not a database administrator, so take this with a grain of salt; I learn about databases as I go, when needed. Mostly, I just wanted to show a solid use-case for the INSERT IGNORE INTO syntax. While not very useful [for me personally] in general application development, INSERT IGNORE INTO has been absolutely priceless when it comes to data transformation scripts.

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/2865

Reader Comments

Ken Auenson Jul 14, 2015 at 10:35 AM

38 Comments

Ben,
I am always blown away with you attention to detail, and your willingness to spend the time and energy to make posts like this.
You rock!
I have never had to do a live update like this, very neat approach. Thanks again.

Ben Nadel Jul 14, 2015 at 11:12 AM

16,125 Comments

@Ken,

Thanks my man. Doing updates like this NOT fun :) I have a mini panic-attack right before I have to swap out the tables. It's never actually caused a problem... but, it's utterly terrifying every time :(

Frank Henninger Jul 16, 2015 at 6:07 PM

2 Comments

@Ben,

You should be able to do the equivalent in Oracle and SQL Server using the MERGE syntax.

This isn't a completely foolproof online solution though. You do that the small moments in time when your swapping the tables to contend with. Also, don't forget about any key relationships that need moved.

Ben Nadel Jul 20, 2015 at 8:06 AM

16,125 Comments

@All,

I've also been told that the most recent version of MySQL - 5.7 - apparently supports online ALTER TABLE operations, though I have not tried this personally. Looking at the docs, it looks like you can specify the algorithm that it is supposed to use:

>> Specifying ALGORITHM=INPLACE makes the operation use the
>> in-place technique for clauses and storage engines that support it,
>> and fail with an error otherwise, thus avoiding a lengthy table copy
>> if you try altering a table that uses a different storage engine than
>> you expect.

So, that sounds pretty cool!

Oh my chickens, this post is old!

Hit me up on LinkedIn if you want to discuss it further.