Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
Ben Nadel at Scotch On The Rock (SOTR) 2010 (London) with: Mark Drew and Reto Aeberli

Performing Online ALTER TABLE SQL Migrations Without Any Downtime

By Ben Nadel on
Tags: ColdFusion, SQL

The other day, I tweeted about how much I love the INSERT IGNORE INTO syntax in MySQL. In response to that tweet, Ken Auenson asked me what kind of a use-case I had for ignoring duplicate records. Typically, I use IGNORE when migrating and / or transforming data. And, since I don't think I've ever really blogged about that before, I thought I might take a quick look at one use-case that has been very helpful to me: performing online ALTER TABLE statements without any downtime.


(Figure: INSERT IGNORE INTO MySQL syntax is great for data transformations and migrations.)

The IGNORE portion of the INSERT statement just says that if the INSERT causes a primary key violation, ignore the error (without inserting the record) and move on with the SQL statement. This is really handy in a migration or a transformation process where you might have to run the same script over and over again and don't want to worry about existing records.
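To see those semantics in isolation, here's a tiny runnable sketch. I'm using SQLite purely for illustration (it spells the statement INSERT OR IGNORE, where MySQL uses INSERT IGNORE INTO); the table and values are made up for the demo:

```python
import sqlite3

# In-memory database with a table that has a primary key constraint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO friend (id, name) VALUES (1, 'Sarah')")

# Re-running the same insert with OR IGNORE does not raise an
# IntegrityError - the duplicate row is silently skipped and the
# statement moves on. A brand new row still inserts normally.
conn.execute("INSERT OR IGNORE INTO friend (id, name) VALUES (1, 'Sarah')")
conn.execute("INSERT OR IGNORE INTO friend (id, name) VALUES (2, 'Joanna')")

rows = conn.execute("SELECT id, name FROM friend ORDER BY id").fetchall()
print(rows)  # [(1, 'Sarah'), (2, 'Joanna')]
```

This is exactly the property that makes re-runnable migration scripts safe: executing the script twice leaves the data in the same state.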

When it comes to [how I execute] online ALTER TABLE statements without any downtime, the IGNORE comes into play because I set up TRIGGERs that automatically replicate rows across two tables. Then, in conjunction with said triggers, I start explicitly copying rows from one table to another. Due to the nature of the triggers and the timing and the possibility of failure in the script itself, there's a chance that I'll try to explicitly copy a record that has already been replicated. As such, I just tell MySQL to ignore the primary key violation and continue on with the migration.

Side Note: Outside of transformation and migration scripts, I don't really have a use-case for the IGNORE syntax. In the workflows of my applications, I typically want SQL to throw an error upon violations because it means that something else is going wrong in my business logic.

When performing the online migration, I have a four-step workflow:

  1. Set up shadow table and triggers.
  2. Migrate existing records.
  3. Run some sanity checks.
  4. Swap the live table and the shadow table.

NOTE: There are some tools that will do this for you from the command-line, like pt-online-schema-change from Percona. However, many of those require MySQL to be running on a *nix platform and I started doing this when I was still running everything on Windows.
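For reference, a typical pt-online-schema-change invocation looks something like the following. The database and table names here are placeholders; consult the Percona documentation before running anything against production:

```shell
# Rehearse the change first with --dry-run; swap to --execute to run it.
pt-online-schema-change \
    --alter "ADD COLUMN isBFF TINYINT(1) NOT NULL DEFAULT 0" \
    D=myapp,t=friend \
    --dry-run
```

Under the hood, it performs essentially the same shadow-table-plus-triggers dance described below.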

Step 1: Set Up Shadow Table And Triggers.

When the production table is very large, we cannot simply run an ALTER TABLE statement on it directly. It will lock up the table and, depending on what table it is, may bring your application to a grinding halt. As such, we have to build the desired table structure in parallel with the live table. Then, when the two tables are in sync, we swap them. The first step is setting up the new structure and configuring triggers to automatically pull new data into the shadow table.

/*
	As a first step, I get the existing structure of the table that I want to
	migrate. This is often just the output of `SHOW CREATE TABLE friend` executed
	on the production database. Notice, however, that the table name is suffixed
	with "_shadow".
*/
CREATE TABLE `friend_shadow` (
	`id` int(1) unsigned NOT NULL AUTO_INCREMENT,
	`name` varchar(30) NOT NULL,
	PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=UTF8;

/*
	Next, I create the ALTER statement that I would like to be running on the
	production data table.
	--
	NOTE: While this could be rolled into the `CREATE TABLE` statement above, I
	like seeing it broken out into the unique set of changes that are going to
	be applied to the table. I feel like this is a great sanity check for both
	the online migration as well as a good basis for any `ALTER` statement that
	developers can run locally, where table-locking is not an issue.
*/
ALTER TABLE `friend_shadow`
	ADD COLUMN `isBFF` bit(1) NOT NULL AFTER `name`,
	ADD COLUMN `updatedAt` datetime NOT NULL AFTER `isBFF`,
	ADD INDEX `IX_bestFriend` ( `isBFF` )
;

/*
	Now that we have our shadow table created, we want to set up triggers on
	the target table that we want to migrate. These triggers will make sure
	that any modifications made to the target table will be automatically
	mirrored in the shadow table while we are doing the migration.
	--
	CAUTION: I highly recommend that you remove any existing triggers on the
	target table. Ultimately, we're going to perform some table renames and,
	if you have existing triggers, things will get funky like a chicken!
*/

/*
	Since triggers require embedded semi-colons, let's change the active
	delimiter so that the statement parser doesn't get confused by the embedded
	semi-colons.
*/
DELIMITER END_COMPOUND_STATEMENT

/*
	Set up the INSERT trigger. When new records are inserted into our target
	table, we want to automatically insert those records into our shadow table
	with the new columns set to appropriate default values.
*/
CREATE TRIGGER `insert_friend_shadow` AFTER INSERT ON `friend`
FOR EACH ROW
BEGIN

	INSERT INTO `friend_shadow`
	(
		`id`,
		`name`,

		/* Our new columns. */
		`isBFF`,
		`updatedAt`
	) VALUES (
		NEW.id,
		NEW.name,

		/* isBFF. */
		0,

		/* updatedAt. */
		UTC_TIMESTAMP()
	);

END
END_COMPOUND_STATEMENT /* Our temporary delimiter. */


/*
	Set up the UPDATE trigger. When existing records are updated in our target
	table, we want to automatically pull those updates into the shadow table.
	Don't worry if the rows don't exist yet - if not, they will be taken care
	of during the row-based migration.
*/
CREATE TRIGGER `update_friend_shadow` AFTER UPDATE ON `friend`
FOR EACH ROW
BEGIN

	UPDATE
		`friend_shadow`
	SET
		`name` = NEW.name,

		/* Our new columns (that matter for update). */
		`updatedAt` = UTC_TIMESTAMP()
	WHERE
		`id` = OLD.id
	;

END
END_COMPOUND_STATEMENT /* Our temporary delimiter. */


/*
	Set up the DELETE trigger. When existing records are deleted from our target
	table, we want to automatically remove those records from our shadow table.
*/
CREATE TRIGGER `delete_friend_shadow` AFTER DELETE ON `friend`
FOR EACH ROW
BEGIN

	DELETE FROM
		`friend_shadow`
	WHERE
		id = OLD.id
	;

END
END_COMPOUND_STATEMENT /* Our temporary delimiter. */

/* Restore the normal delimiter. */
DELIMITER ;
/*
	NOTE: The DELIMITER statement seems to fail if there is not some sort of
	whitespace after it, hence this comment.
*/

Step 2: Migrate Existing Records.

Once we have our shadow table set up and we have triggers moving over new modifications, we have to take care of migrating existing records. This is where the INSERT IGNORE INTO comes into play. When migrating existing records from the live table to the shadow table, I don't want to worry about primary key violations. This allows me to stop the script, tweak it, and restart it without having to reset the state of the system.

I'm running my migrations in ColdFusion, but there is nothing ColdFusion specific about this concept.

<!---
	Now that we have our shadow table in place, and our target table triggers
	implicitly copying new records, we need to start migrating existing
	rows into the shadow table. Due to the size of the target table, we can't
	just INSERT INTO ... SELECT into the shadow table in one shot - it would
	lock up the target table. As such, we have to run the migration in
	reasonable blocks.
--->

<!--- Define the basis of the next block of records. --->
<cfparam name="url.nextBlockID" type="numeric" default="0" />

<!---
	This is the ID of the last record in the target table at the time the
	triggers were created. Since the triggers will take care of all new rows,
	we know that we only have to migrate up to this record.
--->
<cfset maxBlockID = 123456789 />

<!---
	Determine how many records we can migrate in a single statement.
	--
	NOTE: Typically, I'll start with something reasonable and then live-tweak
	this value as the migration is running. At the same time, I'll watch the
	response times in something like FusionReactor to make sure that the table
	isn't locking up and that overall database performance doesn't seem to be
	negatively affected.
--->
<cfset blockSize = 1 />


<!--- Check to see if we are done migrating. --->
<cfif ( url.nextBlockID gt maxBlockID )>

	<strong>Done!</strong> Please proceed to the sanity check phase.
	<cfabort />

<!--- We are still migrating - report progress. --->
<cfelse>

	<cfoutput>
		#numberFormat( url.nextBlockID, "," )# of #numberFormat( maxBlockID, "," )#
		&mdash;
		#numberFormat( ( url.nextBlockID / maxBlockID * 100 ), ",.00" )#%
	</cfoutput>

</cfif>


<!---
	Migrate the next block of records.
	--
	NOTE: I am using the `IGNORE` aspect of the `INSERT` for two reasons:
	First, this will avoid any problems with migrating records that may have
	been automatically migrated by the triggers before this row-based process
	was initiated. And, second, if this script were to fail for some reason,
	I can safely restart it without having to worry about migrating
	duplicate records.
--->
<cfquery name="migrateBlockToShadowTable" datasource="testing">
	INSERT IGNORE INTO `friend_shadow`
	(
		id,
		name,

		/* New columns. */
		isBFF,
		updatedAt
	) (

		SELECT
			f.id,
			f.name,

			/* isBFF. */
			0,

			/* updatedAt. */
			UTC_TIMESTAMP()
		FROM
			`friend` f

		/*
			We are going to be using the ID (primary key) of the table to
			quickly look up all records within the block. Each block may not
			have an equal number of records (due to record deletion). But, it
			usually runs fast enough not to worry about the inefficiencies.
		*/
		WHERE
			f.id >= <cfqueryparam value="#url.nextBlockID#" cfsqltype="cf_sql_integer" />
		AND
			f.id < <cfqueryparam value="#( url.nextBlockID + blockSize )#" cfsqltype="cf_sql_integer" />

	);
</cfquery>


<!--- Move forward by a block-factor. --->
<cfset url.nextBlockID += blockSize />

<!---
	Recursively move onto the next block. I use a small delay here to make sure
	that I don't hammer the server or confuse the browser into thinking it's in
	the middle of a "too many redirects" loop.
	--
	NOTE: I usually start off with a delay of a few seconds so I can monitor
	the server to make sure nothing is adversely affected. Then, once
	everything seems to be fine, I usually live-tweak the delay, dropping it
	down to 50-milliseconds.
--->
<cfoutput>
	<script type="text/javascript">

		setTimeout(
			function() {

				location.href = "#cgi.script_name#?nextBlockID=#url.nextBlockID#&cacheBust=#getTickCount()#";

			},
			5000
		);

	</script>
</cfoutput>
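There's nothing ColdFusion-specific about that loop. As a rough sketch of the same block-based, restartable copy, here it is in Python against SQLite (which spells the statement INSERT OR IGNORE; the table names mirror this article's example, and the row counts are made up for the demo):

```python
import sqlite3

# Stand up a target table and an (already ALTERed) shadow table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friend (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE friend_shadow (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        isBFF INTEGER NOT NULL,
        updatedAt TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO friend (id, name) VALUES (?, ?)",
    [(i, f"friend-{i}") for i in range(1, 26)]
)

max_block_id = 25   # Last ID at the time the triggers were created.
block_size = 10     # Tune this while watching database load.
next_block_id = 0

# Copy rows a block at a time. Because of OR IGNORE, this loop can be
# stopped and restarted at any point without creating duplicates.
while next_block_id <= max_block_id:
    conn.execute(
        """
        INSERT OR IGNORE INTO friend_shadow (id, name, isBFF, updatedAt)
        SELECT id, name, 0, datetime('now')
        FROM friend
        WHERE id >= ? AND id < ?
        """,
        (next_block_id, next_block_id + block_size)
    )
    next_block_id += block_size

migrated = conn.execute("SELECT COUNT(*) FROM friend_shadow").fetchone()[0]
print(migrated)  # 25
```

The ID-range predicate is what keeps each statement cheap: it's a primary key range scan, so the live table is never locked for long.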

Step 3: Run Some Sanity Checks.

Technically, this isn't necessary. If the script is good and the migration is true and the developer is brave, then nothing should have gone wrong. But, migrating tables is perhaps the most terrifying aspect of database administration. So, in order to not vomit on myself, I like to run a few sanity checks. The following sanity checks just help me believe that the shadow table accurately represents the live table plus the altered structure.

/*
	After the row-based migration is complete, I usually like to run a few
	sanity checks on the shadow table. Really, this isn't necessary; but, this
	is how I sleep well at night.
*/

/*
	The first sanity check is simply making sure that both tables have the same
	number of rows and that their MIN and MAX id values are the same.
*/
SELECT
	/* Row count (the following two columns should match). */
	(

		SELECT
			COUNT( * )
		FROM
			`friend`

	) AS oldCount,
	(

		SELECT
			COUNT( * )
		FROM
			`friend_shadow`

	) AS newCount,


	/* Min ID value (the following two columns should match). */
	(

		SELECT
			MIN( id )
		FROM
			`friend`

	) AS oldMinID,
	(

		SELECT
			MIN( id )
		FROM
			`friend_shadow`

	) AS newMinID,


	/* Max ID value (the following two columns should match). */
	(

		SELECT
			MAX( id )
		FROM
			`friend`

	) AS oldMaxID,
	(

		SELECT
			MAX( id )
		FROM
			`friend_shadow`

	) AS newMaxID
;


/*
	The second sanity check is to get a block of records from each table and
	make sure they are exactly the same, value-wise. Since the table is too big
	to compare in a single statement, I'll usually just spot-check records
	based on a selection of IDs.
*/
SELECT
	ot.id,
	nt.id
FROM
	/* Limit the comparison to the following ID selection. */
	(

		SELECT
			f.id
		FROM
			`friend` f
		WHERE
			f.id > 0 /* Where to start looking. */
		ORDER BY
			f.id ASC
		LIMIT
			10 /* How many rows to compare in one shot. */
		OFFSET
			0 /* Which block to start at. */

	) AS idList
INNER JOIN
	`friend` ot /* ot: Old Table. */
ON
	ot.id = idList.id
LEFT OUTER JOIN
	`friend_shadow` nt /* nt: New Table. */
ON
	(
		nt.id = ot.id
	AND
		nt.name = ot.name

		/*
			NOTE: We are not including our new columns in this comparison since
			they are not relevant to the old table.
		*/
	)

/*
	Only return records where the old table has a record that is NOT reflected
	in the new table - ie, where the new table ID is null.
*/
WHERE
	nt.id IS NULL
;

Step 4: Swap The Live Table And The Shadow Table.

Once I am convinced that the shadow table is ready to go, it's time to swap the two tables and drop the triggers.

/*
	Once the row-based migration has been completed, we know that our shadow
	table has the correct structure and all the live, up-to-date data. At this
	point, we can swap the two tables and drop all the triggers.
	--
	CAUTION: I am not wrapping this in a transaction (BEGIN/COMMIT) since table
	and trigger alterations implicitly commit any existing transaction. That
	said, I have never run into a problem with this action.
	Documentation: https://dev.mysql.com/doc/refman/5.7/en/implicit-commit.html
*/
RENAME TABLE
	`friend` TO `__friend_pre_migration__`,
	`friend_shadow` TO `friend`
;

DROP TRIGGER IF EXISTS `insert_friend_shadow`;
DROP TRIGGER IF EXISTS `update_friend_shadow`;
DROP TRIGGER IF EXISTS `delete_friend_shadow`;

At this point, your live table now has the migrated table structure, all with no downtime.

I am not a database administrator, so take this with a grain of salt; I learn about databases as I go, when needed. Mostly, I just wanted to show a solid use-case for the INSERT IGNORE INTO syntax. While not very useful [for me personally] in general application development, INSERT IGNORE INTO has been absolutely priceless when it comes to data transformation scripts.



Reader Comments

Ben,
I am always blown away by your attention to detail, and your willingness to spend the time and energy to make posts like this.
You rock!
I have never had to do a live update like this, very neat approach. Thanks again.


@Ken,

Thanks my man. Doing updates like this is NOT fun :) I have a mini panic-attack right before I have to swap out the tables. It's never actually caused a problem... but, it's utterly terrifying every time :(


@Ben,

You should be able to do the equivalent in Oracle and SQL Server using the MERGE syntax.

This isn't a completely foolproof online solution, though. You do have the small window of time when you're swapping the tables to contend with. Also, don't forget about any key relationships that need to be moved.


@All,

I've also been told that the most recent version of MySQL - 5.7 - apparently supports online ALTER TABLE operations, though I have not tried this personally. Looking at the docs, it looks like you can specify the algorithm that it is supposed to use:

>> Specifying ALGORITHM=INPLACE makes the operation use the
>> in-place technique for clauses and storage engines that support it,
>> and fail with an error otherwise, thus avoiding a lengthy table copy
>> if you try altering a table that uses a different storage engine than
>> you expect.

So, that sounds pretty cool!
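As a rough illustration (untested on my end, and support varies by clause and storage engine), the in-place version of this article's ALTER might look like:

```sql
ALTER TABLE `friend`
	ADD COLUMN `isBFF` bit(1) NOT NULL AFTER `name`,
	ALGORITHM=INPLACE,
	LOCK=NONE;
```

The nice part is that MySQL will fail fast with an error, rather than silently falling back to a table copy, if the in-place algorithm isn't supported for the given change.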

