The other day, while writing some "archiving" logic in a data-driven ColdFusion application, I started getting errors that I didn't quite understand. As I was moving records from an "active" table to an "archive" table, the database started throwing "Duplicate entry for key 'PRIMARY'" constraint violations. At first, I was completely baffled by this error since I was moving the PRIMARY KEY as-is from one table to another. Eventually, I noticed that this only happened after I restarted the MySQL service. Which is how I learned that the AUTO_INCREMENT counters in MySQL 5.7 (and earlier) are stored in-memory; and, that said in-memory counters are reset whenever the MySQL service is restarted.
In MySQL 5.7 and earlier, when you restart the MySQL service, the AUTO_INCREMENT values on each table get re-initialized to be something akin to:
MAX( id ) + 1
This is because, in MySQL 5.7 and earlier, the AUTO_INCREMENT counters are just stored in memory; so, when the MySQL service is restarted, there is no record of what the last counter value was. As such, when the service comes back on-line, it has to calculate the new counter values based on the data contained within each table.
This was causing an issue for me during the "archiving" process because it allows for edge-cases in which PRIMARY KEY values can be re-used on the "active" table. This was, in turn, causing a key-collision when I moved the record to the "archive" table, which already had the re-used PRIMARY KEYs in it.
This is an easy problem to reproduce once you understand the necessary conditions. To see this in action, I've created a simple gateway ColdFusion component that provides access to an active "user" table and its counterpart, the "user_archive" table:
<cfcomponent output="false" hint="I provide methods for manipulating user records."> <cffunction name="archiveUserByID" access="public" returntype="void" output="false"> <!--- Define arguments. ---> <cfargument name="userID" type="numeric" required="true" /> <cftransaction> <cfquery name="local.results"> /* DEBUG: userGateway.archiveUserByID(). */ INSERT INTO user_archive ( id, firstName, lastName, phone, state, zip ) ( SELECT u.id, u.firstName, u.lastName, u.phone, u.state, u.zip FROM user u WHERE u.id = <cfqueryparam value="#userID#" cfsqltype="cf_sql_integer" /> ); /* DEBUG: userGateway.archiveUserByID(). */ DELETE FROM user WHERE id = <cfqueryparam value="#userID#" cfsqltype="cf_sql_integer" /> ; </cfquery> </cftransaction> </cffunction> <cffunction name="createUser" access="public" returntype="numeric" output="false"> <!--- Define arguments. ---> <cfargument name="firstName" type="string" required="true" /> <cfargument name="lastName" type="string" required="true" /> <cfargument name="phone" type="string" required="true" /> <cfargument name="state" type="string" required="true" /> <cfargument name="zip" type="string" required="true" /> <cfquery name="local.results"> /* DEBUG: userGateway.createUser(). */ INSERT INTO user SET firstName = <cfqueryparam value="#firstName#" cfsqltype="cf_sql_varchar" />, lastName = <cfqueryparam value="#lastName#" cfsqltype="cf_sql_varchar" />, phone = <cfqueryparam value="#phone#" cfsqltype="cf_sql_varchar" />, state = <cfqueryparam value="#state#" cfsqltype="cf_sql_varchar" />, zip = <cfqueryparam value="#zip#" cfsqltype="cf_sql_varchar" /> ; SELECT ( @@Identity ) AS id ; </cfquery> <cfreturn results.id /> </cffunction> </cfcomponent>
As you can see, the archiveUserByID() method moves a record, as-is - ID included - from the "user" table the to "user_archive" table. And, the createUser() method simply inserts a new record into the "user" table. In this case, I'm using MySQL's AUOT_INCREMENT feature on the "user" table, which is why I don't have to provide an "id" when creating new records.
Now, let's create a simple script that creates a new user record and then immediately archives it:
<cfscript> userGateway = new UserGateway(); userID = userGateway.createUser( firstName = "Sarah", lastName = "Smith", phone = "", state = "NY", zip = "10011" ); // Now, let's archive the user we just created. userGateway.archiveUserByID( userID ); </cfscript>
This is going to add a record to the end of the "user" table. Then, immediately turn around and move that same record out of the "user" table and into the "user_archive" table.
Now, if we run this ColdFusion application and refresh that page a bunch of times, everything works fine; new records get created and then archived. However, if we restart the ColdFusion application and then try to refresh the page, we get the following error:
Error Executing Database Query: Duplicate entry '1' for key 'PRIMARY'
The problem is that after the MySQL service restart, the AUTO_INCREMENT value on the "user" table was reinitialized to "1". However, there was already a record with PRIMARY KEY "1" in the "archive" table. So, when we created a new "user" (with auto-generated ID = "1"), and then tried to archive it, we ran into a KEY collision.
Now, granted, this only occurs if you are operating on the end of the table. Which, for something like an "archiving" task, is unlikely to happen. As such, I kind of just went with "hope as a strategy" on this problem. And, to be honest, I can't really think of a better solution. If I switched over to using UUIDs (Universally Unique Identifier) instead of AUTO_INCREMENT integers for the PRIMARY KEY value, that would work; but, in the application that I'm maintaining, that ship sailed a long time ago.
According to the MySQL documentation, this behavior has changed in MySQL 5.8. Apparently, the AUTO_INCREMENT counter values will get written to a redo log such that they will persist across service restarts. However, even with the 5.8 changes, MySQL caveats that there are still edge-cases that cannot be guaranteed:
However, in the case of a server crash, reuse of a previously allocated auto-increment value cannot be guaranteed. Each time the current maximum auto-increment value is changed due to an INSERT or UPDATE operation, the new value is written to the redo log, but if the crash occurs before the redo log is flushed to disk, the previously allocated value could be reused when the auto-increment counter is initialized after the server is restarted.
This post was mostly a "note to self". This seems like a really fundamental aspect of the MySQL AUTO_INCREMENT counter that has escaped my mental model for years. As such, I'm happy to finally understand how it works. I don't think it will necessarily change the way I am architecting my code; but, it will certainly give me another perspective from which to view my data algorithms.