Considering Approaches To Handling MySQL Key Conflicts In Lucee CFML

Published 2022-05-03 in ColdFusion, SQL — Comments (12)

Relational databases are magical. And, as I've become more experienced as a software engineer, I've begun to lean more heavily on the database as a means to enforce data integrity (with UNIQUE KEY constraints) and to drive idempotent workflows. That said, I'm still trying to figure out where in the software stack it makes sense to put all the finer details. As such, I wanted to take a moment and consider my options for handling key conflicts in MySQL and Lucee CFML.

Database index design is critical to application success. Period. Not only do the proper indexes lead to high performance application, they also create peace-of-mind over data integrity. Through smart index design, you can ensure that storing duplicate information in the database is significantly less likely. Which, in turn, means that your application's inevitable race-conditions can be counteracted at the data persistence level.

Long-story short, I love databases!

And, when I'm using a database in an idempotent workflow, I often want to explicitly handle unique key constraint violations. In some workflows, that means quietly skipping over a key constraint violation. In other workflows, that means returning the id of the conflicting row. And, in other workflows, that means executing an alternate branch within the application control-flow.

I don't think there's a "one-size fits all" solution to the construction of my database access layer (DAL) that facilitates all of the use cases all of the time. I think certain methods within my DAL API have to be constructed to enable specific behaviors.

To explore this, consider a simple database table, thing, that has a primary key column and secondary column with a unique key constraint:

CREATE TABLE `thing` (
	`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
	`value` varchar(255) NOT NULL,
	PRIMARY KEY (`id`),
	UNIQUE KEY `IX_byValue` (`value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

If I were to try and insert a row that conflicted with either the id column or the value column of an existing record, MySQL throws the following error:

Error number: 1062; Symbol: ER_DUP_ENTRY; SQLSTATE: 23000

Such would be the case with this implementation in my database access layer:

component {

	/**
	* I create a new thing with the given value.
	* 
	* EXPLANATION: In this version, we're going to attempt to insert a new row. And, if
	* there is a key-conflict caused by the insert, the MySQL database and driver will
	* throw an error with (NativeErrorCode: 1062).
	*/
	public numeric function createThing( required string value ) {

		```
		<cfquery name="local.results" result="local.metaResults">
			INSERT INTO
				thing
			SET
				value = <cfqueryparam value="#value#" sqltype="varchar" />
			;
		</cfquery>
		```

		return( metaResults.generatedKey );

	}

}

Now, I don't want to catch this kind of conflict error in my application's service layer. Because, if my service layer needs to handle "SQL specific" errors, it means that the database semantics are leaking out into the rest of the application. Which, is fine if the errors are "unhandled"; but, if I want to handle the errors, I need an error that is application-specific, not technology-specific.

To do this, I could wrap my INSERT INTO SQL statement in a try/catch and then inspect the error object for NativeErrorCode: 1062. But, this feels a little messy. I think a cleaner solution would be to use an INSERT IGNORE INTO SQL statement and then check the query metadata to see if any rows were affected (zero rows would mean the insert was ignored):

component {

	/**
	* I create a new thing with the given value.
	* 
	* EXPLANATION: In this version, we're going to attempt to insert a new row. And, if
	* there is a key-conflict caused by the insert, the "INSERT IGNORE" will quietly skip
	* the record creation. In that case, ZERO ROWS will be affected, which we can then use
	* to generate a domain-specific error (one that we can predictably catch in the layer
	* above the data-access layer (DAL)).
	*/
	public numeric function createThing( required string value ) {

		```
		<cfquery name="local.results" result="local.metaResults">
			INSERT IGNORE INTO
				thing
			SET
				value = <cfqueryparam value="#value#" sqltype="varchar" />
			;
		</cfquery>
		```

		// If the INGORE caused no records to be created, throw an APP-SPECIFIC ERROR.
		// This helps prevent "sql semantics" from leaking up in to the service layer.
		if ( ! metaResults.recordCount ) {

			throw(
				type = "Gateway.Conflict",
				message = "Values conflict with existing data.",
				extendedInfo = serializeJson( arguments )
			);

		}

		return( metaResults.generatedKey );

	}

}

This time, a key conflict violation would mean that no new row was created in the database. We can easily check for that in the result of the CFQuery tag. And, turn around and throw an application specific error, Gateway.Conflict in response. My service layer can then handle that error specifically in a catch statement:

} catch ( Gateway.Conflict error ) { ... }

... without the service layer having to know that I'm using a database under the hood in my gateway layer.

But, maybe I don't want to throw an error at all. Maybe, if a row already exists in the database, I just want to return the primary key of the existing row so that I can use that key in the rest of my idempotant workflow.

To do that, we can use the ON DUPLICATE KEY UPDATE semantics to both ignore the new row creation and return the primary key of the existing row:

component {

	/**
	* I create a new thing with the given value.
	* 
	* EXPLANATION: In this version, we're going to attempt to insert a new row. And, if
	* there is a key-conflict caused by the insert, we're going to use a NO-OP "ON
	* DUPLICATE UPDATE" in order to access the PRIMARY KEY of the conflicting record. By
	* calling "LAST_INSERT_ID(id)" with a value, it will allow the ".generatedKey"
	* property within the query meta-results to contain the key of the EXISTING record.
	*/
	public numeric function createThing( required string value ) {

		```
		<cfquery name="local.results" result="local.metaResults">
			INSERT INTO
				thing
			SET
				value = <cfqueryparam value="#value#" sqltype="varchar" />
			ON DUPLICATE KEY UPDATE
				id = LAST_INSERT_ID( id ) -- Store existing ID into "generatedKey".
			;
		</cfquery>
		```

		return( metaResults.generatedKey );

	}

}

Normally, the ON DUPLICATE KEY UPDATE semantics would allow us to update the row if it already existed. However, in this case, I don't want to update it - I just want to use the no-op assignment of the id column. And, more specifically, I want to assign the id column using the LAST_INSERT_ID() function.

As I demonstrated a few months ago, when the LAST_INSERT_ID() function is invoked with an argument, it does two things:

It echoes the function argument as the result of the function invocation.
It stores the argument in the value reported as the last inserted key in the database (in this particular request).

So, when I execute this line of SQL:

id = LAST_INSERT_ID( id )

... it means that the id (primary key) of the existing row is going to be reported as the most recently-generated key in the request. Which is what is then reported within the CFQuery tag's .generatedKey property. This allows me to then return the id of the existing row up to my ColdFusion service layer, where it can carry-on as if the INSERT INTO executed successfully.

Again, I don't think there's any one approach here that I would use all the time. I think each of these approaches makes sense for a specific type of workflow. Which is why I strongly believe that your data access layer can have as many methods as you find helpful. The trick will be coming up with method names that clearly articulate the "contract" of the data access method.

For completeness, here's the ColdFusion component that I was testing with as I explored my key conflict options:

component
	output = false
	hint = "I provide data access methods for Things."
	{

	/**
	* I create a new thing with the given value.
	* 
	* EXPLANATION: In this version, we're going to attempt to insert a new row. And, if
	* there is a key-conflict caused by the insert, the MySQL database and driver will
	* throw an error with (NativeErrorCode: 1062).
	*/
	public numeric function createThing( required string value ) {

		```
		<cfquery name="local.results" result="local.metaResults">
			INSERT INTO
				thing
			SET
				value = <cfqueryparam value="#value#" sqltype="varchar" />
			;
		</cfquery>
		```

		return( metaResults.generatedKey );

	}


	/**
	* I create a new thing with the given value.
	* 
	* EXPLANATION: In this version, we're going to attempt to insert a new row. And, if
	* there is a key-conflict caused by the insert, the "INSERT IGNORE" will quietly skip
	* the record creation. In that case, ZERO ROWS will be affected, which we can then use
	* to generate a domain-specific error (one that we can predictably catch in the layer
	* above the data-access layer (DAL)).
	*/
	public numeric function createThing2( required string value ) {

		```
		<cfquery name="local.results" result="local.metaResults">
			INSERT IGNORE INTO
				thing
			SET
				value = <cfqueryparam value="#value#" sqltype="varchar" />
			;
		</cfquery>
		```

		// If the INGORE caused no records to be created, throw an APP-SPECIFIC ERROR.
		// This helps prevent "sql semantics" from leaking up in to the service layer.
		if ( ! metaResults.recordCount ) {

			throw(
				type = "Gateway.Conflict",
				message = "Values conflict with existing data.",
				extendedInfo = serializeJson( arguments )
			);

		}

		return( metaResults.generatedKey );

	}


	/**
	* I create a new thing with the given value.
	* 
	* EXPLANATION: In this version, we're going to attempt to insert a new row. And, if
	* there is a key-conflict caused by the insert, we're going to use a NO-OP "ON
	* DUPLICATE UPDATE" in order to access the PRIMARY KEY of the conflicting record. By
	* calling "LAST_INSERT_ID(id)" with a value, it will allow the ".generatedKey"
	* property within the query meta-results to contain the key of the EXISTING record.
	*/
	public numeric function createThing3( required string value ) {

		```
		<cfquery name="local.results" result="local.metaResults">
			INSERT INTO
				thing
			SET
				value = <cfqueryparam value="#value#" sqltype="varchar" />
			ON DUPLICATE KEY UPDATE
				id = LAST_INSERT_ID( id ) -- Store existing ID into "generatedKey".
			;
		</cfquery>
		```

		return( metaResults.generatedKey );

	}

}

Did I mention that I love databases?!

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/4259

Reader Comments

Chris G May 4, 2022 at 12:30 PM

230 Comments

There's some cleaver DB wizardry here I had not seen before. 👀🤓🧐😍

Ben Nadel May 4, 2022 at 12:33 PM

15,822 Comments

@Chris,

Heck yeah! SQL is just a groovy language. Now, these are all specifically for the MySQL specification; but, I'm 100% certain that all SQL variations have some form of these constructs.

Chris G May 4, 2022 at 1:12 PM

230 Comments

For sure! I work a lot in oracle, which can feel unnecessarily esoteric in syntax...but I've come to appreciate (and understand some of) it's brilliance. There's usually a method behind the madness...I just don't always comprehend it

Ben Nadel May 4, 2022 at 1:39 PM

15,822 Comments

@Chris,

"Oracle" as a database always feels like one of those "enterprise only" things. To be clear, I know nothing about it; but, I've only ever heard it discussed in the context of massive companies.

Chris G May 4, 2022 at 2:25 PM

230 Comments

Definitely! It has features only massive companies would only care about 🤣

Ben Nadel May 4, 2022 at 2:26 PM

15,822 Comments

Ha ha

Anthony May 11, 2022 at 4:39 PM

4 Comments

I've never liked sql,but I know I should learn it more. This was great insight.

Ben Nadel May 11, 2022 at 4:39 PM

15,822 Comments

@Anthony,

That's how I feel about the MongoDB query API... I do not like it, I do not like it one bit :D But, I really should get better at it.

Anthony May 11, 2022 at 5:17 PM

4 Comments

We just gotta bite the bullet sometimes

Ben Nadel Sep 22, 2022 at 10:01 PM

15,822 Comments

@All,

One thing I just ran into with the approach that uses the ON DUPLICATE SET to get the existing ID - this appears to increment the row ID even if no new row is created. Meaning, imagine that the last row inserted has id:5. Then, I try to run the same query over-and-over 15-times, all returning the same, existing ID. When the next new row is inserted, it will be with id:20. This is definitely not what I would want. I wonder if it is just relating to the current session; or, if this would be across all sessions connecting to the database.

Ben Nadel Sep 23, 2022 at 10:01 AM

15,822 Comments

So, an update on my previous statement - it looks like the AUTO_INCREMENT column is incremented as a byproduct of the ON DUPLICATE KEY UPDATE internal logic, not as a byproduct of the LAST_INSERT_ID() call. Just experimenting with it locally.

It looks like this is documented in the MySQL docs, but very "quietly". This is the only mention I can find:

For an InnoDB table where a is an auto-increment column, the INSERT statement increases the auto-increment value but the UPDATE does not.

I'm writing up a demo and will share it here.

Ben Nadel Sep 23, 2022 at 11:59 AM

15,822 Comments

@All,

Here's my follow-up post on the AUTO_INCREMENT behavior during a key-conflict scenario:

www.bennadel.com/blog/4326-key-conflicts-on-insert-still-increment-auto-increment-value-in-mysql.htm

Reader Comments

Post A Comment — ❤️ I'd Love To Hear From You! ❤️

Post A Comment — I'd Love To Hear From You!