Skip to main content
Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Trying To Debug "Deadlock found when trying to get lock; try restarting transaction" Errors In Lucee CFML 5.2.9.40

By Ben Nadel on
Tags: ColdFusion, SQL

The other day, Josh Siok and I were running into a strange problem: we were executing a ColdFusion page that was immediately terminating in a Transaction Deadlock error in MySQL. However, when we looked at the SQL statement that was failing to obtain the lock, it was a complete mystery (at least to us) as to why that SQL statement was consistently running into the locking problem. Eventually - after may page refreshes - the page finally executed successfully. However, I still have no idea what that was happening; so, I wanted to see if I could find a way to get better logging around MySQL deadlocks in our Lucee CFML 5.2.9.40 application.

When we were executing our ColdFusion code, this is the MySQL error was showing up in our logs:

Deadlock found when trying to get lock; try restarting transaction.

To be honest, as much as I love SQL databases, I have a rather poor understanding of how Transactions work; and, about what conditions actually lead to deadlocks. I vaguely understand that a deadlock revolves around concurrent access to a single record; but, my understanding goes no deeper than that, really.

When I started to read-up on how I might get some better insight, I was excited to see that the MySQL performance schema actually provides insight into Transaction and Locking information. Their docs even provide a SQL statement that shows which transactions are blocking and which are waiting:

SELECT
	( r.trx_mysql_thread_id ) AS waiting_thread,
	( r.trx_query ) AS waiting_query,
	( b.trx_mysql_thread_id ) AS blocking_thread,
	( b.trx_query ) AS blocking_query
FROM
	information_schema.innodb_lock_waits w
LEFT OUTER JOIN
	information_schema.innodb_trx b
ON
	b.trx_id = w.blocking_trx_id
LEFT OUTER JOIN
	information_schema.innodb_trx r
ON
	r.trx_id = w.requesting_trx_id
;

At first, I thought I might be able to run this query when a Transaction deadlock appears in the ColdFusion application. However, what I discovered in my testing is that, by the time we see the Deadlock found when trying to get lock error, the lock information about the given query is already gone from the above result-set.

It seems like the above query might be helpful if I was continually polling the database in order to gather insights. But, that is not a viable solution for our particular application context. If a deadlock caused a thread to hang, then maybe I could see polling working out. However, the default setting in MySQL is set to immediately fail upon deadlock detection. As such, the above query would only contain the relevant information for a split-second before the waiting transaction was rolled-back. The chances that a polling-solution would see that information is very low.

Now, in addition to the Transaction locking information, MySQL's information_schema also provide access to the processlist, which contains the state of all the running threads in the database. I'm a huge fan of the processlist - it's the reason I add a DEBUG statement to every SQL query that I write. So, if I can't explicitly identify which Transaction lock caused a given query to fail, my next thought was that I might use the processlist to at least see what SQL statements were executing at the time of the error.

To test this, I set up a Lucee CFML page that spawns three asynchronous CFThread tags that each attempt to update the same row in the database. And, in order to increase the chances of lock-contention, I'm including a short SLEEP() command in my UPDATE statement.

Each of the database calls is wrapped in a try/catch block in which the catch stanza appends the current processlist to the THREAD pseudo-scope. At the end of the CFML page, I then wait for each CFThread re-join the parent process before dumping out the thread state:

<cfscript>

	/**
	* I try to create some lock-contention around a single record in the database by
	* including a SLEEP() command as part of the UPATE SQL statement.
	* 
	* @threadName I am the thread-name that is executing the query.
	*/
	public void function incrementUpdatedAt( required string threadName ) {

		transaction isolation = "serializable" {

			var currentDate = queryExecute(
				sql = "
					/* DEBUG: #threadName#. */
					SELECT
						u.updatedAt
					FROM
						user u
					WHERE
						u.id = 1
					;
				"
			);

			var newDate = currentDate.updatedAt.add( "s", 5 );

			queryExecute(
				sql = "
					/* DEBUG: #threadName#. */
					UPDATE
						user u
					SET
						u.updatedAt = :newDate
					WHERE
						u.id = 1
					AND
						/* CAUTION: Trying to simulate contention around record. */
						SLEEP( #randRange( 1, 2 )# ) = 0
					;
				",
				params = {
					newDate: {
						value: newDate,
						sqlType: "timestamp"
					}
				}
			);

		} // END: Transaction.

	}

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	// Now that we have a function that will try to create slow-access around a single
	// record, let's attempt to spawn several Threads that will access this record at the
	// same time, hopefully creating a deadlock.
	// --
	// CAUTION: It appears that our MySQL database is configured to immediately reject a
	// deadlock condition, which is the default behavior (rather than waiting the default
	// 50-seconds for the lock to resolve itself).

	// In each of the following threads, we're going to CATCH any errors (deadlock) and
	// append some introspective data to the THREAD scope so we can dump-it-out after.

	thread name = "t1" {

		try {
			incrementUpdatedAt( "t1" );
		} catch ( any error ) {
			thread.queryMetaData = getQueryMetadata( error );
		}

	}

	thread name = "t2" {

		try {
			incrementUpdatedAt( "t2" );
		} catch ( any error ) {
			thread.queryMetaData = getQueryMetadata( error );
		}

	}

	thread name = "t3" {

		try {
			incrementUpdatedAt( "t3" );
		} catch ( any error ) {
			thread.queryMetaData = getQueryMetadata( error );
		}

	}

	thread action = "join";
	// Now that all threads have finished executing, let's look at the results.
	dumpThread( "Thread 1", cfthread.t1 );
	dumpThread( "Thread 2", cfthread.t2 );
	dumpThread( "Thread 3", cfthread.t3 );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I return the current query meta-data surrounding the given error. This includes
	* any SQL that was executing as the source of the error AND any SQL statements that
	* are currently executing in the database.
	* 
	* @error I am the error being examined.
	*/
	public struct function getQueryMetadata( required struct error ) {

		return({
			errorType: error.type,
			errorDetail: error.detail,
			sql: ( error.sql ?: "None" ),
			processList: getProcessList()
		});

	}


	/**
	* I return all of the running query statements using the PROCESSLIST table.
	*/
	public array function getProcessList() {

		var processList = queryExecute(
			sql = "
				SELECT
					pl.info
				FROM
					information_schema.processlist pl
				WHERE
					pl.info NOT LIKE '%information_schema%' -- Don't select self.
				;
			",
			options = {
				returnType: "array"
			}
		);

		var statements = processList.map(
			( record ) => {

				return( record.info );

			}
		);

		return( statements );

	}


	/**
	* I output the given Thread scope, showing the keys that are relevant to this demo.
	* 
	* @label I am the CFDump label argument.
	* @data I am the CFDump var argument.
	*/
	public void function dumpThread(
		required string label,
		required struct data
		) {

		dump(
			label = label,
			var = data,
			show = "ElapsedTime, Name, Status, QueryMetadata, Error"
		);
		echo( "<br />" );

	}

</cfscript>

As you can see, within each try/catch block, I am calling getQueryMetadata(). This function returns a struct with two pieces of information:

  • The SQL statement that raised the error (if it exists).
  • The other SQL statements that are running in the MySQL database.

Because of the asynchronous nature of the web application, there's no guarantee that the processlist will contain the blocking query - it may already have finished between the time in which the Deadlock found error is thrown and the time in which the processlist is queried. However, at least in this demo, the blocking query shows up most of the time (thanks to the SLEEP() call).

And, when I run this ColdFusion code, I get the following page output:

CFThread scopes that contain the processlist data at (roughly) the time of the transaction deadlock in Lucee CFML 5.2.9.40.

As you can see, 2 of the 3 CFThread tags fail in error. And, in the meta-data that we appended to the thread pseudo-scope as part of the catch block, we can see that the t2 thread was running its SQL query at the time of the error. Again, there's no guarantee that the offending query will even be in the processlist by the time we query it; but, I think this approach is worth a shot!

In our ColdFusion application, we have a centralized error-logger. My plan is to add some code to that logger that checks to see if the given error is of type, database, and references the Deadlock found details. And, if so, I'm going to append the processlist results to the log payload (behind a feature-flag, of course). Then, I just need to wait for one of these rare Transaction deadlocks to occur. Who knows, I might get lucky!

If anyone has any suggestions on how to better debug the, Deadlock found when trying to get lock; try restarting transaction, error, I'm all ears. These don't happen often; and, when they do, they don't block - they rollback immediately; so, I don't think polling the innodb_lock_waits and innodb_trx tables will really help me. But, I assume this is a problem people run into; so, there must be a better way!



Reader Comments

Ben, would you be willing to share some details of the table experiencing deadlocks? What do the indexes look like? What is the query experiencing deadlock doing? I am not an expert on MySQL locking, but if somehow your row lock is getting promoted to a table lock, then all sorts of contention issues can occur on a heavily used table.

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Blog
Live in the Now
Oops!
NEW: Some basic markdown formatting is now supported: bold, italic, blockquotes, lists, fenced code-blocks. Read more about markdown syntax »
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.