Right now, I'm trying to muscle my way though the odyssey that is "Designing Data-Intensive Applications". In the book, author Martin Kleppmann discusses database locking; and, posits that applications should really retry Transactions that are timed-out due to a lock. This same sentiment - on retrying transactions - has also been echoed by members of the InVision App data team. As such, I wanted to start thinking about how this might be accomplished. But, in order to restart transactions, I first had to figure out how to force a Transaction lock timeout. And, since I've had recent luck with the MySQL SLEEP() function, I thought maybe I could use SLEEP() along with the database server variable, innodb_lock_wait_timeout, to force lock time-outs in a controlled manner.
To cause a Transaction lock timeout in MySQL, I needed to get two parallel queries to try and update the same InnoDB record at the same time. In order to avoid having to create "production" conditions with load-testing, I'm simply using ColdFusion's CFThread tag to spawn an asynchronous query; then, I'm using the SLEEP() function in that asynchronous query in order to ensure that the UPDATE is held open long enough for a subsequent query in the parent page to create a race-condition.
By default, MySQL will wait 50-seconds for a transaction lock before throwing an error. For testing purposes, this is a painful amount of time. Luckily, this duration is based on the MySQL database variable, innodb_lock_wait_timeout, which can be set at both the GLOBAL and SESSION level. As such, I'm going to be setting the SESSION-scoped value to be 2-seconds:
SET SESSION innodb_lock_wait_timeout = 2; -- Is 50 by default.
Now, let's look at the experiment:
<!--- In order to force a transaction lock timeout, we need to have two parallel queries that are trying to update the same row at the same time. As such, let's spawn a CFThread that initiates a LONG-RUNNING update using Sleep(). ---> <cfthread name="create-race-condition"> <cfquery name="initialUpdate"> UPDATE friend SET isBFF = SLEEP( 10 ) -- Will hang for 10-seconds. WHERE id = 5 </cfquery> </cfthread> <!--- Sleep for a moment in order to ensure the above CFThread is spawned. ---> <cfset sleep( 100 ) /> <cftry> <!--- Now that the above UPDATE is running (long), let's try to update the same row. By default MySQL will wait 50-seconds for a row-lock to timeout. Since I don't want to wait for that, I am using the "innodb_lock_wait_timeout" MySQL session variable in order to force the local lock timeout to be 2-seconds. ---> <cfquery name="conflictingUpdate"> SET SESSION innodb_lock_wait_timeout = 2; -- Is 50 by default. UPDATE friend SET isBFF = 1 WHERE id = 5 ; </cfquery> <!--- CAUTION: This does not work! The above query will throw a "Database" type error; I just wanted to make sure I couldn't catch it with a root-cause error type thrown by the database driver. ---> <cfcatch type="com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException"> <cfdump var="#cfcatch#" label="Java Type" /> </cfcatch> <!--- All database errors are wrapped in a "Database" error type, which we can catch. We can then inspect the Database error details to see what happened. ---> <cfcatch type="Database"> <!--- The database can throw all kinds of errors. In this case, we only want to look at the ones that relate to "restarting" transactions. For this, we can look at error code "40001", which is the "ER_LOCK_DEADLOCK" error, and rethrow any errors that do not match. Read More: https://dev.mysql.com/doc/refman/5.7/en/error-messages-server.html#error_er_lock_deadlock ---> <cfif ( cfcatch.errorCode neq "40001" )> <cfthrow /> </cfif> <cfdump var="#cfcatch#" label="Database" show="detail,errorcode,exceptions,message,nativeerrorcode,sqlstate,type,queryerror" /> </cfcatch> </cftry>
Since I'm expecting the second query to throw an error, I'm wrapping it in a Try / Catch block. In this case, I actually have two CFCatch tags because I wanted to see if I could catch the underlying MySQL error, MySQLTransactionRollbackException. It turns out that I can't because ColdFusion wraps all of the database errors in an exception of type "Database". We can, however, catch errors of type "Database" and then inspect the errorCode to see if the caught error is a transaction timeout error.
And, when we run the above ColdFusion code, we get the following output:
As you can see, we were able to trigger a transaction lock timeout in a predictable manner with ColdFusion and MySQL. And, we were able to catch and inspect that error. Now that I know that I can do this on-demand, I can start to think about how I might design a data-access layer that generically handles and retries transaction lock timeouts (I think I can smell some tasty meta-programming in my future!).
Want to use code from this post? Check out the license.
Now that I can create a lock timeout error under controlled circumstances, I can finally start playing around with automatic retry behaviors for transactions that fail _due_ to a lock timeout error:
I used to think this would be a monumental effort; but, I'm now thinking with some meta-programming techniques, it may not be that hard :D