Understanding Transactions And Database Rollbacks That Involve 3rd Party API Integration
ColdFusion's CFTransaction tag is pretty awesome - you just wrap up a bunch of database interactions in the CFTransaction tag and any error thrown by any of the interactions automatically rolls back the entire set of database mutations. But, what if part of your database interaction workflow involves calling a 3rd party API? In that case, you can't simply rely on the inherent behavior of the CFTransaction tag; rather, you have to perform both the local and remote cleanup yourself.
This is exactly the type of situation that Ryan Jeffords and I were dealing with last week. We had an application that created a user in our local database and then created a payment processing account and subscription plan in a remote payment gateway. The workflow was such that the local user record needed to be created before the remote payment account could be requested. The only problem was that the last step of the payment gateway integration could fail if the supplied credit card was invalid. In such a case, we needed to roll back not only the local database but also the remote payment gateway.
The linear processing looked a bit like this:
- LOCAL - Create user record.
- REMOTE - Create payment account.
- REMOTE - Create payment subscription (this could FAIL).
If step #3 failed, we needed to roll back the remote mutations in step #2 and the local mutations in step #1.
To do this, we needed to use more than just the CFTransaction tag; we needed to use a whole CFTransaction / CFTry / CFCatch / CFReThrow tag workflow. To demonstrate, I have set up a little simulation that involves both local and remote API calls:
```cfml
<!---
	Start our transaction. This workflow will include an update to a
	local database AND an update to a remote, non-database API.
--->
<cftransaction>

	<!---
		Inside the transaction, wrap the database interaction in a
		try/catch because we might need to perform a partial recovery
		from any failures. Typically, we could leave this up to the
		inherent behavior of the CFTransaction tag; however, since we
		are dealing with a 3rd party API, we have to do a little
		manual labor.
	--->
	<cftry>

		<!--- Insert the user into the local database. --->
		<cfset userID = insertUser() />

		<!---
			Now that we have created the local user, create the payment
			processing account and scheduling in the 3rd party API.
		--->
		<cfset accountID = createPaymentAccount( userID ) />

		<!--- Set up the recurring payment. --->
		<cfset createPaymentSubscription( userID, accountID ) />

		<!---
			Catch any errors that have occurred. These may come from
			either the database or the 3rd party API. Since the error
			may have occurred in a remote location, we need to do both
			local AND remote clean-up.
		--->
		<cfcatch>

			<!---
				Something has gone wrong with the user creation. Start
				off by rolling back the database.
			--->
			<cftransaction action="rollback" />

			<!---
				Now that we've rolled back the database updates (local
				clean-up), clean up the 3rd party API. More likely than
				not, this is where the error came from to begin with.
				Since we aren't sure where in the process the remote
				API interaction failed, wrap each clean-up step in its
				own try/catch. This way, if one step fails (ie. we try
				to delete something that was never created), the rest
				of the clean-up can still take place. Note that we undo
				in the reverse order of creation.
			--->

			<!--- Try to delete the payment subscription. --->
			<cftry>
				<cfset deletePaymentSubscription( userID, accountID ) />
				<cfcatch>
					<!--- There's nothing more we can do about that. --->
				</cfcatch>
			</cftry>

			<!--- Try to delete the payment account. --->
			<cftry>
				<cfset deletePaymentAccount( accountID ) />
				<cfcatch>
					<!--- There's nothing more we can do about that. --->
				</cfcatch>
			</cftry>

			<!---
				Now that we've handled both the local database rollback
				and the remote 3rd party API clean-up, re-throw the
				error so that it can bubble up the application stack to
				a point where it can be used to communicate back to
				the user.
			--->
			<cfrethrow />

		</cfcatch>

	</cftry>

</cftransaction>

<!--- Woohoo! User was created. --->
Groooovy! User was created!
```
Because our clean-up involves both local and remote locations, we need to manually execute the rollback. This gives us a chance to interact with the 3rd party API before re-throwing the error, allowing it to bubble back up the application stack (where it would, presumably, be used to let the client know that a problem has occurred).
...and it works quite nicely might I add! :) Excellent (and simple) solution.
Heck yeah :)
Yes, very nice.
There was a push for a generic approach to this problem back in the '90s called "2-Phase Commit". It wouldn't have worked in this case, because it requires that every system you're coordinating support it. In other words, not only your local database, but also your remote system's API, would have to have special commands to support committing in two separate phases.
Because you love knowing things, here's the overview:
I've never even heard of that. Sounds like it could be interesting. I wonder how a remote database could even possibly support such a two-phase commit. I suppose there would have to be an explicit commit on all mutations.
... which is probably all outlined in that Wikipedia link :)
Ben, you also might want to look into XA, which is a modern implementation of the two-phase commit protocol.
Is this kind of distributed transaction model typically supported by APIs? Or is this more of a theoretical approach (in a perfect world...) kind of thing?
You can definitely implement it as part of your API. The main issue, though, is that all of the distributed resources that make up your API (database, web service, etc.) have to support distributed transactions - some commercial products do, and some don't.
You also might want to read up on JTA:
Hopefully I'm not sending you too far down a new rabbit hole!
The problem that 2-Phase Commit was intended to address is synchronization of separate, autonomous systems that each have their own rollback/commit mechanisms. They could reside on the same computer or different computers. They could be databases or messages ("book an airplane ticket").
How do they talk to each other such that transactions distributed across those multiple systems will either commit as a unit or rollback as a unit?
If each of the autonomous systems supports 2-Phase Commit, and you adhere to the commit sequence, no problem.
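The commit sequence itself is simple to sketch. Here's a minimal, language-agnostic illustration in Python; the `Participant` class is a hypothetical stand-in for a resource manager (a database, a message broker) that exposes prepare/commit/rollback hooks:

```python
# Minimal sketch of the 2-Phase Commit sequence. The Participant class
# is hypothetical; real resource managers expose prepare/commit/rollback
# through their own (e.g. XA-style) interfaces.

class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "working"

    def prepare(self):
        # Phase 1: durably record the pending changes, then vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled-back"


def two_phase_commit(participants):
    # Phase 1: ask every participant to prepare (i.e., to vote).
    if all(p.prepare() for p in participants):
        # Phase 2a: unanimous "yes" -> everyone commits.
        for p in participants:
            p.commit()
        return True
    # Phase 2b: any "no" vote -> everyone rolls back.
    for p in participants:
        p.rollback()
    return False
```

The key property is that no participant commits until *all* of them have durably prepared, which is exactly what the manual CFTry/CFCatch approach in the article has to approximate by hand.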
Rob's posts made me more aware of the progress since the mid '90s. So in response to your questions, yeah, it's usable now, provided that the products you're using support it.
I hope you understood the reasoning behind integrating messages into the transaction environment. If "book an airplane ticket" is part of the same transaction as billing a credit card (triggered by the code handling the message), and the credit card database goes down, the message can be rescheduled as soon as the database comes back up. That would be different from the credit card being rejected: the rejection commits a "rejected" message response, the credit card doesn't get debited, and the message doesn't get rescheduled. Only actually unprocessed messages get rescheduled.
I used airline tickets as an example, because that's exactly the kind of high-volume distributed transaction that needs to integrate message recovery together with database recovery. Different airlines. Different credit cards. Millions of dollars a day at stake.
This applies to a case that I'm working on now where a centralized inventory table is inserted/updated through an autonomous transaction since this may be updated for different operations such as procurement/sale/return. So if an order has multiple items and the update fails for a subsequent item, i would have to do a manual undo for the previous items since the autonomous transaction would have committed the previous items.
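The manual-undo approach described here is essentially a compensation loop: record each per-item update as it commits, and on a later failure undo the completed ones in reverse order. A sketch in Python, where `apply_item` and `undo_item` are hypothetical stand-ins for the autonomous inventory transactions:

```python
# Sketch of compensating already-committed per-item updates when a later
# item fails. apply_item / undo_item are hypothetical stand-ins for the
# autonomous inventory transactions described above.

def process_order(items, apply_item, undo_item):
    completed = []
    try:
        for item in items:
            apply_item(item)        # commits immediately (autonomous txn)
            completed.append(item)
    except Exception:
        # Undo in reverse order so dependent updates unwind cleanly.
        for item in reversed(completed):
            undo_item(item)
        raise
```

Tracking the `completed` list persistently (not just in memory) would also protect against the process itself dying mid-undo.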
Aaah, these situations are always fun. One thing you may not have taken into account above is what happens if *YOUR* server/request (in addition to the 3rd party's) crashes between the different remote requests. One way I've dealt with such situations when integrating with 3rd parties was to create a work queue.
So when a user submitted a form that had its data processed locally AND by 3rd party APIs I would save the form information to a database table that had a bunch of flags representing each step in the process/transaction to be completed.
Then I would kick off the processing of the tasks. The task manager would execute each of the tasks in sequence updating the database fields with the result of each task.
If a task failed, I could undo each of the previous tasks in a controlled / persistent fashion. Sometimes a task would soft-fail and I'd be able to re-try that step again and potentially finish processing the entire queue still. After "x" soft fails I would hard-fail.
I didn't realize this when I coded my implementation, but the concept I was trying to code was similar to the command pattern. If I were to code something like this again today, I'd probably go the command pattern route and make the handling of work queues more generic.
The thing I like about the work queue is it was PERSISTENT. So the first database transaction in processing the form is used to save the work queue.
Then you call a task manager (outside the original transaction) which processes the queue and each task gets its own transaction to update its state with. There's still the possibility that a task result doesn't get logged to the database/persistent store but it's pretty unlikely.
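The persistent work-queue idea above can be sketched compactly. In this Python sketch, the `queue` dict stands in for the database table of per-step status flags, the `(name, run, undo)` tuples are hypothetical task definitions, and `MAX_SOFT_FAILS` is an illustrative retry limit:

```python
# Sketch of the persistent work-queue pattern: each step is recorded with
# a status flag, retried on soft failure, and compensated on hard failure.
# The "queue" dict stands in for a database table; the task callables and
# MAX_SOFT_FAILS are illustrative assumptions.

MAX_SOFT_FAILS = 3

def run_queue(tasks, queue):
    """tasks: list of (name, run, undo) tuples; queue: dict name -> status."""
    done = []
    for name, run, undo in tasks:
        attempts = 0
        while True:
            try:
                run()
                queue[name] = "done"
                done.append((name, undo))
                break
            except Exception:
                attempts += 1
                queue[name] = "soft-failed (%d)" % attempts
                if attempts >= MAX_SOFT_FAILS:
                    queue[name] = "hard-failed"
                    # Compensate completed steps in reverse order.
                    for done_name, done_undo in reversed(done):
                        done_undo()
                        queue[done_name] = "undone"
                    return False
    return True
```

Because the status flags live in the database, a crashed run can be resumed (or unwound) by a later process, which is the property a purely in-memory try/catch can't give you.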
Great post. Properly coding for situations like you describe above is often overlooked and developers aren't always taught how to deal with them. Glad you're calling attention to it.
I see it's been a while since this post was active with comments, but I've been referring to this article and discussion for the past few weeks while trying to build in some failsafes on a CF scheduled task.
Basically, we query the database, and immediately hit the 3rd party gateway API to make a CC or ACH charge.
We had a database fail the first week of the year which caused numerous double charges, as the DB didn't get updated with the response of the first transactions.
Since recording (updating) that response is the key failsafe against duplicate charges, how would you approach the problem?
I've started diagramming flag points in the process, and have even considered inserting the initial query results into a unique table to keep track of them and their stage in the process.
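One common safeguard for this kind of double-charge scenario is to record the charge attempt *before* calling the gateway, so a re-run can detect in-flight or already-completed charges and route them to review instead of re-charging. A hedged sketch in Python, where the `attempts` dict stands in for the tracking table and `charge_gateway` is a hypothetical gateway call:

```python
# Sketch: write a "pending" row BEFORE hitting the payment gateway, and
# refuse to re-charge anything not in a cleanly retryable state. The
# "attempts" dict stands in for the tracking table suggested above;
# charge_gateway is a hypothetical gateway call.

def safe_charge(invoice_id, attempts, charge_gateway):
    status = attempts.get(invoice_id)
    if status in ("pending", "charged"):
        # A previous run may already have charged this; flag it for
        # manual review instead of risking a duplicate charge.
        return "needs-review"
    attempts[invoice_id] = "pending"   # persist before the remote call
    try:
        charge_gateway(invoice_id)
    except Exception:
        attempts[invoice_id] = "failed"
        return "failed"
    attempts[invoice_id] = "charged"
    return "charged"
```

With this ordering, a database outage after the gateway call leaves the record stuck in "pending", which is ambiguous but safe: the worst case becomes a manual reconciliation rather than a second charge.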
I would love to get some input from some CF pros like you and your audience Ben.
Thanks for the great resource.