Graceful ColdFusion Timeout Disaster Recovery (Thanks Barney Boisvert)
Previously, I blogged about how hard it is to recover gracefully from ColdFusion request timeout exceptions. The big problem is that after a particular tag or algorithm times out, even if you CFCatch the thrown exception, you simply don't have any processing time to do anything with the error object. Once a page times out, you have about 16-30 milliseconds to perform recovery actions before the thread actually craps out. That pretty much excludes any kind of CFMail or CFDump action (and if you want to log anything to a file, just forget about it!).
I demonstrated that, in the brief post-timeout period, you could simply set a higher request timeout using the ColdFusion CFSetting tag. However, unless your page timeouts are always the same, this was pretty much hit or miss. Barney Boisvert just dropped a bombshell on me last week, demonstrating how to get the current request timeout from the ColdFusion Request Monitor.
When I saw that, I immediately knew it could be leveraged for this task. Using Barney's tip, it now becomes super easy to gracefully recover from a ColdFusion timeout exception. First, we need to create a simple ColdFusion user-defined function that encapsulates the getting of the current request timeout; after all, I don't necessarily want to remember how to do this:
<cffunction
	name="GetRequestTimeout"
	access="public"
	returntype="numeric"
	output="false"
	hint="Returns the current request timeout for the current page request.">

	<!--- Define the local scope. --->
	<cfset var LOCAL = StructNew() />

	<!--- Get the request monitor. --->
	<cfset LOCAL.RequestMonitor = CreateObject( "java", "coldfusion.runtime.RequestMonitor" ) />

	<!--- Return the current request timeout. --->
	<cfreturn LOCAL.RequestMonitor.GetRequestTimeout() />
</cffunction>
Once we have that in place, we can update our disaster recovery methodology to add just a few seconds to the current request timeout once a ColdFusion timeout exception has been thrown:
<cffunction
	name="KillTime"
	access="public"
	returntype="void"
	output="false"
	hint="I kill time for the given milliseconds.">

	<!--- Define arguments. --->
	<cfargument name="MS" type="numeric" required="true" />

	<!--- Get start and end tick count values. --->
	<cfset var intStart = GetTickCount() />
	<cfset var intEnd = (intStart + ARGUMENTS.MS ) />

	<!--- Loop until this time is killed. --->
	<cfloop condition="(GetTickCount() LT intEnd)">

		<!--- Just try to kill some processing time. --->
		<cfset intStart = Sqr( intStart * Pi() * GetTickCount() ) />

	</cfloop>

	<!--- Return out. --->
	<cfreturn />
</cffunction>


<!--- ::: DEMO CODE ::: --->


<!--- Set the current time out to be 2 seconds. --->
<cfsetting requesttimeout="2" />

<!---
	Get the millisecond start time for page processing
	(so that later on, we can check to see how long the
	page ran overall).
--->
<cfset intStart = GetTickCount() />

<!--- Try to kill some time. --->
<cftry>

	<!---
		Here, we are killing time - 4 seconds to be approximate.
		This will exceed the request time out set above
		(2 seconds) and will throw an error.
	--->
	<cfset KillTime( 4000 ) />

	<!--- The KillTime() method call has timed out. --->
	<cfcatch>

		<p>
			First Timeout!
		</p>

		<!---
			Now that our page has timed out, we need to add
			more time to the request in order to recover
			gracefully. Add a few seconds onto the request
			timeout that caused this exception.
		--->
		<cfsetting requesttimeout="#(GetRequestTimeout() + 3)#" />

		<!---
			Now that we have a little more wiggle room, let's
			email ourselves the caught exception.
		--->
		<cfmail
			to="email@example.com"
			from="firstname.lastname@example.org"
			subject="Kinky Solutions Timeout Error"
			type="html">

			<p>
				The following error was thrown on
				#DateFormat( Now(), "mmm d, yyyy" )# at
				#TimeFormat( Now(), "hh:mm TT" )#
			</p>

			<!--- Dump out error. --->
			<cfdump var="#CFCATCH#" label="Kinky Solutions Exception" />

			<!--- Dump out CGI object. --->
			<cfdump var="#CGI#" label="CGI Struct" />

		</cfmail>

	</cfcatch>
</cftry>

<cfoutput>
	<p>
		Total Time: #(GetTickCount() - intStart)#
	</p>
</cfoutput>
Notice that in our CFCatch tag, once we know that the function, KillTime(), has timed out, we are not playing hit-or-miss with the new CFSetting tag. In fact, we are making very sure to only add three more seconds to the page processing. In that time, we are CFMailing ourselves the error as well as the user's CGI object.
Running the above page, we get the following output:
Total Time: 2078
Notice that after the first request timeout exception was thrown (First Timeout!) we were able to continue processing the CFMail tag. All in all, it took 78 milliseconds to run the additional CFMail / CFDump scripts. This is the kind of disaster recovery that we would NOT have had time to do if we didn't mess with the page's request timeout.
Oh, and incidentally, I had the pleasure of meeting Barney in person at CFUNITED 2007. Very cool guy - seemed way smarter than myself. However, I am embarrassed to say that I was totally mispronouncing his name. Apparently, it is a name of French descent and is pronounced "Bo-v'air". Sorry Barney :)
Nice! Been wondering how this would be possible.
Yes, Barney is just a great guy as well as being a friggin' genius on just about any topic. And don't feel too bad about mispronouncing his name, I think everyone does at first. ;-)
Very cool. I can use this :-)
I think I would like to make this standard in my applications as part of my catch-all error handling. Just add one second and you should have enough time to take care of all the logging and possible emailing. I think it's just good practice.... anyone see any red flags (and remember, an error that goes off inside of the OnError() event method will NOT cause infinite error handling)?
This is fantastic! I see that this technique works on CF7-- this isn't limited only to the Enterprise version of CF, is it? (I ask because Enterprise is all I can currently test on.)
I can only test on Standard, so it is certainly not limited to Enterprise.
Saw this post on the ColdFusion Weekly podcast del.icio.us link roll.
I just incorporated this idea into the error plugin file for my Fusebox 5 application, and it works like a charm! Thanks for the info!
@Ben: Good to know that it works on Standard as well as Enterprise, and CF7 as well as CF8. (Anyone care to test it out on CF6, for the heck of it?)
Could something be worked into the onError function for app.cfc?
<cffunction name="onError">
	<cfargument name="Exception" required="true" />
	<cfargument type="String" name="EventName" required="true" />
	<cfif arguments.Exception.rootCause eq "coldfusion.runtime.RequestTimedOutException">
		<cfsetting requesttimeout="#(GetRequestTimeout() + 3)#"/>
	</cfif>
</cffunction>
What I've started to do is actually just set the CFSetting RequestTimeOut to a high number in the OnError method. I no longer bother with checking the existing value - I just set it to something like 10 minutes since I know that nothing in my OnError will hang.
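A minimal sketch of that approach, assuming an Application.cfc with an OnError() event handler. The 600-second value simply mirrors the "10 minutes" mentioned above, and the log file name is hypothetical:

```cfml
<cffunction
	name="OnError"
	access="public"
	returntype="void"
	output="true"
	hint="Fires when an exception is not caught by a try/catch block.">

	<!--- Define arguments. --->
	<cfargument name="Exception" type="any" required="true" />
	<cfargument name="EventName" type="string" required="false" default="" />

	<!---
		Give the error handler plenty of room (10 minutes),
		regardless of what the original request timeout was.
	--->
	<cfsetting requesttimeout="600" />

	<!--- Log the error. The log file name here is hypothetical. --->
	<cflog
		file="timeout-recovery"
		type="error"
		text="#ARGUMENTS.EventName#: #ARGUMENTS.Exception.Message#"
		/>

	<!--- Return out. --->
	<cfreturn />
</cffunction>
```

Because the new timeout is a fixed number, this version does not depend on GetRequestTimeout() at all; the trade-off is that a handler that does hang will hold its thread for the full ten minutes.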
thanks. I have just added this to our global error-handling as well as our Site-wide Error Handler template, and it works like a charm :D
Awesome! Over time, I have found that setting RequestTimeout to a high number in the error handler is as effective and easier to implement than worrying about what the actual existing timeout is. But, either way, rock on!
I was tackling a similar problem and have incorporated the same method as Richard, sticking the request extender in the onError function. It works pretty darn well up to this point.
Awesome my man. I have occasionally run into a few cases where this didn't work. When it doesn't work, I simply default to an arbitrary long timeout:
<cfsetting requesttimeout="#(60 * 5)#" />
... this gives us a 5-minute timeout. It's not nearly as clever, but it works well too :)
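The two approaches can be combined. Here is a rough sketch, assuming that when the GetRequestTimeout() call "doesn't work" it surfaces as an exception (that is an assumption, since the failure mode isn't described above):

```cfml
<cftry>

	<!--- Preferred: add a few seconds to the current request timeout. --->
	<cfsetting requesttimeout="#(GetRequestTimeout() + 3)#" />

	<cfcatch>

		<!--- Fallback: an arbitrary five-minute timeout. --->
		<cfsetting requesttimeout="#(60 * 5)#" />

	</cfcatch>
</cftry>
```

If the failure is instead silent (no exception, but the extra time never takes effect), this wrapper would not help and the fixed timeout on its own is the safer choice.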
Does anyone know the maximum requesttimeout you can use in cfsetting?
I have a process that will take about 50 hours from end to end...to finish...any ideas?
Does anybody know if this is a non-issue in newer versions of ColdFusion? This strikes me as a basic flaw inside CF itself, that may have been fixed by Adobe since 2007. CF itself should ensure catch code can continue to run on a timeout error.
It would be good to be able to allow processing to continue for a second once we hit the timeout mark. This would take care of those requests you really need to run, and usually would run if it hadn't been for some unexpected load on the server.
Ben, you have done it again! Thanks for this post! I have been scratching my head on how to handle the request timeout error and using the cfsetting in onError works like a charm. I never would have thought of that. Thanks for posting!
I thought it was a joke - i hit this link from Google, and all I got was "An error occurred".
what are the chances of that ey?