After recently demonstrating how to run asynchronous processing with ColdFusion's CFThread tag, I started to think about the nature of the asynchronous threads. It's such a departure from the top-down execution of traditional code that I thought the actual execution of individual threads might not be exactly what people expect. As such, I wanted to take a little time to explore thread execution as well as demonstrate a way in which you can leverage the parallel nature of threads to perform tasks in a serial nature.
First, let's just take a look at how asynchronous threads execute. In this demo, all I am going to do is define 10 CFThreads in a row and then see in what kind of order they start executing:
<!---
	Create an empty list of thread indexes to keep track of the
	order in which threads complete.
--->
<cfset completedThreads = "" />

<!--- Create several threads to execute in parallel. --->
<cfloop index="index" from="1" to="10" step="1">

	<!--- Launch thread. --->
	<cfthread name="thread#index#" action="run" index="#index#">

		<!---
			Store the index of this thread at the end of the global
			list so we can see where it executed in order.
			NOTE: When storing, be sure to use the Variables scope
			so that the value doesn't get stored locally.
		--->
		<cfset variables.completedThreads = listAppend( variables.completedThreads, attributes.index ) />

		<!---
			Sleep this thread for a random amount of time. This will
			help account for the various processing times that each
			thread might undergo.
		--->
		<cfthread action="sleep" duration="#randRange( 1, 10 )#" />

	</cfthread>

</cfloop>

<!--- Join all the async threads. --->
<cfthread action="join" />

<!--- Output the thread completion order. --->
<cfoutput>
	Threads: #completedThreads#
</cfoutput>
As you can see, as each thread executes, it appends its defining index to a page-scoped list. In this way, we can see the order in which the threads started executing. Within each thread, I am then adding a tiny bit of random delay to simulate the variable-time processing that a thread may require. When we run the above code, we get the following page output:
This page output tells us two very important things. First, the threads do not necessarily execute in the order in which they were defined: thread 8 is in the 3rd list position and thread 3 is in the 7th list position. Second, and perhaps less obviously, thread number 7 is missing from the list. This is not because thread 7 failed to execute; rather, it is because the parallel CFThread tags created a race condition around the variable, "completedThreads." Each listAppend() call reads the current list and then writes back a new one, so when two threads read the same value at nearly the same time (a dirty read), the later write replaces the earlier one. That is how the number 7 came to be inappropriately overwritten.
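For comparison, here is a minimal sketch of the same demo with the list mutation wrapped in an exclusive named CFLock. Under that assumption, every index should end up in the list, although the order will still be unpredictable. The lock name ("completedThreadsAccess") and the threadNames list are my own additions for illustration; they are not part of the original demo.

```cfml
<!--- Shared list of thread indexes, plus a list of thread names to join. --->
<cfset completedThreads = "" />
<cfset threadNames = "" />

<cfloop index="index" from="1" to="10" step="1">

	<!--- Track the thread name so we can join on it explicitly. --->
	<cfset threadNames = listAppend( threadNames, "thread#index#" ) />

	<cfthread name="thread#index#" action="run" index="#index#">

		<!---
			Only one thread at a time may read, append to, and write
			back the shared list. The lock name is hypothetical.
		--->
		<cflock name="completedThreadsAccess" type="exclusive" timeout="10">
			<cfset variables.completedThreads = listAppend( variables.completedThreads, attributes.index ) />
		</cflock>

		<!--- Simulate variable-time processing outside of the lock. --->
		<cfthread action="sleep" duration="#randRange( 1, 10 )#" />

	</cfthread>

</cfloop>

<!--- Wait for the named threads to finish. --->
<cfthread action="join" name="#threadNames#" />

<cfoutput>
	Threads: #completedThreads#
</cfoutput>
```

Note that the sleep stays outside of the lock; only the shared-variable access is serialized, so the threads still overlap for the rest of their work.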
From the above demo, we learned that the execution of parallel threads is unpredictable in order and can cause variable-access race conditions; but, this does not mean that CFThread can only be used to execute completely unrelated tasks. In the next exploration, I'll demonstrate how we can still leverage the asynchronous nature of CFThread tags to perform actions that must be executed in a predefined order.
Imagine that we have a list of images that have to be downloaded using CFHTTP. And, for the sake of argument, imagine that these images need to be downloaded in a very particular order. CFHTTP requests are oftentimes the kind of long-processing commands that can get a lot of benefit from parallel threading. But, due to the unpredictable nature of the CFThread tag execution, we have to take a bit more caution when applying it to this kind of a situation.
The trick here is to keep the order-dependent data outside of the CFThread tags. Then, when a particular CFThread tag begins to execute, it must reach up into the primary page variable space to grab the next appropriately-ordered data point. Of course, since multiple threads will be accessing the same data pool, CFLock will need to be employed to prevent dirty reads. By using these two approaches together, the order of the thread execution becomes separated from the order of data processing.
<!--- Base URL for the image. --->
<cfset baseUrl = "http://some-image-domain.com/99933_4c2aea7e44_o.jpg" />

<!---
	Create an array of image URLs. Notice that each image URL has
	a different index - this will be used to track the order in
	which they execute.
--->
<cfset imageUrls = [
	"#baseUrl#?i=1",
	"#baseUrl#?i=2",
	"#baseUrl#?i=3",
	"#baseUrl#?i=4",
	"#baseUrl#?i=5",
	"#baseUrl#?i=6",
	"#baseUrl#?i=7",
	"#baseUrl#?i=8",
	"#baseUrl#?i=9",
	"#baseUrl#?i=10"
	] />

<!---
	Create a list of completed URLs indexes so we can see what
	order the images were downloaded in.
--->
<cfset completedImages = "" />

<!---
	Create a list of thread indexes that have executed. This is so
	we can compare the order of execution to the order of image
	downloads.
--->
<cfset completedThreads = "" />

<!---
	Create enough threads to asynchronously download the images;
	but, don't actually pass any image value into the thread.
--->
<cfloop index="threadIndex" from="1" to="#arrayLen( imageUrls )#" step="1">

	<!--- Launch a parallel thread. --->
	<cfthread name="thread#threadIndex#" action="run" index="#threadIndex#">

		<!---
			Because we want these threads to process in parallel,
			but in a serial order, let's get the next available URL
			in the collection. Since we are creating a race condition
			here, we need to lock this access.
		--->
		<cflock name="urlAccess" type="exclusive" timeout="60">

			<!--- Get the next image URL. --->
			<cfset imageUrl = variables.imageUrls[ 1 ] />

			<!---
				Since we don't want anyone else to access this image
				URL, let's delete it from the array.
			--->
			<cfset arrayDeleteAt( variables.imageUrls, 1 ) />

			<!--- Add the index of the executed thread. --->
			<cfset variables.completedThreads = listAppend( variables.completedThreads, attributes.index ) />

		</cflock>

		<!--- Download the image at the URL. --->
		<cfhttp method="get" url="#imageUrl#" getasbinary="yes" />

		<!---
			Now that the image has been completed, add the url index
			to the complete list.
			NOTE: This list mutation is not using CFLock, but it
			probably should. I am only excluding it here because this
			is not the focal point of the code demo.
		--->
		<cfset variables.completedImages = listAppend( variables.completedImages, listLast( imageUrl, "=" ) ) />

	</cfthread>

</cfloop>

<!--- Rejoin all threads. --->
<cfthread action="join" />

<!---
	Output the list of completed image URL so that we can see in
	which order they executed.
--->
<cfoutput>
	Threads: #completedThreads#<br />
	<br />
	Images: #completedImages#
</cfoutput>
As you can see here, at the start of each ColdFusion CFThread tag execution, there is a named CFLock tag. Within this exclusive lock, the thread reaches up into the primary page and pops off the next available image URL from the imageUrls array. It also adds its thread index to the completedThreads variable. In this way, we allow the CFThreads to execute in parallel, but still require the data in the imageUrls array to be processed in serial. And, when we run the above page, we get the following output:
As you can see, the threads executed in an unpredictable order, but the images were downloaded in sequence.
By single-threading the data access while allowing the heavy-lifting (CFHTTP) to be done in parallel, we really obtain a "best of both worlds" outcome. Of course, the way in which this can be applied to your software is going to depend largely on the rules pertaining to your data; but, I hope this demonstrated that serialized data can be processed in a parallel manner under the right conditions.
good info on cfthread behavior. i have a quick question..
Can a webservice call from coldfusion be invoked asynchronously? I.e. just fire a webservice call from the page but make sure it won't impact the performance of the remaining page processing.
You have two ways to go about this. You can either wrap it in a CFThread tag so the web service request gets called in parallel. Or, you can invoke the web service using CFHTTP and give it a timeout="1" (one second) and throwOnError="false". In doing so, the page will only pause a max of one second waiting for a response from the web service. You DO get a slight pause; but, if you are pre-CF8 and do not have the CFThread tag, this is a very nice alternative.
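As a rough sketch of both options (the service URLs, the method name, and the thread name here are placeholders for illustration, not a real API):

```cfml
<!---
	Option 1 (CF8+): fire the web service call inside its own thread
	so the rest of the page keeps processing. The WSDL URL and the
	method name are hypothetical.
--->
<cfthread name="notifyService" action="run">

	<cfinvoke
		webservice="http://example.com/services/notify.cfc?wsdl"
		method="notify"
		/>

</cfthread>

<!---
	Option 2 (pre-CF8): call an HTTP endpoint with a one-second
	timeout and no exception on failure. The page waits at most about
	one second and then moves on. The endpoint URL is hypothetical.
--->
<cfhttp
	method="get"
	url="http://example.com/services/notify.cfm"
	timeout="1"
	throwonerror="false"
	/>
```

With the first option, the page never waits on the service at all. With the second, a timeout only means the page stopped listening; whether the remote side finishes its work depends on the service.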
thanks for the response. for some reason, i wasn't feeling comfortable using cfthread due to the unpredictable nature of it. but after seeing your post, i decided to use it. thanks
i have another question though..
as you know CF provides debug output of session,cgi and application scope variables at the end of the page. How can we enable the drilldown of complex data types in the debug section of the page for those variables? Currently it just merely specifies the number of keys it holds.
Hmmm, I am not sure. To be honest, I have not used the output at the bottom of the page in a long, long time. Rather, what I have started to LOVE LOVE LOVE is just using the CFDump tag. If you are on CF8+, the CFDump tag can write to a file:
output="./local/file/path | full/file/path"
I have found this to be great. Plus, you can throw it in a CFThread tag, which executes asynchronously and it works perfectly.
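A minimal sketch of that combination, assuming CF8+. The struct contents, the thread name, and the output file name are made up for illustration:

```cfml
<!--- Some complex data to inspect (hypothetical values). --->
<cfset debugData = structNew() />
<cfset debugData.userName = "demo" />
<cfset debugData.roles = listToArray( "admin,editor" ) />

<!---
	Dump the data to a file from inside a thread so the page does
	not wait on the file write. Values passed as thread attributes
	are copied into the thread's Attributes scope.
--->
<cfthread name="debugDump" action="run" data="#debugData#">

	<cfdump
		var="#attributes.data#"
		output="#expandPath( './debug-dump.txt' )#"
		format="text"
		/>

</cfthread>
```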
Great article! I use CFTHREAD pretty regularly on processes that I know I can work with concurrently and most of the time will not cause any race conditions. However, I came across an issue today when I do up to 10 asynchronous CFFTP transactions for uploading files. I do checks to make sure the directory doesn't exist then create it, check for file existence then upload, and it works perfectly except for about once every week where I get an exception that the directory already exists. I can only assume thread 1 checked for the directory and it didn't exist, and before it goes to create it, another thread is a step ahead of it and creates it after it's been checked for existence.
I've used locks on shared scopes before, but not necessarily named synchronization blocks. Hopefully this example here will solve my issue:
<!--- Create the folder if needed --->
<cflock name="lckCreateDir" type="exclusive" timeout="10">
	<cfftp connection="ftpConn" action="existsDir" stopOnError="false" directory="#destDir#">
	<cfif NOT cfftp.returnValue>
		<cfset thread.result.logData.add("Directory #destDir# doesn't exist - creating directory for MatNO #matNO#")>
		<cfftp connection="ftpConn" action="createDir" stopOnError="false" directory="#destDir#">
	</cfif>
</cflock>
I guess if after 2 weeks and no error I will assume I've implemented it correctly! :)
The use of <cfthread action="join" /> is incorrect. It requires a name attribute for the threads which require to be joined.
If you omit the 'name' attribute, it waits until all threads are finished. The attribute is only required if you want to join specific threads.
Using your example code gives me a syntactical error on CF 9 application server.
I have used this in my code, thanks.
But, i have 1 problem with it.
In the lock you get the url of the image to be downloaded.
These are picked up by each thread in a serial manner.
Then you go out of the lock, and execute a cfhttp.
That cfhttp could potentially take longer than normal (say 10 seconds),
The next thread with the next url might download the image much faster, and then you have the images out of order, so you have to put the lock around the cfhttp too, i think.
In this case you could also just use the index of the url array and execute threads in a normal way and then put the image in the array at index.
Actually, you have to make a lock with another name and put all the code in the thread in it, so only 1 of those threads at 1 time executes.
This has the disadvantage that downloading also occurs in serial manner, but still has the advantage that the main http/web thread can continue, which was good enough for me.
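Of the two alternatives raised in this comment, the index-based one keeps the downloads parallel. A minimal sketch of it might look like the following, assuming each thread is handed its own URL and index, and the results are put in order only after all threads have been joined. The thread names, the downloadedImages array, and the shortened URL list are assumptions for illustration:

```cfml
<!--- Same style of URL list as in the article (shortened here). --->
<cfset baseUrl = "http://some-image-domain.com/99933_4c2aea7e44_o.jpg" />
<cfset imageUrls = [ "#baseUrl#?i=1", "#baseUrl#?i=2", "#baseUrl#?i=3" ] />
<cfset threadNames = "" />

<cfloop index="i" from="1" to="#arrayLen( imageUrls )#" step="1">

	<cfset threadNames = listAppend( threadNames, "download#i#" ) />

	<!--- Each thread receives its own URL, so no shared queue is needed. --->
	<cfthread name="download#i#" action="run" imageUrl="#imageUrls[ i ]#">

		<cfhttp method="get" url="#attributes.imageUrl#" getasbinary="yes" result="httpResult" />

		<!--- Expose the downloaded content through the Thread scope. --->
		<cfset thread.fileContent = httpResult.fileContent />

	</cfthread>

</cfloop>

<!--- Wait for every download thread to finish. --->
<cfthread action="join" name="#threadNames#" />

<!---
	Collect the results by index, in the page thread only. The order
	of this array matches imageUrls regardless of which download
	finished first. A thread that failed leaves an empty string.
--->
<cfset downloadedImages = arrayNew( 1 ) />

<cfloop index="i" from="1" to="#arrayLen( imageUrls )#" step="1">

	<cfif structKeyExists( cfthread[ "download#i#" ], "fileContent" )>
		<cfset downloadedImages[ i ] = cfthread[ "download#i#" ].fileContent />
	<cfelse>
		<cfset downloadedImages[ i ] = "" />
	</cfif>

</cfloop>
```

Because only the page thread writes to downloadedImages, no CFLock is needed here. The trade-off is that nothing can be consumed in order until the join has completed.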