After recently demonstrating how to run asynchronous processing with ColdFusion's CFThread tag, I started to think about how those asynchronous threads actually behave. Asynchronous threads are such a departure from the top-down execution of traditional code that I thought the actual execution of individual threads might not be exactly what people expect. As such, I wanted to take a little time to explore thread execution, as well as demonstrate a way in which you can leverage the parallel nature of threads to perform tasks in a serial order.
First, let's just take a look at how asynchronous threads execute. In this demo, all I am going to do is define 10 CFThreads in a row and then see the order in which they start executing:
<!---
	Create an empty list of thread indexes to keep track of
	the order in which threads start executing.
--->
<cfset completedThreads = "" />
<!--- Create several threads to execute in parallel. --->
<cfloop index="index" from="1" to="10" step="1">
	<!--- Launch thread. --->
	<cfthread name="thread#index#" action="run" index="#index#">
		<!---
			Store the index of this thread at the end of the global
			list so we can see where it executed in order.
			NOTE: When storing, be sure to use the Variables scope
			so that the value doesn't get stored locally.
		--->
		<cfset variables.completedThreads = listAppend(
			variables.completedThreads,
			attributes.index
			) />
		<!---
			Sleep this thread for a random amount of time (1 to 10
			milliseconds). This will help account for the various
			processing times that each thread might undergo.
		--->
		<cfthread action="sleep" duration="#randRange( 1, 10 )#" />
	</cfthread>
</cfloop>
<!--- Join all the async threads. --->
<cfthread action="join" />
<!--- Output the order in which the threads started. --->
<cfoutput>
	Threads: #completedThreads#
</cfoutput>
As you can see, as each thread executes, it appends its defining index to a page-scoped list. In this way, we can see the order in which the threads started executing. Within each thread, I am then adding a tiny bit of random delay to simulate the variable-time processing that a thread may require. When we run the above code, we get the following page output:
This page output tells us two very important things. First, the threads do not necessarily execute in the order in which they were defined: you can see here that thread 8 is in the 3rd list position and thread 3 is in the 7th list position. Second, and perhaps less obvious, thread 7 is missing from the list. This is not because thread 7 failed to execute; rather, it is because the parallel CFThread tags created a race condition around the variable "completedThreads." At some point, two threads read the list before either had written its update back; each appended its own index to that stale copy, and the later write overwrote the earlier one, so the number 7 was lost.
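The lost update is easier to see when the interleaving is spelled out by hand. The following Python sketch is not CFML and uses no real threads; it replays one hypothetical schedule in which two threads both read the shared list before either writes back. The helper `list_append` is a made-up stand-in for ColdFusion's `listAppend()`, and the starting value `"1,2"` is illustrative only.

```python
def list_append(current: str, value: int) -> str:
    """Hypothetical stand-in for CFML listAppend() with a comma delimiter."""
    return f"{current},{value}" if current else str(value)


completed_threads = "1,2"

# Hypothetical schedule: threads 7 and 3 both read the shared list
# before either of them has written its update back.
seen_by_thread_7 = completed_threads
seen_by_thread_3 = completed_threads

# Thread 7 writes first...
completed_threads = list_append(seen_by_thread_7, 7)  # "1,2,7"
# ...then thread 3 writes a value built from its stale read.
completed_threads = list_append(seen_by_thread_3, 3)  # "1,2,3"

print(completed_threads)  # 1,2,3  (thread 7's update is lost)
```

Wrapping the read and the write in a single exclusive lock rules out this schedule, because the second thread cannot read the list until the first has written its update back.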
From the above demo, we learned that the execution of parallel threads is unpredictable in order and can cause variable-access race conditions; but, this does not mean that CFThread can only be used to execute completely unrelated tasks. In the next exploration, I'll demonstrate how we can still leverage the asynchronous nature of CFThread tags to perform actions that must be executed in a predefined order.
Imagine that we have a list of images that have to be downloaded using CFHTTP. And, for the sake of argument, imagine that these images need to be downloaded in a very particular order. CFHTTP requests are often the kind of long-running commands that can benefit a lot from parallel threading. But, due to the unpredictable nature of CFThread tag execution, we have to take a bit more care when applying it to this kind of situation.
The trick here is to keep the order-dependent data outside of the CFThread tags. Then, when a particular CFThread tag begins to execute, it must reach up into the primary page variable space to grab the next appropriately-ordered data point. Of course, since multiple threads will be accessing the same data pool, CFLock will need to be employed to prevent race conditions (such as two threads grabbing the same data point). By using these two approaches together, the order of thread execution becomes decoupled from the order of data processing.
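The same pattern can be sketched outside ColdFusion. The following is a rough Python analogue, not the author's CFML: `threading.Lock` plays the role of the exclusive CFLock, a shared list plays the role of the page-level array, and `process_in_order` and `slow_double` are hypothetical names. Only the "take the next item" step is locked; the slow work runs outside the lock, in parallel.

```python
import random
import threading
import time


def process_in_order(items, work):
    """Run `work` on every item in parallel threads while handing items
    out strictly in list order (a Python analogue of the CFLock pattern)."""
    pending = list(items)  # order-dependent data lives outside the workers
    dispensed = []         # order in which items were handed out
    thread_order = []      # order in which workers acquired the lock
    results = {}
    lock = threading.Lock()

    def worker(thread_index):
        # Only grabbing the next item is single-threaded.
        with lock:
            item = pending.pop(0)
            dispensed.append(item)
            thread_order.append(thread_index)
        # The slow part runs outside the lock, in parallel.
        result = work(item)
        with lock:
            results[item] = result

    threads = [
        threading.Thread(target=worker, args=(i,))
        for i in range(1, len(pending) + 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dispensed, thread_order, results


def slow_double(n):
    # Hypothetical stand-in for a long-running request such as CFHTTP.
    time.sleep(random.uniform(0, 0.01))
    return n * 2


dispensed, thread_order, results = process_in_order(range(1, 11), slow_double)
print("Threads:", thread_order)  # not guaranteed to be 1 through 10
print("Items:  ", dispensed)     # always 1 through 10, in order
```

Because the pop and the append to `dispensed` happen under the same lock, items are always handed out in list order, no matter which worker happens to get the lock first.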
<!--- Base URL for the image. --->
<cfset baseUrl = "http://some-image-domain.com/99933_4c2aea7e44_o.jpg" />
<!---
	Create an array of image URLs. Notice that each image URL has
	a different index - this will be used to track the order in
	which they execute.
--->
<cfset imageUrls = [
	"#baseUrl#?i=1",
	"#baseUrl#?i=2",
	"#baseUrl#?i=3",
	"#baseUrl#?i=4",
	"#baseUrl#?i=5",
	"#baseUrl#?i=6",
	"#baseUrl#?i=7",
	"#baseUrl#?i=8",
	"#baseUrl#?i=9",
	"#baseUrl#?i=10"
	] />
<!---
	Create a list of completed URL indexes so we can see what
	order the images were downloaded in.
--->
<cfset completedImages = "" />
<!---
	Create a list of thread indexes that have executed. This is so
	we can compare the order of execution to the order of image
	downloads.
--->
<cfset completedThreads = "" />
<!---
	Create enough threads to asynchronously download the images;
	but, don't actually pass any image value into the thread.
--->
<cfloop index="threadIndex" from="1" to="#arrayLen( imageUrls )#" step="1">
	<!--- Launch a parallel thread. --->
	<cfthread name="thread#threadIndex#" action="run" index="#threadIndex#">
		<!---
			Because we want these threads to process in parallel, but
			in a serial order, let's get the next available URL in the
			collection. Since we are creating a race condition here,
			we need to lock this access.
		--->
		<cflock name="urlAccess" type="exclusive" timeout="60">
			<!--- Get the next image URL. --->
			<cfset imageUrl = variables.imageUrls[ 1 ] />
			<!---
				Since we don't want anyone else to access this image
				URL, let's delete it from the array.
			--->
			<cfset arrayDeleteAt( variables.imageUrls, 1 ) />
			<!--- Add the index of the executed thread. --->
			<cfset variables.completedThreads = listAppend(
				variables.completedThreads,
				attributes.index
				) />
		</cflock>
		<!--- Download the image at the URL. --->
		<cfhttp method="get" url="#imageUrl#" getasbinary="yes" />
		<!---
			Now that the image has been downloaded, add the URL index
			to the completed list. Every thread mutates this shared
			list, so it gets its own exclusive lock to avoid the lost
			updates seen in the first demo.
		--->
		<cflock name="imageListAccess" type="exclusive" timeout="60">
			<cfset variables.completedImages = listAppend(
				variables.completedImages,
				listLast( imageUrl, "=" )
				) />
		</cflock>
	</cfthread>
</cfloop>
<!--- Rejoin all threads. --->
<cfthread action="join" />
<!---
	Output the list of completed image indexes so that we can see
	the order in which the downloads finished.
--->
<cfoutput>
	Threads: #completedThreads#<br />
	<br />
	Images: #completedImages#
</cfoutput>
As you can see here, at the start of each ColdFusion CFThread tag execution, there is a named CFLock tag. Within this exclusive lock, the thread reaches up into the primary page, takes the next available image URL from the front of the imageUrls array, and deletes it so no other thread can grab it. It also adds its thread index to the completedThreads variable. In this way, we allow the CFThreads to execute in parallel, but still hand out the data in the imageUrls array in serial order. And, when we run the above page, we get the following output:
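As you can see, the threads executed in an unpredictable order, but the images were downloaded in sequence. Keep in mind that the lock only guarantees the order in which the URLs are handed out; because the CFHTTP calls themselves run in parallel, a slow download could still finish after a later one.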
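The difference between the order in which work is handed out and the order in which it finishes can be made visible with a small Python analogue (not CFML). Here a `threading.Event` forces the first item to act like a slow download, so it cannot finish until the second item has finished; the function name and the two-item list are hypothetical.

```python
import threading


def dispense_then_finish_out_of_order():
    pending = [1, 2]
    dispensed = []  # order in which items were taken from the shared list
    completed = []  # order in which the (simulated) work finished
    lock = threading.Lock()
    second_done = threading.Event()

    def worker():
        with lock:
            item = pending.pop(0)
            dispensed.append(item)
        if item == 1:
            # Simulate a slow first download: wait until item 2 is finished.
            second_done.wait(timeout=5)
        with lock:
            completed.append(item)
        if item == 2:
            second_done.set()

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dispensed, completed


dispensed, completed = dispense_then_finish_out_of_order()
print(dispensed)  # [1, 2]
print(completed)  # [2, 1]
```

If the finishing order matters as well, one option is to store each result under its index and read the results back in index order after the join.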
By single-threading the data access while allowing the heavy lifting (CFHTTP) to be done in parallel, we get a "best of both worlds" outcome. Of course, how this applies to your software is going to depend largely on the rules pertaining to your data; but, I hope this demonstrates that order-dependent data can be processed in a parallel manner under the right conditions.
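When the requirement is only that results come back in the original order (rather than that work starts in that order), Python's standard library offers a different technique: `concurrent.futures.ThreadPoolExecutor.map` runs the calls in parallel but yields results in input order. This is a swapped-in alternative, not what the CFML demo does; `fetch_image` is a hypothetical placeholder for the CFHTTP call.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_image(index):
    """Hypothetical placeholder for a slow HTTP download."""
    time.sleep(random.uniform(0, 0.01))
    return f"image-{index}"


with ThreadPoolExecutor(max_workers=4) as pool:
    # The calls may finish in any order, but map() yields results
    # in the same order as the inputs.
    results = list(pool.map(fetch_image, range(1, 11)))

print(results)
```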