Learning ColdFusion 8: CFThread Part II - Parallel Threads
Now that we have covered the basics of sending data into and getting data out of ColdFusion 8's new CFThread-launched processing threads, let's examine some places where they can be used. For this post, we are going to concentrate on using CFThread to speed up page processing even when we need all threads to finish processing within the same request.
Oftentimes, while there is a single, overall goal for a page request, that page request is divided up into chunks of code that may be run independently. Take, for example, grabbing search results from Google.com. Imagine we wanted to grab the first 1,000 search results for his buffness, Vin Diesel, using ColdFusion's CFHttp tag. Google only allows you to grab a maximum of 100 results in a single request, so in order to get 1,000 results, we have to make 10 separate CFHttp requests, each grabbing the next 100 results.
Now, each of those 100 results relates to the overall goal of the page, but does one set of 100 really have anything to do with the next set of 100? Sure, they have to be in some sort of order, but would it even matter which order we made our requests in, so long as the final results were the same?
Absolutely not. But, using traditional ColdFusion code, we have no other option but to make one CFHttp request and then wait for it to finish before making our next CFHttp request. By the very nature of our single-threaded request (well, technically multi-threaded request since CFHttp fires a new process), each CFHttp call is directly tied to the next via processing availability.
This synchronous processing is not as fast as it could be. Take a look at this traditional CFHttp code:
<!---
	Build the base URL for the results. This will include everything
	but the start index. We are going to be screen-scraping Google
	for some search results.
--->
<cfset strBaseURL = (
	"http://www.google.com/search?" &
	"q=Vin+Diesel" &
	"&num=100" &
	"&start="
	) />

<!---
	Method One: Traditional synchronous CFHttp calls. This methodology
	requires that ColdFusion run each CFHttp on its own and then wait
	for it to finish before firing off the next one.
--->

<!--- Get the starting time. --->
<cfset intStartTime = GetTickCount() />

<!---
	Let's get the first 1000 results for Vin Diesel. In order to do
	this, we are going to grab 10 sets of 100 results.
--->
<cfloop index="intGet" from="1" to="10" step="1">
	<cfhttp
		method="GET"
		url="#strBaseURL##((intGet - 1) * 100)#"
		useragent="#CGI.http_user_agent#"
		result="objGet#intGet#"
		/>
</cfloop>

<!--- Output retrieval times. --->
<cfoutput>
	<p>
		We Got 1000 Results in
		#NumberFormat( ((GetTickCount() - intStartTime) / 1000), ",.00" )#
		seconds using standard CFHttp
	</p>
</cfoutput>
If we run the above code a few times, we get the output:
We Got 1000 Results in 4.83 seconds using standard CFHttp
We Got 1000 Results in 4.92 seconds using standard CFHttp
We Got 1000 Results in 12.97 seconds using standard CFHttp
We Got 1000 Results in 4.11 seconds using standard CFHttp
We Got 1000 Results in 7.11 seconds using standard CFHttp
As you can see, the total page processing time ranged from 4.11 seconds to almost 13 seconds.
Now that ColdFusion 8 has introduced the new CFThread tag, we can break free of our single-threaded mind-set. No longer does independent code have to wait for other parts of the code to finish processing (as in our above example). In this next example, we are going to wrap each CFHttp call inside of its own CFThread tag. This will allow ColdFusion to launch a new, asynchronous thread for each set of 100 results from Google.com:
<!---
	Build the base URL for the results. This will include everything
	but the start index. We are going to be screen-scraping Google
	for some search results.
--->
<cfset strBaseURL = (
	"http://www.google.com/search?" &
	"q=Vin+Diesel" &
	"&num=100" &
	"&start="
	) />

<!---
	Method Two: Asynchronous parallel-thread CFHttp calls. This
	methodology leverages ColdFusion 8's new CFThread tag to fire
	parallel CFHttp calls.
--->

<!--- Get the starting time. --->
<cfset intStartTime = GetTickCount() />

<!---
	Let's get the first 1000 results for Vin Diesel. In order to do
	this, we are going to grab 10 sets of 100 results, but this time
	each grab is going to be done in its own thread. The URL, user
	agent, and loop index are passed in as thread attributes (copied
	when the thread is launched); a thread body that read intGet
	directly would see whatever value the shared loop index happened
	to hold when the thread actually ran.
--->
<cfloop index="intGet" from="1" to="10" step="1">

	<!--- Start a new thread for this CFHttp call. --->
	<cfthread
		action="run"
		name="objGet#intGet#"
		getindex="#intGet#"
		geturl="#strBaseURL##((intGet - 1) * 100)#"
		useragent="#CGI.http_user_agent#">

		<cfhttp
			method="GET"
			url="#ATTRIBUTES.geturl#"
			useragent="#ATTRIBUTES.useragent#"
			result="THREAD.Get#ATTRIBUTES.getindex#"
			/>

	</cfthread>

</cfloop>

<!---
	Now, we have to wait for all of the concurrent threads to be
	joined before we can use the CFHttp results.
--->
<cfloop index="intGet" from="1" to="10" step="1">
	<cfthread action="join" name="objGet#intGet#" />
</cfloop>

<!--- Output retrieval times. --->
<cfoutput>
	<p>
		We Got 1000 Results in
		#NumberFormat( ((GetTickCount() - intStartTime) / 1000), ",.00" )#
		seconds using CFHttp and CFThread
	</p>
</cfoutput>
Running the above code a few times, we get the output:
We Got 1000 Results in 0.79 seconds using CFHttp and CFThread
We Got 1000 Results in 0.69 seconds using CFHttp and CFThread
We Got 1000 Results in 0.72 seconds using CFHttp and CFThread
We Got 1000 Results in 0.63 seconds using CFHttp and CFThread
We Got 1000 Results in 3.44 seconds using CFHttp and CFThread
As you can see, the page processing time decreased dramatically - usually to less than a second in total. So what's with the 3.44 second entry? ColdFusion 8's new CFThread tag requests that a new thread be launched to handle this code; however, the ColdFusion application server does not have an unlimited number of threads at its disposal. Each CFThread tag requests a new thread. This thread request is then queued for processing. When a processing thread becomes available, it gets passed to the CFThread code for asynchronous processing (also, in our case, processing time is directly tied to how quickly Google.com returns results).
This is very important to understand. Running parallel threads will only make your page run faster if parallel threads are available to be launched. If you have a server that is maxed out on page requests, wrapping code in CFThread might not have any effect at all (in that case, it might actually have a negative effect since the current page now has to wait for threads... but that is purely an uneducated hypothesis). However, since computers spend most of their time idle, waiting for user requests (at least that is what I hear about personal desktop computers - probably not the same for web servers), and since a thread making a CFHttp call spends most of its life waiting on the remote server rather than using the CPU, it's more likely than not that running parallel threads using ColdFusion 8's CFThread will lead to dramatic page performance increases.
Also notice that after our CFHttp requests, we are explicitly requesting that the parent page wait for all the parallel threads to finish processing (and to join the page). Since these threads are all running in parallel, there is no guarantee that any one thread will have finished processing by the time the parent page reaches a certain line unless you explicitly wait for a named thread to finish. Mental note!
Just remember that since these threads are running in parallel (probably), you must be very careful about making cross-thread references. Unless you explicitly wait for one thread to finish, there is no guarantee that a value set in one thread will be available in another at any given time. And, as always, if you think a negative race condition might apply, please wrap variable access and modification code inside of CFLock tags.
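As a minimal sketch of that locking advice (the thread names, the arrResults array, and the lock name are all made up for illustration), here is a set of parallel threads that each append to one shared, page-level array. The writes are serialized with a named exclusive CFLock, and the page joins every thread before it reads the array:

```cfml
<!--- Shared page-level array that every thread writes into. --->
<cfset VARIABLES.arrResults = ArrayNew( 1 ) />

<cfloop index="intGet" from="1" to="10" step="1">

	<cfthread action="run" name="objLockDemo#intGet#" getindex="#intGet#">

		<!---
			Only one thread at a time may touch the shared array. The
			lock name is arbitrary; it just has to be the same everywhere
			the shared value is read or written.
		--->
		<cflock name="arrResultsLock" type="exclusive" timeout="10">
			<cfset ArrayAppend( VARIABLES.arrResults, ATTRIBUTES.getindex ) />
		</cflock>

	</cfthread>

</cfloop>

<!--- Wait for every thread before reading the shared array. --->
<cfloop index="intGet" from="1" to="10" step="1">
	<cfthread action="join" name="objLockDemo#intGet#" />
</cfloop>

<cfoutput>
	<p>#ArrayLen( VARIABLES.arrResults )# threads reported in.</p>
</cfoutput>
```

Note that the values arrive in whatever order the threads happen to finish, so the array is not guaranteed to read 1 through 10.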
Don't forget that the "name" attribute can take a list of threads to join. So in your loop code where you're creating the threads, you could append the name of the thread to a list and just do something like:
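Here is a sketch of what that might look like for the Google example. The thread names (objListGet1 and so on) are chosen here for illustration; as far as I know, thread names must be unique within a request, so they differ from the ones in the post's code:

```cfml
<cfset strBaseURL = (
	"http://www.google.com/search?" &
	"q=Vin+Diesel" &
	"&num=100" &
	"&start="
	) />

<!--- Collect each thread name as the thread is launched. --->
<cfset lstThreads = "" />

<cfloop index="intGet" from="1" to="10" step="1">

	<cfthread
		action="run"
		name="objListGet#intGet#"
		geturl="#strBaseURL##((intGet - 1) * 100)#">

		<cfhttp method="GET" url="#ATTRIBUTES.geturl#" result="THREAD.Get" />

	</cfthread>

	<cfset lstThreads = ListAppend( lstThreads, "objListGet#intGet#" ) />

</cfloop>

<!--- A single join waits for every thread named in the list. --->
<cfthread action="join" name="#lstThreads#" />
```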
PS - I've also filed Enhancement #69430 with Adobe to allow the name attribute to be optional, so without a "name" attribute, it would just join all threads in the current page template together. :)
Good catch. I forgot to mention that in my run-down. As for your enhancement, that would be awesome! I figure this is a use-case that will be used quite often and would help tremendously.
Just in case you have not noticed, you can set the maximum number of CFThreads that can run in parallel on the server from the Administrator. That is the maximum size of the thread pool dedicated to running CFThreads. By default it is set to 10.
Another point to note is that if you specify a large number for it, let's say 50, it does not mean all 50 threads will be running all the time. The pool is dynamic and adjusts according to the load. So at peak load, it will go up to 50, and when there is no load, it can drop down to 1.
Adobe CF Team.
Thanks for pointing that out. Right now, I am doing most of my testing on HostMySite.com, so I don't have Admin access. I installed the Beta on my desktop at home, but something went screwy with the install, and neither the CFIDE nor the CFDOCS folder seems to have installed in the Coldfusion8 folder. I was running CF7 at the time, so I don't know if that messed it up. I will probably uninstall and re-install or just try to install again.
I can't wait to turn on my per-application-setting so that I can play with the app-specific mappings :)
A quick note for anyone using cfthread on servers with high load running code which creates a large number of threads on multi-core processors...
Even if you increase the 'Maximum number of threads available for cfthread' in CF Administrator, your app may be throttled by JRun. If you're getting threads waiting for no apparent reason, bringing application performance to a crawl, you can try modifying your jrun.xml file.
Open it up and a little way down you'll see this line:
<service class="jrunx.scheduler.SchedulerService" name="SchedulerService">
Increase the numbers for activeHandlerThreads and minHandlerThreads (defaults are 25 & 20 respectively) to a much higher number, save the file, restart CF and try again.
If you get the same results from this change that we did, your application performance will increase massively.
Hope this saves someone all the trouble we've had! :)
Nice tip. I know nothing about messing with the JRUN, but let's just say that a massively popular site is a problem I'd like to have someday :)
When you say "a much higher number", what kind of ballpark are we talking? And having made that change in jrun.xml, what did you then set the "Maximum number of threads available for cfthread" option to? Just match them up?
I can't remember what we used for activeHandlerThreads and minHandlerThreads; the sysadmins deal with all this sort of stuff but I figured as it gave us such a performance gain I'd post it here. I know we started at around 1000 for each but it's no doubt been refined since we first discovered this setting.
The setting in CF admin is something you'll need to tweak yourself as it's going to depend on your hardware and a load of other factors. Try 50 and see how you get on, increasing it as necessary. Just watch the CPU monitor and memory usage so you don't completely kill your machines!
50 is what I'm trying now, so we'll see if it helps. I haven't touched the jrun.xml file at this point, but it is indeed set to the defaults of 25 & 20, so could be a limiting factor.
Will definitely keep an eye on it and tweak more if necessary :)
Thanks for the pointers.
There is some good reading on this on Steven Erat's blog too I just found: http://www.talkingtree.com/blog/index.cfm?mode=entry&entry=942B6F54-45A6-2844-77AD4D08D7523481
Points out how *too* high could have negative effects.
Ben, as always, thanks for the info. I also appreciate the content added by George.
I have a quick question on using the cfthread feature if you do not need to join the threads.
If you just fire off say 25 threads that are performing an action and inserting a record in the database, and the original file is a scheduled task that you have running daily, do you need to join the threads to close the connection?
I have a cfm page that is set up as a scheduled task that checks some monitoring counters and stores them in a transaction table. Using the threads really sped up the processing time, but I was wondering what the impact would be if I just let the threads do their thing. Any gains or losses from not joining the threads?
Thanks in advance!!
You only ever need to join the threads IF you want to check their generated output or thread scopes (or if you are waiting to trigger subsequent work flows). If you don't care what the thread is doing, you can just let it run without joining.
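A minimal fire-and-forget sketch, assuming a loop like the one described in the question above. The thread names and log file name are hypothetical, and the CFLog call is only a stand-in for the real work (such as a CFQuery insert) so the sketch runs without a data source. Because no page ever joins these threads, nothing will report a failure, so each thread catches and logs its own errors:

```cfml
<cfloop index="intCounter" from="1" to="25" step="1">

	<!---
		Fire-and-forget: nothing below ever joins these threads, so the
		page (or scheduled task) can finish while they are still running.
	--->
	<cfthread action="run" name="objCounter#intCounter#" counterid="#intCounter#">

		<cftry>
			<!--- Stand-in for the real work, e.g. a CFQuery insert. --->
			<cflog file="counterThreads" text="Stored counter #ATTRIBUTES.counterid#" />

			<!--- No page will see this error, so record it here. --->
			<cfcatch type="any">
				<cflog
					file="counterThreads"
					type="error"
					text="Counter #ATTRIBUTES.counterid# failed: #cfcatch.message#"
					/>
			</cfcatch>
		</cftry>

	</cfthread>

</cfloop>
```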
How can we do this in CF7 using CreateObject("java", "java.lang.Thread")?
Every article I find seems to only sleep the thread
I found out to do it in CF7 using Event Gateways (Asynchronous CFML Gateway), thanks to http://www.dcooper.org/blog/client/index.cfm?mode=entry&entry=24BEF3D6-4E22-1671-55146EEC011D18D4
Oh cool - I have never used the ColdFusion gateways; I don't think they are available in "standard".
Here's my scenario. Someone does a query and the result set will be 1200 items. As each item is processed I need to make 8 additional queries. Each thread is totally independent from the other and I do a "join" at the end so I can dump the consolidated output.
My testing indicated that without the threads it takes 240 seconds and with the threads it took 120 seconds. So it takes 1/2 the time and it's about 1/2 second per item. I might have 5 people doing this at once. I'm thinking I'll set the threads to 50. What are the drawbacks to me doing this if any with regards to memory, database connections, etc? What other settings should I be concerned with? What would be a reasonable timeout on the "join", 10 seconds? It's CF 8 Enterprise running inside Weblogic 9.2.
One more thing, is it crazy to create 9600 threads (1200 items x 8 additional queries) for this one page or is it simply not an issue? Each thread is only lasting for perhaps 1/2 second.
I am pretty sure that your version of ColdFusion comes into play here. From what I can remember, ColdFusion Standard won't allow you to create more than 2 parallel threads (the rest get queued). Enterprise, on the other hand, I think will make more - not sure if there is a cap on that.
So, if you are on Standard and you try to create a large number of threads, the majority of them will be queued and executed whenever the system is freed up. My guess is that is why your algorithm executed in half time rather than something more dramatic like 1/5th time.
OOPS - I just saw that you were running Enterprise. So, disregard the stuff above about a limit of 2. I am not sure what limits exist for Enterprise.
That said, in either case, the system can't simply execute an arbitrary number of threads in parallel; so, it's gonna have to queue them at some point. The question then becomes - is it bad to have the system queue up a large number of threads?
This is actually a question that I have posed to several engineers on the Adobe development team and I have never gotten a straight answer. Typically they tell me to ask someone else on the team (whom I can never find when I need them).
CFThreads run as ColdFusion functions (they even have hidden arguments and local scopes). So, my guess is that when you define a thread, it actually gets compiled down to a UDF and then that UDF execution gets queued. So, I would assume that so long as creating lots of classes to represent those UDFs doesn't have a negative impact, then queuing lots of threads shouldn't have a negative impact.
Ok, I know that is long and rambling; at the end of the day, sometimes you just have to try and see if your server starts spitting out smoke ;)
As for timeout, no idea. Trial and error for that one. I am not sure how much you can depend on the math.
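One way to approach the timeout question is to give the join a ceiling and then inspect each thread's status afterward. This is only a sketch: the 10-second value (join timeouts are given in milliseconds) is an arbitrary example rather than a recommendation, the thread names are hypothetical, and the Sleep() call stands in for real work:

```cfml
<cfset lstThreads = "" />

<cfloop index="intItem" from="1" to="5" step="1">

	<cfthread action="run" name="objTimed#intItem#" itemid="#intItem#">
		<!--- Stand-in for real work: pause roughly 100ms per item. --->
		<cfset Sleep( ATTRIBUTES.itemid * 100 ) />
		<cfset THREAD.itemid = ATTRIBUTES.itemid />
	</cfthread>

	<cfset lstThreads = ListAppend( lstThreads, "objTimed#intItem#" ) />

</cfloop>

<!--- Wait at most 10 seconds (10000ms) for the whole list. --->
<cfthread action="join" name="#lstThreads#" timeout="10000" />

<!--- Report and stop anything that is still unfinished. --->
<cfloop list="#lstThreads#" index="strName">
	<cfif ListFindNoCase( "NOT_STARTED,RUNNING,WAITING", cfthread[ strName ].status )>
		<cfoutput><p>#strName# did not finish (#cfthread[ strName ].status#).</p></cfoutput>
		<cfthread action="terminate" name="#strName#" />
	</cfif>
</cfloop>
```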
Thanks for the quick reply. Pretty funny actually with the 1/5 comment. I actually did get a 1/5 run at one point and I was ecstatic, but I could never replicate it, which totally baffles me. It was literally spot on, 225 vs 45 seconds. Every subsequent run took around 115-125 seconds. With connections already established and data cached in the database, why would this get longer? I have no explanation. I'm happy with about 1/2 the time, but why the heck did it more than double from the 45 seconds? It's very obvious to me that as memory usage increases on the server, things get slower, so I'm thinking that might be it. For instance, I have a CFFLUSH every 8192 bytes. I took that out and the total return time nearly doubled. However, the total output is 11MB, so we're not talking crazy memory usage here.
Unless you need the output, wrapping a bunch of the stuff in CFSilent tags or using CFSetting/EnableCFOutputOnly should help cut down what I am assuming is a massive amount of white space being produced.
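A small sketch of both techniques (the variable names are illustrative). With EnableCFOutputOnly turned on, only text inside CFOutput reaches the response, so the whitespace around tags like CFSet and CFLoop is dropped; CFSilent discards everything generated inside it, CFOutput included:

```cfml
<cfsetting enablecfoutputonly="true" />

<cfset arrNames = ListToArray( "alpha,beta,gamma" ) />

<cfloop index="intName" from="1" to="#ArrayLen( arrNames )#" step="1">
	<cfset strName = UCase( arrNames[ intName ] ) />
	<!--- Only this line produces output. --->
	<cfoutput><item>#XmlFormat( strName )#</item></cfoutput>
</cfloop>

<!--- Nothing generated in here reaches the response. --->
<cfsilent>
	<cfset strDebug = "computed but never written" />
	<cfoutput>this text is suppressed</cfoutput>
</cfsilent>

<!--- Pair every "true" with a "false" to turn the setting back off. --->
<cfsetting enablecfoutputonly="false" />
```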
On a somewhat related note, almost every time I've seen code which uses multiple queries like you mention in your statement, I've been able to reduce that code into a single query that is much more efficient.
In most cases you can re-write your code in such a way that all the additional queries are no longer needed. You can use the group attribute on the cfoutput tag to group those one-to-many relationships together.
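A minimal sketch of the group attribute, assuming hypothetical item/analyst data. The query is built in memory so the sketch runs without a database; in real code it would be one query that joins items to their related rows and is ordered by the group column (grouping only works on sorted data):

```cfml
<cfset qItems = QueryNew( "item_id,item_name,analyst", "integer,varchar,varchar" ) />
<cfset QueryAddRow( qItems, 3 ) />
<cfset QuerySetCell( qItems, "item_id", 1, 1 ) />
<cfset QuerySetCell( qItems, "item_name", "Problem A", 1 ) />
<cfset QuerySetCell( qItems, "analyst", "Alice", 1 ) />
<cfset QuerySetCell( qItems, "item_id", 1, 2 ) />
<cfset QuerySetCell( qItems, "item_name", "Problem A", 2 ) />
<cfset QuerySetCell( qItems, "analyst", "Bob", 2 ) />
<cfset QuerySetCell( qItems, "item_id", 2, 3 ) />
<cfset QuerySetCell( qItems, "item_name", "Problem B", 3 ) />
<cfset QuerySetCell( qItems, "analyst", "Carol", 3 ) />

<!--- The outer cfoutput runs once per distinct item_id... --->
<cfoutput query="qItems" group="item_id">
	<h3>#qItems.item_name#</h3>
	<ul>
		<!--- ...and the inner cfoutput loops over that item's rows. --->
		<cfoutput>
			<li>#qItems.analyst#</li>
		</cfoutput>
	</ul>
</cfoutput>
```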
Also, I'd be looking at the database to see how you can speed things up. It's amazing what you can do by just applying a few indexes here and there. I've seen queries that were taking minutes speed up to fractions of a millisecond just by setting up the correct indexes on a table.
You may have already done all that, but if not, definitely look at your database to see what you can do to improve the speed performance there. ColdFusion will never make up for a poorly tuned database.
Good point; I remember the first time I applied a database index. I was blown away at the difference it made. It was straight-up bananas.
I loosely used the phrase "additional queries". While there is database activity it's actually somewhat complicated. One "query" is traversing a tree, possibly upwards to determine who a responsible person might be, another is getting a list of people that did the analysis, another is a list of history of the status of the problem, etc. Nothing that could simply be joined in the main query. When you consider that my result set is 1200 items with a total of just under 12MB of data, it's a lot of info and 1/10 second per item is pretty fast. However, I can't fathom why it actually made one run in 45 seconds and all the other runs took about 125 seconds. I did change some of the "subqueries" to use stored procedures and it came down to 115 seconds. However, I still cannot get back to that 45 seconds. I am returning this back to the browser from a server in Switzerland to the U.S. so it's entirely possible that network speed is coming into play here but I made numerous runs in the same 30-60 minutes. I would think I'd get some more variation other than going from 45 to consistently 115-125 seconds. I guess the only way I can truly measure this is to dump the output to a file on the application server to see if I get any variations.
By the way it's an XML file being returned and everything possible is wrapped in CFSILENT so I believe there's pretty much zero unnecessary white space. By the way this is the extreme end of the amount of output. It's more typically about 1/10 the size but we just have some users that just need it all and it's data that changes daily and it's global so I really don't have an "end-of-day".
I am doing data loading into various tables, loading XML data, and then transforming the XML into a relational table structure.
I have about 10,000 records with an XML column that I want to map to Relational Tables, each thread is used for one of these mappings and it works much faster. Oracle 10g R2 with parallel table option set and indexes built for any query look up.
My question is this: for a reasonable number of rows, this works fine. However, for 100,000 rows that I want to process concurrently with cfthread, it will max out the threads and the queue.
Is there a way for me to control the number of threads used, wait until they are done, before kicking off the next 100 threads. Sort of like a paging of threads. So wait until the first 100 threads have finished and then trigger the next 100 threads.
Any input appreciated...
For something like that, I will typically use a combination of CFThread and perhaps a scheduled task. Or, if you don't need to do it all that often, you can have a CFM page that runs a group of threads, re-joins them, then re-loads the CFM with the next "page" of threads.
I used this kind of approach when dealing with thousands of CFHTTP requests (ColdFusion has trouble garbage collecting within a single request, so I needed to refresh the page occasionally to avoid running out of RAM).
First of all, I would like to say I found the articles posted here very helpful. Thanks.
I have two scenarios whose details I want to flesh out, but I might not have the expertise or the time to do an in-depth investigation; maybe someone reading these articles has a better approach.
1. For the thread paging I mentioned earlier, is the following implementation hypothetically possible:
1. variables.thread_counter in the CFM page
2. and each thread updates this counter
3. additional threads will wait or sleep on it if variables.thread_counter is over, say, 100
<cfset variables.thread_counter = 0>
<cfloop index="i" from="1" to="1000">
<cfthread action="run" name="thread_#i#" priority="LOW">
--checks for number of threads not over 100
--additional thread to wait or sleep on it until thread counter < 100
--assuming we do not need to join the threads
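One way the counter idea could be made concrete is sketched below, with loud assumptions: the limit of 100, the lock name, and the log file name are illustrative, and the "work" is a stand-in. In this variant the page, rather than the threads, does the waiting, because a thread that has already been launched is already occupying a slot. Because only the page increments the counter, an unlocked read in the wait loop can only overstate it, which costs at most an extra short sleep. The counter limits how many threads are queued or running at once; the Administrator setting still caps how many actually execute:

```cfml
<cfset VARIABLES.thread_counter = 0 />
<cfset intMaxThreads = 100 />

<cfloop index="i" from="1" to="1000" step="1">

	<!--- The page waits while the limit is reached. --->
	<cfloop condition="VARIABLES.thread_counter GTE intMaxThreads">
		<cfset Sleep( 50 ) />
	</cfloop>

	<!--- Claim a slot before launching the thread. --->
	<cflock name="threadCounterLock" type="exclusive" timeout="10">
		<cfset VARIABLES.thread_counter = VARIABLES.thread_counter + 1 />
	</cflock>

	<cfthread action="run" name="thread_#i#" priority="LOW" itemid="#i#">

		<cftry>
			<!--- Stand-in for the real XML-to-relational work. --->
			<cfset THREAD.itemid = ATTRIBUTES.itemid />
			<cfcatch type="any">
				<cflog file="threadCounter" type="error" text="Item #ATTRIBUTES.itemid# failed: #cfcatch.message#" />
			</cfcatch>
		</cftry>

		<!--- Release the slot whether or not the work succeeded. --->
		<cflock name="threadCounterLock" type="exclusive" timeout="10">
			<cfset VARIABLES.thread_counter = VARIABLES.thread_counter - 1 />
		</cflock>

	</cfthread>

</cfloop>
```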
What happens when we place this code in a component? I have scenarios where making calls to the database in threads resulted in many database connections that still persisted after the process finished...
A little open ended here since I don't have the details.
Any feedback greatly appreciated!
First off, nothing wrong with putting this code in a component - it doesn't change any of the functionality at all.
That said, there are a few issues to touch upon. If you want to do this all in a single page, you could simply loop AND join the threads:
<cfloop (paging loop)>
... thread creation ...
<cfthread action="join" />
</cfloop (paging loop)>
Here, you would be joining an entire "chunk" of threads before the next page could begin (CFThread/join will wait until all outstanding threads launched in the page return from execution).
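A more concrete version of that paging loop might look like the sketch below. The batch size, total, and thread names are hypothetical, and the thread body is a stand-in for the real per-row work. It passes an explicit list of names to the join rather than relying on a bare join, so it doesn't depend on whether the name attribute may be omitted in your version:

```cfml
<cfset intTotal = 1000 />
<cfset intBatchSize = 100 />

<cfloop index="intBatchStart" from="1" to="#intTotal#" step="#intBatchSize#">

	<!--- Track only this batch's thread names. --->
	<cfset lstBatch = "" />
	<cfset intBatchEnd = Min( intBatchStart + intBatchSize - 1, intTotal ) />

	<cfloop index="i" from="#intBatchStart#" to="#intBatchEnd#" step="1">
		<cfthread action="run" name="batchThread#i#" itemid="#i#">
			<!--- Stand-in for the real per-row work. --->
			<cfset THREAD.itemid = ATTRIBUTES.itemid />
		</cfthread>
		<cfset lstBatch = ListAppend( lstBatch, "batchThread#i#" ) />
	</cfloop>

	<!--- Block until this whole batch is done before starting the next. --->
	<cfthread action="join" name="#lstBatch#" />

</cfloop>
```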
That said, it becomes a primary page timeout issue and a memory issue. You can adjust the timeout using CFSetting/requesttimeout to allow for a very long running page. But, even if you do that, I am not sure that you won't simply run out of memory at some point. Like I was saying before, sometimes ColdFusion has trouble garbage collecting mid-page request. As such, launching many many many threads in a single page AND waiting for them to return, might eventually eat up the RAM.
Not 100% sure on that last point though - just something that I have seen in large CFHTTP algorithms.
Does that help at all?
You shouldn't need to 'page' your threads. ColdFusion has a maximum number of threads that it will allow to run at time - the setting is in CF Admin in Server Settings -> Request Tuning (on CF9 anyway; might be elsewhere in 8).
The default for 'Maximum number of threads available for CFTHREAD' is 10. This means that if you create 20 threads, the first 10 created will run and the other 10 will be queued. The queued threads will start running when others terminate.
This is correct; however, the CFThread tags do need to be compiled. And, any information you pass to it gets DEEP copied and stored until the thread executes. I don't know how much memory all of that takes up; but eventually, if you go large enough, I assume you run out of room to store the arguments.
Of course, I have to imagine that would require a HUGE number of threads. I've actually tried to discuss this with the Adobe team and I'm never able to get a satisfactory answer other than "You probably don't want to do that."
My biggest concern would still be the garbage collection. Of course, that might not be applicable to threads since they do run in parallel. I'll see if I can do some digging on this part specifically.
I'm all too aware of the deep copy issue - it's alright if you're only passing primitives to your threads but I had an issue a while back where someone was passing entire CFCs, unaware that CF was making a deep copy. Needless to say, I had to change that because of memory issues.
Garbage collection is a major factor in massively transactional sites and it's something which has to be set on an application and setup basis. You can configure the garbage collector in the JVM arguments; if you're using threading aggressively, the G1 collector which came in with Java 6 update 14 might improve GC time. Plenty of information about it on the web.
I was one of those people who didn't realize that CFC's were NOT getting passed by reference into threads. That was a fun little journey to debug :)
As far as garbage collecting, I'm not all too strong with the JVM itself - but I'm trying to learn. After a little bit of testing, though, it looks like thread-based garbage collecting seems really strong. I'm doing my darndest over here (right now) to try and stack overflow, but I am having no luck!
I'll get some more details shortly.
I like the paging loop idea from your last post. I was trying to keep a counter of the threads currently initiated by the page and having the main page sleep at a fixed interval whenever the max thread_count is reached. After each thread finishes execution, it must decrement variables.thread_counter...etc...
<cfloop (paging loop)>
... thread creation ...
<cfthread action="join" />
</cfloop (paging loop)>
On the side, I was not thinking of passing a CFC into a thread, but having the thread logic implemented within a CFC...
Also, I have not done sufficient testing to quantify this, but if looping for 50,000 and creating threads within the loop, I somehow got a max thread reached around 5,000. I don't know if there is a fixed max value for the number of threads, or just depends on memory of the server?
Oh wow - I've never actually seen a "max threads" error :) Well played! This morning, after some of these comments, I was able to produce a Heap Size error with parallel CFThreads launched in a loop... but even that was only occurring with large, pass-by-value arguments at like 200 queued threads.
Never even played with 5,000 threads.
I am new to ColdFusion. Could someone please tell me how to <cfdump> the value of each thread in the last loop right after it is joined?
I have been trying <cfdump var="#Variables['objGet'&intGet]#"> after the thread join inside the cfloop, but it's not working and I am getting the following error:
"Element objGet1 is undefined in a Java object of type class coldfusion.runtime.VariableScope. "
Thanks in advance!!
Try dumping out the CFThread scope:
<cfdump var="#cfthread#" />
... this should have all the information for the threads after they have been re-joined to the main page.
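If you want one dump per thread instead of the whole scope, a sketch like the following should work. The objDump thread names and the squared value are made up so the snippet runs on its own; the key point is looking the thread up in the cfthread scope by name:

```cfml
<cfloop index="intGet" from="1" to="3" step="1">
	<cfthread action="run" name="objDump#intGet#" getindex="#intGet#">
		<!--- Stand-in for real work: store something in the thread scope. --->
		<cfset THREAD.squared = ATTRIBUTES.getindex * ATTRIBUTES.getindex />
	</cfthread>
</cfloop>

<cfloop index="intGet" from="1" to="3" step="1">
	<cfthread action="join" name="objDump#intGet#" />
	<!--- Thread data lives in the cfthread scope, keyed by thread name. --->
	<cfdump var="#cfthread[ 'objDump' & intGet ]#" label="objDump#intGet#" />
</cfloop>
```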
Thanks for replying Ben!!
The only mistake I was making was using the 'variables' scope instead of the 'cfthread' scope. I replaced
<cfdump var="#Variables['objGet'&intGet]#"> with <cfdump var="#cfthread['objGet'&intGet]#">
Yep, that'll do it :) Glad you got it working.
I recently discovered some interesting facts while using ColdFusion to extract a huge amount of data from the database and write it to a CSV file.
It seems that if we process the extraction in one single page request, ColdFusion holds on to the memory referenced by the variables within the page request. That means that if we are extracting one million records, even in intervals, the memory is consumed. Even with an explicit call to trigger garbage collection, the ColdFusion server will be hard pressed and run out of memory.
However, as I discovered from monitoring the memory, when I use <cfthread> to extract and process the data, I can make an explicit garbage collection call once the thread has completed execution, and memory usage returns to the base level.
In a way, I think this makes sense: the ColdFusion server cannot free up the memory while the page request is still processing and has a handle on the variables within the page. So using <cfthread> is like calling a page that releases its memory once the thread completes execution.
Has anyone run into this problem before? Or if you solved it differently, care to share?
So how would you display the result of each thread to the user?