Learning ColdFusion 8: CFThread Part IV - Cross-Page Thread References
Up until now, we have been examining ColdFusion 8's CFThread tag in the context of a single page or in conjunction with a "set it and forget it" scenario. Now, let's take a look at referencing long running threads across page requests. Remember, since the child thread launched by CFThread may outlive the processing time of its parent, we will have the opportunity to reference a thread that was launched by a previous page request.
To play around with this, we are going to modify our previous photo download demo to use some AJAX. Now, instead of just displaying the "photos are downloading" message to the user on the confirmation page, we are going to output the status of each photo thread as it updates. This means that after the parent page (the confirmation page) has finished processing, we are going to be referencing threads launched by a previous page request. Very exciting.
Before I get into the code, I want to take a second to talk about the THREAD scope. In the demo below, we are telling each CFThread-launched thread to both store itself and then remove itself from the APPLICATION.Threads structure. In doing so, you will notice that I am actually duplicating the THREAD scope within each CFThread tag:
<!--- Store thread reference. --->
<cfset APPLICATION.Threads[ THREAD.Name ] = Duplicate( THREAD ) />
Calling ColdFusion's Duplicate() here is confusing and weird, but absolutely essential. It has to do with the nature of the THREAD scope. The THREAD scope is a new form of scope that we are not used to dealing with. If you dump out the Java class of the THREAD scope, you will see that it is:
Now, I don't actually know anything about this scope, but from my experience, most non-setting references to it result in a NULL value. Therefore, in the example above, if we tried to store the THREAD scope directly into the APPLICATION.Threads struct, we would get this in the APPLICATION.Threads struct dump:
undefined struct element
This is because THREAD will store as a NULL value. To demonstrate this without application-level caching, take a look at a VARIABLES-scoped reference to a thread:
<!--- Run a thread. --->
<cfthread action="run" name="ThreadOne">

	<!---
		We don't need to do anything in this thread, we just
		need to know that it was launched.
	--->
	<cfset THREAD.X = true />

</cfthread>

<!--- Wait for the thread to finish processing. --->
<cfthread action="join" name="ThreadOne" />

<!--- Output the thread data. --->
<cfdump var="#VARIABLES.ThreadOne#" />
When we try to run that page, we get the ColdFusion error:
Element THREADONE is undefined in VARIABLES.
The thread, ThreadOne, should be available in the VARIABLES scope (since ThreadOne is available without scoping). If we CFDump out the VARIABLES scope, we get a crazy looking user defined function that is called:
What the hell is that? I'll tell you what it is - it's a clear demonstration that the THREAD scope is a very special beast.
So, going back to the first example above, when we duplicate the THREAD scope, we are actually converting the THREAD scope into a standard struct representation of its meta data. This will give us an object with a familiar java type:
Doing this, we can now pass around a copy of the thread data that we can actually reference. But this does not mean we have access to the thread itself - just that we have a copy of its meta data.
That being said, let's get back to the demo at hand. It has two parts: the photo download page and then a page that will grab the cached thread data structs and return their status. Here is our modified photo download page:
Notice that each CFThread body starts out by caching itself (by name) in the APPLICATION.Threads struct. Then, as it finishes processing, it removes itself (by name) from the same struct. Technically, this is a place where we might be concerned about race conditions, but for this demo, it will not matter. Also notice in place of the "photos are downloading" message, we now have a P tag that is being updated using jQuery and some simple innerHTML-oriented AJAX.
The page that gets called by the AJAX simply iterates over the APPLICATION.Threads meta data structs and outputs the thread data (to be consumed as innerHTML):
<!--- Kill extra output. --->
<cfsilent>

	<!---
		We are going to build up the thread activity HTML. While
		I normally would return JSON data here, in order to keep
		the demo as simple as possible (and since AJAX is not the
		primary goal here), I am just going to render the innerHTML.
	--->
	<cfsavecontent variable="strThreadData">
		<cfoutput>

			<!--- Check to see if there are any threads. --->
			<cfif StructCount( APPLICATION.Threads )>

				<!--- Loop over the active threads. --->
				<cfloop item="strName" collection="#APPLICATION.Threads#">

					<!---
						Get a short hand reference to the thread. These
						threads are going to be removing themselves from
						the application, so in order to minimize bad data
						references, get the thread reference. Once we have
						an independent reference to the thread, it won't
						matter if it has been removed from the APPLICATION
						scope.
					--->
					<cfset objThread = APPLICATION.Threads[ strName ] />

					<!--- Output the thread data. --->
					<strong>#objThread.Name#</strong><br />
					..... Start: #TimeFormat( objThread.StartTime, "h:mm TT" )#<br />
					..... Duration: #NumberFormat( ((Now() - objThread.StartTime) * 86400), "0" )# seconds<br />

				</cfloop>

			<cfelse>

				<em>There are no active threads.</em>

			</cfif>

		</cfoutput>
	</cfsavecontent>

	<!--- Output the thread innerHTML. --->
	<cfcontent type="text/html" variable="#ToBinary( ToBase64( strThreadData ) )#" />

</cfsilent>
Again, this is a place where we would have to consider race conditions (since we are iterating over a structure that is being modified by parallel threads), but for this demo, I am not going to worry about it. In order to minimize the chance of bad references, I get a short-hand pointer to the thread meta data struct (rather than referencing it through the APPLICATION.Threads struct on every access). Therefore, even if the struct does get removed from APPLICATION.Threads after that point, we will still have a valid pointer to it.
Running the above code to download photos, we get the following output:
Your photos are being downloaded right now:
..... Start: 6:30 PM
..... Duration: 1 seconds
..... Start: 6:30 PM
..... Duration: 1 seconds
..... Start: 6:30 PM
..... Duration: 1 seconds
... and then a split second later, after an AJAX update, we get the following update:
Your photos are being downloaded right now:
..... Start: 6:30 PM
..... Duration: 2 seconds
With each successive AJAX call, more of the threads are removing themselves from the APPLICATION.Threads struct. Pretty snazzy, eh? Ok, so we are not technically referencing threads across different page requests, but based on my THREAD-referencing experiments, it seems that this might be the best way to go. Of course, I am just learning here, so it might be that these threads are accessible by name through some other way (but I do not see anything about it in the documentation). At the very least, since we are allowing threads to update their own meta data references, we are tying the meta data copy to the thread across pages.
When it comes to the data in the APPLICATION.Threads struct, remember that it is a duplicate of the thread meta data - it is not actually the thread meta data as it is contained in the running thread. This means that the thread will not update this data as it processes (ie. the Status attribute will never get updated automatically). But, for our purposes, and I assume most purposes, knowing the name and the StartTime will be sufficient.
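If something more than "started vs. gone" is needed, one option is to let the thread write its own outcome into the cached copy. The following is only a minimal sketch, not part of the original demo: the thread name PhotoThread, the "FAILED" value, the Error key, and the Sleep() call standing in for the real download are all assumptions made for illustration, and it assumes the page runs inside an application so that the APPLICATION scope exists.

```cfml
<!--- Make sure the cache exists (normally done once at application start). --->
<cfif NOT StructKeyExists( APPLICATION, "Threads" )>
	<cfset APPLICATION.Threads = StructNew() />
</cfif>

<!--- PhotoThread is a hypothetical name used only for this sketch. --->
<cfthread action="run" name="PhotoThread">

	<!--- Cache a plain-struct copy of the thread meta data. --->
	<cfset APPLICATION.Threads[ THREAD.Name ] = Duplicate( THREAD ) />

	<cftry>

		<!--- Stand-in for the real work (ex. the CFHTTP photo download). --->
		<cfset Sleep( 2000 ) />

		<!---
			The cached copy is not live, so the thread has to record
			its own outcome in it.
		--->
		<cfset APPLICATION.Threads[ THREAD.Name ].Status = "COMPLETED" />

		<cfcatch>

			<!--- "FAILED" and Error are made-up keys, not ColdFusion values. --->
			<cfset APPLICATION.Threads[ THREAD.Name ].Status = "FAILED" />
			<cfset APPLICATION.Threads[ THREAD.Name ].Error = CFCATCH.Message />

		</cfcatch>
	</cftry>

</cfthread>
```

With this variation the thread no longer removes its own entry, so the status page (or some scheduled clean-up) would have to delete finished entries; otherwise APPLICATION.Threads keeps growing.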
So, since the application struct is only a copy of the actual threads at the time of their creation, I assume you wouldn't be able to view error information, or terminate the threads, etc. Is there no way to reference the thread directly after it has been kicked off and the request which created it has finished?
One thing to note - if you want to get the thread data, you CAN get it via Evaluate(). That's how Adobe documents it.
Thread scopes are not kept in the variable scope and hence you don't see it when you dump the variable scope.
I know you must have figured it out but I am just re-iterating. We don't really recommend sharing the thread scopes across requests, as this can lead to thread-safety problems with the data.
Unless the running thread itself updates that info, yes, it will not be available since the cached struct is only a copy.
Are you talking about using Evaluate() on a different thread?
I know that when you refer to an unscoped variable, ColdFusion will start searching for it in an orderly fashion through many different scopes (ex. query output, function local, arguments, page variables, etc.). From that I would gather that threads are stored in some scope that is being searched. Is that the case? Or are threads a totally new beast that is a very special implementation?
As for "Best practices", I agree, cross-page references are going to get very hairy, very fast. I would not recommend trying to do it. In this case, however, since I am only ever referencing a copy of the meta data, I feel that it is not as bad as it might sound. Of course, if the thread crashes and cannot remove itself from the app scope, clearly this can become very corrupt, very fast :)
I think it is definitely playing with fire no matter how you look at it, but I think if done correctly / carefully it could have some interesting potential.
Plus, what is that UDF in the VARIABLES scope that I am seeing? What does it do?
The UDF in the variables scope is the actual function that is run in the thread, i.e. the code between the cfthread and /cfthread tags is turned into a UDF and executed in a thread.
That's correct. Threads are stored in a special scope which is actually a request-level scope and is searched when you access any unscoped variable.
That is good to know. I figure I wouldn't use this stuff like this too often. I am comfortable with a thread setting an application-level variable and then destroying it before it finishes executing. Of course, if anything in the thread broke then you would have rogue variables that never get deleted. Clearly, not a fabulous idea, but a cool experiment.
I am running in a shared environment and need to use the thread in order to update a collection, due to the time out restraints. I would like to be able to access the status of the thread so that I know if it completed. In this, very cool, example I never see the final status. It may be naive but how can I wait that last second to see the final status?
I am sorry, I am not sure what your question is, exactly. What are you trying to do with the collection?
I am refreshing the collection after adding new files.
The question is about receiving a status of "Completed" from the thread, or why I am not. I included the status as a part of the information retrieved by the get_Threads.cfm page. Everything works fine but I never receive a status of "completed" just that it is "running". I am sure I am missing something here. Perhaps the reference is killed before I get the "completed" status.
In my example, after the thread finishes executing, it deletes its name from the Application-cache; as such, you will never see that it finishes. It is either running OR it no longer exists (which implies that it has finished).
Right, I just wanted that warm and fuzzy from it saying it finished normally. I do understand. I will just try to modify it to get the information. Thanks. This site rules by the way. I like all the great information.
Thanks :) That is much appreciated.
I just tried this example (CF8.01) and although it works, it only ever downloads the last file requested. All the threads fire off and seemingly "complete" but the only file actually written out is the last one in any list.
Anyone else see this?
Threads might be throwing errors (which won't be apparent in the page unless you wait for the threads to re-join). Try checking your logs or JOIN the threads and look at the thread output.
<cfset APPLICATION.Threads[ THREAD.Name ] = Duplicate(
Since you stated that this COPIES the thread's meta data, does it mean that you will only get the thread's status at the time of the copy (a snapshot)? Or is it a reference which points to the thread's live meta data, so that when the status changes from WAITING to COMPLETED it will be reflected in the stored status?
The Duplicate() should return a totally unique structure; as such, it is static, or rather, not automatically updated by the thread. So yes, it is the status of the thread at the time it was called.
So how do I ensure that I catch the thread at a status that is either COMPLETED or TERMINATED?
In this example, you don't really. If you look a few lines below the duplicate() call, you'll see that the Thread itself is deleting its own entry from the Application scope. So, the thread both stores and then deletes its own reference. It self-cleans.
So, from within a thread that is being spawned, can I check for THREAD.status eq "COMPLETED" or "TERMINATED"? In other words, when does the status get changed from "RUNNING"? Is it before the thread is killed or after? Because I can not call
<cfset APPLICATION.Threads[ THREAD.Name ] = Duplicate(THREAD) /> after?
The struct in the Application scope won't change. The only thing I am checking here is the existence vs. non-existence (indicating thread started vs. thread ended). I use the existence as the most meaningful property.
What are you trying to do specifically? Perhaps I can come up with a different demo that is more usable for a particular problem space?
I have one thread which we will call Susie and another thread which we will call Lisa. I have an application variable, like your example, which will be the order basket. Susie processes a lot of phone orders (requests) and places the finished order forms in the order basket, but I don't want Susie to place an incomplete order, or have Lisa start on a specific order for delivery, until Susie is complete with that specific order. But while Lisa is fulfilling and delivering the orders, Susie likes to pile them on top. Oh and sorry Ben, none of them are kinky as they are all about business...lol. So what is the best approach?
<cfargument name="wsXmlString" required="Yes" type="string">
<cfargument name="wsfunctionName" required="Yes" type="string">
<cfset variables.wsMemid = 0>
<cfset variables.strlogoutcall = 0>
<cfset variables.wsfunctionName = arguments.wsfunctionName>
<cfset variables.wsXmlString = arguments.wsXmlString>
<!--- 1. Log xml string in DB--->
<cfset variables.strlogoutcall = logonesiteOutcall(variables.wsMemid,variables.wsfunctionName,variables.wsXmlString,"RemoteCall" )>
<!--- 2. This DB returns a Query object, and i used a QuerytoStruct function to get the returned value --->
<cfset variables.strlogoutcall = QueryToStruct(variables.strlogoutcall)>
<cfset variables.strlogoutcall = variables.strlogoutcall.onesitecalllogid>
<!--- 3. Now I convert the huge xml to a structure so I can easily work with it --->
<cfset getstr = ConvertXmlToStruct(trim(arguments.wsXmlString),structnew())>
<!--- 4. Now I dynamically call another function that processes data based on the function name that is called --->
<cfset processvar = "Process"& variables.wsfunctionName >
<cfset cprocess = evaluate("#processvar#(getstr,variables.strlogoutcall)")>
<!---step 5. Return a logid from step 2 back to the caller --->
This is what i am trying to do.
Step 1. When the function is called remotely, I log the call into the DB.
Step 2. I get the return from the DB and get the identifier that was returned from the DB.
Step 3. I convert the huge XML that was sent to this function to a struct.
Step 4. I pass the XML structure that was returned from step 3 above to a dynamic function to go process the XML etc. (This process takes between 4 and 8 seconds.)
Step 5. I return the logid that was returned in step 2 back to the caller.
What I want to achieve is for the process to run from step 1 to 5, but I don't want the initial thread to wait for step 4 to finish before returning the log id in step 5.
So I was thinking of wrapping a cfthread around step 4, so that the initial thread that entered this function does not wait on the new cfthread that is spawned for step 4.
So to the initial thread it looks like the call progression is 1, 2, 3, 5, even though step 4 will still be called, just in a new thread that is spawned for it.
Do you get my drift?
Sorry I forgot to paste this at the beginning:
<cffunction name="ProcessAll" returntype="numeric" access="remote" output = "Yes" hint="This is used to ProcessUpdate Executive Specific Profile">
I definitely get your drift. I think that makes total sense - putting CFThread around step 4. Are you having success with this approach? Sorry, I wasn't sure if this was related to the comments before about the order basket.
Thanks for this - it seems like a bit of a hack to store the data in the application scope, but damn if it doesn't work a charm.
I had a massive timing issue with posting data to a payment gateway and this technique solved the problem.
The thing is, I can't see why this functionality isn't built into cfthread... I'd expect there to be an explicit thread scope that can be accessed from any page to check if threads are running. Seems like quite an oversight really.
Ahh well, this'll do for now... thanks again.
It's an interesting idea to have some sort of built-in, centralized thread aggregation. I could easily see something like CF9's cache methods in the thread area. Sorry, that doesn't make any sense - what I mean is that you can get the list of all cache IDs in CF9 - cacheGetAllIDs(). Would be cool to have something like that for threads:
... that would return an array of all still-running threads.
You can. Use the cfthread scope. Well, it will return ALL threads, not just the ones running, but you can filter it.
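As a rough sketch of that filtering idea (the thread names Demo1 and Demo2 and the variable names are made up for illustration, and this only sees threads launched by the current page request, not ones started by other pages):

```cfml
<!--- Launch a couple of demo threads (hypothetical names). --->
<cfloop index="i" from="1" to="2">

	<cfthread action="run" name="Demo#i#">
		<cfset Sleep( 1000 ) />
	</cfthread>

</cfloop>

<!--- Collect the names of the threads that report a RUNNING status. --->
<cfset arrRunning = ArrayNew( 1 ) />

<cfloop item="strName" collection="#CFTHREAD#">

	<!---
		The CFTHREAD scope holds every thread launched by this
		request, whatever its state, so filter on the Status key.
	--->
	<cfif (CFTHREAD[ strName ].Status EQ "RUNNING")>
		<cfset ArrayAppend( arrRunning, strName ) />
	</cfif>

</cfloop>

<cfdump var="#arrRunning#" />
```

A thread that was just launched may still report NOT_STARTED rather than RUNNING, so depending on the goal it might make more sense to filter out COMPLETED and TERMINATED instead.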
CFThread is definitely good when you're on the same page. Just doing some brain-storming about a way to reference threads generated on different pages. I think it's a very outlier type use-case; but, it could be useful.
Heh, I have to say - I have no idea what the rest of this blog entry is - just saw the comment come in via email. ;)
No worries my friend - an old post that got bumped up again.
Hi Ben great article and yes this is several years late but hey that's the timeless beauty of the internet ;)
I'm also running into a situation currently where it would be great to have some kind of THREAD scope i could use to access all running threads and then reuse them.
I've got a system based heavily on 3rd party XML synchronisations of traveller profiles, and there can be up to 10 concurrent threads running, all sending profiles to the external system at a rate of around 120 per minute each (10 x 120 = total throughput of profiles per minute). These syncs can be fired from either a scheduled process that runs every 15 mins, or by database triggers or direct page requests within the CF app. It would be great to leave these 10 named threads always open and dedicated to the profile syncs to optimise resources, however I can't see any way to do that currently due to page requests coming from all over the system. Currently I just fire off new threads for each sync no matter where it happens in the system, which can cause up to 30-40 threads to be created in peak times.
I might try and play around with your Application struct example Ben and see if I can't manage a set of 10 named threads and try and sleep them upon completion of a given task and reuse them when needed.
@JQ (Feb 15, 2010),
That was an old comment of yours, but in case you are still wondering I saw this, too, also with CF8.01 (yes, I'm still stuck with that version!). I commented on the same issue in Ben's previous post of the series (Learning Coldfusion 8: CFThread Part III - Set It And Forget It).
The problem is that all threads use the same strURL variable from the parent code, but this gets updated during the loop that creates the threads themselves, so that in fact by the time the threads actually run they all see the last value of it, download the picture from the same URL and give it the same filename.
The solution is to pass the url as a cfthread attribute, like in:
cfthread (...) myUrl="#strURL#"
and use this attribute inside the thread:
cfhttp url="#attributes.myURL#" (...) file="#GetFileFromPath( attributes.myURL )#"
This way you are saving and using the value strURL had at the moment the thread was created, and not the one it has when it runs.
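Put together, the loop might look something like the sketch below. It is not the original demo code: the example.com URLs, the lstURLs variable, the thread naming scheme and the target directory are placeholders made up for illustration.

```cfml
<!--- Hypothetical list of photo URLs. --->
<cfset lstURLs = "http://example.com/a.jpg,http://example.com/b.jpg" />

<cfloop index="strURL" list="#lstURLs#">

	<!---
		myUrl is a custom attribute. Its value is copied when the
		thread is created, so later loop iterations cannot change
		what this particular thread sees.
	--->
	<cfthread
		action="run"
		name="Photo_#Hash( strURL )#"
		myUrl="#strURL#">

		<!--- Read the URL from the ATTRIBUTES scope, not from strURL. --->
		<cfhttp
			method="get"
			url="#ATTRIBUTES.myUrl#"
			path="#ExpandPath( './' )#"
			file="#GetFileFromPath( ATTRIBUTES.myUrl )#"
			/>

	</cfthread>

</cfloop>
```

The thread name is built from a hash of the URL here only so that each thread gets a unique name; any naming scheme that avoids duplicates within the request would do.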