A Serious CFThread Bug In ColdFusion?
NOTE: This is not really a bug. This is a side-effect of ColdFusion 8's new deep copy functionality. In ColdFusion 8, sadly, deep copy will also deep copy CFCs.
Yesterday, I spent almost an hour trying to debug a problem with KinkyTwits, my ColdFusion Twitter client. After almost pulling all of my hair out, I finally realized that I wasn't fighting a bug in my code but rather a bug in ColdFusion itself. My KinkyTwits code is quite complex so this morning, I boiled it down to the most mundane example. I was a little nervous that it wasn't actually a bug, but even in this simple example, the problem is easily repeatable.
First, I created a small ColdFusion component that internally caches an integer and then increments it with an Increment() method call:
<cfcomponent output="false">
<cffunction
name="Init"
access="public"
returntype="any"
output="false"
hint="I return an intialized variable.">
<!--- Configure the variables. --->
<cfset VARIABLES.Instance = {
Index = 0,
LogFile = (
GetDirectoryFromPath( GetCurrentTemplatePath() ) &
"log.txt"
)
} />
<!--- Return This reference. --->
<cfreturn THIS />
</cffunction>
<cffunction
name="Increment"
access="public"
returntype="any"
output="false"
hint="I increment the internal index in a thread-safe way and log value to file.">
<!---
Single-thread this action. Because this incrementing
a shared scope (and because we know it will be used
in a CFThread tag) we want to make sure multiple
calls to this cannot possible corrupt each other.
--->
<cflock
name="Incrementor.Increment"
type="exclusive"
timeout="5">
<!--- Increment internal value. --->
<cfset VARIABLES.Instance.Index++ />
<!--- Log value. --->
<cffile
action="append"
file="#VARIABLES.Instance.LogFile#"
output="#VARIABLES.Instance.Index#"
addnewline="true"
/>
<!--- Return This scope for method chaining. --->
<cfreturn THIS />
</cflock>
</cffunction>
</cfcomponent>
As you can see, this is very straightforward. The internal value is cached during ColdFusion component initialization and then incremented with each Increment() method call. Inside the Increment() method call, I am single-threading (using an exclusive, named CFLock around) the actions that read from and update the internal index. While I don't think CFLocking is necessary in my scenario, since I am going to be using CFThread later on, I figured it would cut down on possible causes of the erratic behavior.
Once I had this ColdFusion component ready to go, I created a very simple Application.cfm file so that I would have a persistent scope in which to cache my Incrementor.cfc instance:
<!--- Define application. --->
<cfapplication
name="CFThreadBugTest"
applicationtimeout="#CreateTimeSpan( 0, 0, 5, 0 )#"
/>
Then, I created a small test page that created and cached an Incrementor.cfc instance in the APPLICATION scope:
<!---
Create an instance of the incrementor and cached it
in the APPLICATION scope. We are doing this only to make
sure that the "request" is not having some weird time-out
issue with the CFThread.
--->
<cfset APPLICATION.Incrementor = CreateObject(
"component",
"Incrementor"
).Init()
/>
<!--- Call the incrementor three times. --->
<cfset APPLICATION.Incrementor.Increment() />
<cfset APPLICATION.Incrementor.Increment() />
<cfset APPLICATION.Incrementor.Increment() />
Once the Incrementor ColdFusion component is cached, I then proceed to call the Increment() method three times. As expected, our log file reads:
1.0
2.0
3.0
Each subsequent call incremented the shared index correctly.
That was all executed as expected. Now, let's introduce ColdFusion 8's new CFThread tag to the equation and see what happens:
<!---
Create an instance of the incrementor and cached it
in the APPLICATION scope. We are doing this only to make
sure that the "request" is not having some weird time-out
issue with the CFThread.
--->
<cfset APPLICATION.Incrementor = CreateObject(
"component",
"Incrementor"
).Init()
/>
<!---
Call the Increment() method three times. Even though we are
using CFThread here to run the calls in parallel, the CFLock
within the Increment() method should synchronize them.
--->
<cfthread
name="T1"
action="run"
incrementor="#APPLICATION.Incrementor#">
<cfset ATTRIBUTES.Incrementor.Increment() />
</cfthread>
<cfthread
name="T2"
action="run"
incrementor="#APPLICATION.Incrementor#">
<cfset ATTRIBUTES.Incrementor.Increment() />
</cfthread>
<cfthread
name="T3"
action="run"
incrementor="#APPLICATION.Incrementor#">
<cfset ATTRIBUTES.Incrementor.Increment() />
</cfthread>
<!--- Make sure all the threads join back in. --->
<cfthread action="join" />
Here, I am running the three method calls to Increment() from within individual CFThread tags. To make sure that there is no confusion over object references, I am explicitly passing the cached instance of the Incrementor ColdFusion component into each CFThread tag context via its ATTRIBUTES. Even through CFThread runs code in parallel, remember that the named, exclusive CFLock I used inside the Incrementor() method should single thread these calls.
We should get the same exact output to our log file; but, when we run this code, our log file reads this:
1.0
1.0
1.0
Crazy right?
My first thought was that maybe there was just something going wrong in the parallel nature of the ColdFusion CFThread tags. So, to make sure that wasn't the problem, I modified the above example to have the requesting page wait for each thread to return before firing off the next one:
<!---
Create an instance of the incrementor and cached it
in the APPLICATION scope. We are doing this only to make
sure that the "request" is not having some weird time-out
issue with the CFThread.
--->
<cfset APPLICATION.Incrementor = CreateObject(
"component",
"Incrementor"
).Init()
/>
<!---
Call the Increment() method three times. Even though we are
using CFThread here to run the calls in parallel, the CFLock
within the Increment() method should synchronize them.
--->
<cfthread
name="T1"
action="run"
incrementor="#APPLICATION.Incrementor#">
<cfset ATTRIBUTES.Incrementor.Increment() />
</cfthread>
<!--- Wait for thread to join. --->
<cfthread action="join" name="T1" />
<cfthread
name="T2"
action="run"
incrementor="#APPLICATION.Incrementor#">
<cfset ATTRIBUTES.Incrementor.Increment() />
</cfthread>
<!--- Wait for thread to join. --->
<cfthread action="join" name="T2" />
<cfthread
name="T3"
action="run"
incrementor="#APPLICATION.Incrementor#">
<cfset ATTRIBUTES.Incrementor.Increment() />
</cfthread>
<!--- Wait for thread to join. --->
<cfthread action="join" name="T3" />
Here, the parallel nature of the CFThread tag doesn't even matter because we are purposefully single-threading all of our threads. But, even so, when we run the above example, our log file reads:
1.0
1.0
1.0
Bananas!!!
I have no idea what is going on here. I have picked through every inch of this code and of my knowledge of CFThread and I cannot for the life of me find any issues with the code. These examples are not complex - I wouldn't even know how to make them any more simple. My only conclusion is that this is some kind of crazy bug in ColdFusion's CFThread tag. I can't even quite describe what is going on.
Want to use code from this post? Check out the license.
Reader Comments
not sure if this is related, but according to the cfthread docs, attributes are deep-copied. I wonder if what you think is your single cached instance of that component is in fact being serialized and thus copied into a new version, which is why it's always starting out at 1.
@Marc,
Hmm, let me alter that a bit.
You know more about the internals then I do and I could be way off base here, but I wonder if the issue is related to the fact that CF uses duplicate() behind the scenes on your variables that you pass into cfthread.
Try calling increment a few times before you kick off the threads and see what happens. If you call it 3 times prior to the threads does it output 3 each time?
and what marc said... :)
@Marc,
Hmmm. If I stop passing in the values of the CFC via the CFThread attributes and instead call:
<cfset APPLICATION.Incrementor.Increment() />
... directly from within the CFThread, then it works properly.
Does a deep copy *really* duplicate CFCs? I guess the next thing to test is to see if a Duplicate(), which is also a deep copy, operates in the same way.
Something about this doesn't seem right.
@Marc,
HOLY COW! When I run this demo:
<!--- Duplicate the app scope twice (deep copy). --->
<cfset objScopeA = Duplicate( APPLICATION ) />
<cfset objScopeB = Duplicate( APPLICATION ) />
<!--- Increment each scope's incrementor. --->
<cfset objScopeA.Incrementor.Increment() />
<cfset objScopeB.Incrementor.Increment() />
I get the following log:
1.0
1.0
Deep copy in ColdFusion duplicates CFCs as well? Really? Really? How did I not know that? Really?
does a deep copy really duplicate CFCs? My guess is yes. This should be pretty easy to test though. stick an instance var and assign it createUUID() in the constructor and then inside the thread, fetch the value so that you can see it. my suspicion is that when you pass in your instance into the cfthread, you're gonna see a different uuid value every time. if not... that'd be weird.
but it's not fool-proof. i was curious about this a few weeks ago, and brian rinaldi pointed me to a post on his blog about the problems he encountered with serializing CFCs
As others have said, I think it's definitely a duplicated CFC. I did for testing:
<cfthread
name="T1"
action="run"
incrementor="#APPLICATION.Incrementor#">
<cfset ATTRIBUTES.Incrementor.Increment() />
<cfset thread.mycopy = attributes.incrementor>
</cfthread>
When running t2, I then did:
<cfthread
name="T2"
action="run"
incrementor="#cfthread.t1.mycopy#">
<cfset ATTRIBUTES.Incrementor.Increment() />
</cfthread>
and the values came out correctly.
Here is a dumb question though. You have an application scoped CFC as a service. Why not just use in the thread? You have locks in place already. I wouldn't bother passing it to the thread at at all.
@Marc, @Todd,
Thanks for the tip. I really had no idea that you could duplicate a CFC. It still feels really wrong :) But, I'll talk to my shrink about it.
@Ray,
From what Rupesh has told me at conferences about CFThread, it is that it is, behind the scenes, actually a CFFunction call. It seems, that as a best practice for CFFunction calls in general, we should pass in all the required values:
function Run( Incrementor){
. . . . arguments.Incrementor.Increment()
}
Maybe that's a bit of stretch. But, also due to the parallel and unpredictable run-time of a thread (maybe it executes now, maybe in 10 minutes), I always just felt it was safer to explicitly pass in all values rather than assume that those "global" values would exist at the time the thread is run.
Now, obviously, this is an APPLICATION-scoped value, so its existence should never be in question. But, what about a VARIABLES-scoped or a REQUEST-scopes value? Is that guaranteed to be there when the thread executes?
My neurotic self says, NO! Therefore, I sort of developed a habit of explicitly passing all required values through the attributes.
Maybe it's time to re-evaluate.
Following up on Marc's comment.
I am curious about why you're using the attribute scope instead of using the application scope. The incrementor is already in a scope available to the threads.
I totally did not refresh the page before I posted, thus missing all the other responses. >>
@Bradley,
Maybe this is crazy of me, but referring to the APPLICATION scope inside a CFThread tag feels akin to referring to the APPLICATION scope from within a CFC. My gut tells me that it breaks scope encapsulation, especially when you consider that it doesn't immediately execute.
I think the rule of not breaking scope is just that - a rule,a guideline, not a Law as in you shall always follow it 100% of the time.
Remote APIs are a perfect example of this. Imagine you build api.cfc for your web site. This is a remote-enabled CFC that lets folks get crap. Why would you NOT then use App scoped CFCs within that CFC in order to return data? It would be silly not to.
(Blog post on this coming up in an hour or so.)
some more good reading:
http://corfield.org/blog/index.cfm/do/blog.entry/entry/duplicate_is_bad_for_your_objects_health
@Ray,
It's funny you mention that. Remote API are a great example. And, in fact, in my KinkyTwits, this holds true as well. My remote API CFC must refer to the APPLICATION.Factory as they cannot be "initialized" properly during a remote invocation.
I guess I am still growing my feelings on this. My use of CFThread is fairly limited, so my thoughts have not had time to mature.
@Todd,
Ahh, I remember seeing that way back when. I think I totally ignored it because I thought the idea of duplicating a CFC was such a silly thing. Looks like not know about it has come back to bite me.
Spooky... I'm investigating a cfthread issue at the moment and this article had clearly been indexed by Google before it appeared in my RSS reader!
Ben, I think your hesitance to use the shared application variable is because of a bias introduced by working in CF, which I think we all share. In other languages where threading is much more natural (and much more common), using shared variables is an extremely common practice for state maintenance. Many languages that are better at threading have constucts built in that are meant specifically to ensure that things stay in the proper state when using a shared variable or object.
I think if you poke around a few wikipedia articles on shared state maintenance, you'll get much more comfortable with the concept of threads accessing a shared object. Take a look at the wiki article on semaphores to start. Also, if you're at all comfortable with .NET, this site has an absolutely fantastic introduction to multithreaded programming in .NET that may give you a different point of view on the topic: http://www.yoda.arachsys.com/csharp/threads/
@Roland,
Perhaps part of this also stems from the fact that web-based applications have very different request life cycles than desktop-stlye software. With a web page, you make a request, it runs, and ends. There is a sense of start and end with a page request which I think is absent for a desktop application whose only delimiters are start up and shut down.
But of course, the fear I have might just be the fear of the unknown. The fear of not deeply considering the task at hand. If I stop and really think about all the "worst case" scenarios, I am sure that you are correct - I will see that threads act fine in this way.
Ben,
I think http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=Tags_t_04.html
explains the problem you are seeing. It is in the section: The Attributes scope and thread attributes
John
Sorry, bad link. that is what i get for not testing it first. This one should be right.
http://livedocs.adobe.com/coldfusion/8/htmldocs/help.html?content=threads_4.html#1153708
@John,
Thanks for the link. As taken from the documentation:
ColdFusion makes a complete (deep) copy of all the attribute variables before passing them to the thread; therefore, the values of the variables inside the thread are independent of the values of any corresponding variables in other threads, including the page thread. For example, if you pass a CFC instance as an attribute to a thread, the thread gets a complete new copy of the CFC, including the contents of its This scope at the time that you create the thread. Any changes made to the original CFC outside the thread, for example, by calling a CFC function, have no effect on the copy that is in the thread. Similarly, any changes to the CFC instance in the thread have no effect on the original CFC instance.
Ok, looks like they spell it out quite explicitly. My bad :)
I've been working recently with threads but never thought to pass in my application scoped/cached CFC, I automatically figured that the thread would have access to the Application Scope.
I usually would 100% agree with you that everything that a function needs should be passed to it, but for some reason the thought to pass the cfc to the thread just seemed unnatural.
I'm kind of glad you ran into this issue because I never gave much thought about the deep copying before. Now if I run into this problem I will at least know why. Thanks for sharing!
@Gary,
Glad this post could help in some way. I feel a bit silly that my "problem" was clearly documented in the LiveDocs. But, I suppose that now I know more than ever about ColdFusion... and that's NEVER a bad thing.
Glad someone posted my rant about duplicate() and CFCs (thanx Todd).
Yes, cfthread is a pain in the ass because of the deep copying of attributes. It made it really painful for me to implement an encapsulated threading model in Edmund (my event-driven framework). I solved it by creating a thread execution context CFC. You create the thread helper, and initialize it with *references* to your variables and then call a method on that CFC to trigger cfthread, which then refers directly to the variables scope of the helper.
I suspect it's still not 100% safe due to contexts going out of scope but I really was at a loss to create a totally bullet-proof way to do this... I'm still working on it (although it hasn't failed for me yet in production load scenarios).
@Sean,
I really like the idea of the "thread context". I think you had mentioned one time that you do something similar for views in MVC (a view context), which to me sounds like a very similar idea.
Yes, the thread context CFC - like the view context CFC - is a lightweight CFC that you create (on each request) and initialize with just the data needed to run the thread (or view). The CFC then runs the cfthread tag (or includes the view file) so that you have thread safety as well as a variables scope that can pass through from initializer to thread (include file).
Maybe the same issue here, but slightly different? I've found that if I create cfthreads in a loop, then join them and do a cfdump var="#cfthread#" the output is correct. But if I try to explicitly reference the cfthread struct (cfthread['t1'].output) it returns the same value each time.
Example:
<cfset lst = "t1, t2, t3">
<cfloop list="#lst#" index="idx">
<cfthread name="#idx#" action="run">
<cfoutput>this is from #idx#</cfoutput>
</cfthread>
</cfloop>
<cfthread name="#lst#" action="join"/>
<cfdump var="#cfthread#">
<cfloop collection="#cfthread#" item="itm">
<cfoutput>#cfthread[itm].output#</cfoutput><br />
</cfloop>
The cfdump outputs the struct properly, but the collection loop doesn't. It "normally" outputs:
this is from t3
this is from t3
this is from t3
but if I re-run the page several times I'll end up with something like:
this is from t3
this is from t2
this is from t3
Now, if I change my example to:
<cfset lst = "t1, t2, t3">
<cfthread name="t1" action="run">
this is from t1
</cfthread>
<cfthread name="t2" action="run">
this is from t2
</cfthread>
<cfthread name="t3" action="run">
this is from t3
</cfthread>
<cfthread name="#lst#" action="join"/>
<cfdump var="#cfthread#">
<cfloop collection="#cfthread#" item="itm">
<cfoutput>#cfthread[itm].output#</cfoutput><br />
</cfloop>
The cfdump and the collection loop output are both correct.
Any ideas?
@Abram,
The issue you are seeing is a side-effect of the asynchronous processing of threads. By the time any one of your threads executes, the #idx# variable might be completely exhausted (at the end of the list).
To get around this, and as a best practice, you should really pass variables into the cfthread tag:
<cfthread name="#idx#" index="#idx#" action="run">
. . . . <cfoutput>this is from #ATTRIBUTES.Index#</cfoutput>
</cfthread>
Notice that I am passing "idx" in as the "index" attribute and then referencing that. In doing so, you will make sure that it is always the appropriate value.
Ahh.... makes sense now. Thanks!
I ran into the same problem but what ever was inside the cfthread was cache'd until i cleared both template and class cache in administrator. but what a pain. (my example i am sending an email cfmail and using cfthread to fire the email and forget it.)