Ask Ben: Processing Files With CFThread In ColdFusion
Posted March 30, 2010 at 9:56 AM by Ben Nadel
Ben, I'm not quite sure about how threads work and when to use them. Here's a for instance: The user has uploaded a bunch of files that I have to do some processing on. It may take a while. I'd like to send them back a notice that the process will take a while and I'll send them an email when it's done. In the meantime, I want the files to get processed. Do I need to use a thread to do the file processing so that I can send the user back a message right away? I asked some of the guys I program with and they didn't really know either. I figured if none of us knew when threads should be used, that would be perfect for an Ask Ben!
In a word, Yes. Anything that you don't need to wait on to finish processing can be placed inside of a CFThread tag. The CFThread tag in ColdFusion spawns additional threads that will be executed asynchronously to the primary page. Once these asynchronous threads have been spawned, you can either wait for them to finish and rejoin them (CFThread[action=join]), or, what seems to be more appropriate for your situation, forget about them and let them finish at their own leisure. This latter approach allows you to provide much more immediate feedback to your users.
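To make the two options concrete, here is a minimal sketch of both. The thread names and the sleep() call are just stand-ins that I made up for illustration; they are not part of the demo further down.

```cfml
<!--- Option 1: fire-and-forget. Spawn the thread and never join it. --->
<cfthread name="backgroundWork" action="run">
	<!--- Hypothetical stand-in for slow work. --->
	<cfset sleep( 2000 ) />
</cfthread>

<!--- Option 2: spawn a thread, then wait for it to finish. --->
<cfthread name="blockingWork" action="run">
	<!--- Values stored in the THREAD scope are readable by the page. --->
	<cfset thread.result = "done" />
</cfthread>

<!--- Block the page until the thread ends (or 5 seconds pass). --->
<cfthread action="join" name="blockingWork" timeout="5000" />

<cfoutput>
	Status: #cfthread.blockingWork.status#<br />
	Result: #cfthread.blockingWork.result#
</cfoutput>
```

The page never waits on "backgroundWork", whereas the join makes the page pause for "blockingWork" before it renders any output.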
To me, the use of ColdFusion's CFThread tag is more about a quicker response time to the user than it is about a quicker processing time. Yes, you can leverage parallel threads to expedite some tasks, such as executing parallel CFHTTP requests. But, there's nothing inherently fast about ColdFusion's threading system. When you spawn a new thread using CFThread, it might not execute immediately. If you are on ColdFusion Standard, you can only have a maximum of 2 CFThreads executing at a given time. As such, especially if you are on shared hosting, your thread might get queued behind any number of already-queued CFThread entries.
In the end, thanks to thread queuing, using ColdFusion's CFThread tag might actually increase the total amount of time required for a given algorithm to complete. That's why I say that CFThread is more about a quicker response time and less about a quicker processing time. By using CFThread, we can decrease the amount of information that has to be processed synchronously; this allows us to provide feedback to the user in a shorter period of time even if the overall algorithm requires more time to complete.
To explore this concept, I've created a photo upload form in which the user's uploaded photo must be resized to 50% and 10% of its original dimensions. The upload itself is very fast; it's the resizing of the photo that takes a significant amount of time. Assuming that the user doesn't need to wait for this photo resizing to take place, we can execute the resize asynchronously inside of a CFThread tag.
<!--- Param the FORM variables. --->
<cfparam name="form.email" type="string" default="" />
<cfparam name="form.photo" type="string" default="" />
<!--- Define the photo upload directory. --->
<cfset photosDirectory = (
	getDirectoryFromPath( getCurrentTemplatePath() ) &
	) />
<!---
	Check to see if the form fields have all been submitted;
	only then will we process the form.
--->
<cfif (
	len( form.email ) &&
	len( form.photo )
<!---
	Store the uploaded temp photo into the destination
	directory (the actual upload happens with the form
	POST - we just need to move the temp file into an
	actual server-side directory).
--->
<!---
	Now that we have moved the file, we can begin to process
	the file in parallel (letting the user know via email when
	this is done).
	NOTE: All variables passed via an attribute are COPIED BY
	VALUE; this includes structs and CFCs which will be passed
	by deep copy into the thread.
--->
<!---
	We now want to take the photo and break it down into 2
	different sizes (50% and 10%).
--->
<!--- Build up the file name to be used for the 50% file. --->
<cfset filePath50 = (
	photosDirectory &
	upload.serverFileName &
	"_50." &
	) />
<!--- Build up the file name to be used for the 10% file. --->
<cfset filePath10 = (
	photosDirectory &
	upload.serverFileName &
	"_10." &
	) />
<!--- Resize to 50%. --->
<!--- Resize to 10%. --->
<!---
	Now that the photo has been processed, let's email
	the user to let them know that the files have been
--->
from="""PhotoUpload"" <firstname.lastname@example.org>"
subject="Your Photo Has Been Processed"
Your photo has been processed. See attached.
<!--- Attach both resized files. --->
<cfmailparam file="#filePath50#" />
<cfmailparam file="#filePath10#" />
<!---
	Now that the thread is processing in parallel, we can
	re-direct the user to the confirmation page.
--->
<!DOCTYPE HTML>
<title>Processing Files With CFThread</title>
Processing Files With CFThread
<input type="text" name="email" size="40" />
<input type="file" name="photo" size="40" />
<input type="submit" value="Upload Photo" />
Photo Used:
As you can see, the only processing that happens outside of the CFThread tag is the uploading of the photo itself. Because both resizing actions take place inside of the CFThread tag, the main page doesn't have to wait for those actions to complete. As such, once the photo is uploaded, the user is immediately redirected to the confirmation page while the photo resizing happens in the background.
While the new thread executes asynchronously to the main page, the code contained within the CFThread tag body executes synchronously. This allows us to place a CFMail tag at the end of the CFThread tag body in order to alert the user once the photo processing has been completed.
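For anyone who wants the overall shape of that workflow in one place, here is a condensed sketch of the pattern. Treat it as an approximation rather than the exact demo code: the "photos/" folder, the "confirmation.cfm" page, the thread name, the from address, and the use of upload.serverFile and upload.serverFileExt are all assumptions on my part.

```cfml
<cfparam name="form.email" type="string" default="" />
<cfparam name="form.photo" type="string" default="" />

<!--- ASSUMPTION: a "photos/" folder sitting next to this template. --->
<cfset photosDirectory = (
	getDirectoryFromPath( getCurrentTemplatePath() ) & "photos/"
	) />

<cfif ( len( form.email ) && len( form.photo ) )>

	<!--- Synchronous: move the temp upload into the photos folder. --->
	<cffile
		result="upload"
		action="upload"
		filefield="photo"
		destination="#photosDirectory#"
		nameconflict="makeunique"
		/>

	<!--- Asynchronous: resize and email inside the thread. --->
	<cfthread
		name="resizePhoto"
		action="run"
		email="#form.email#"
		upload="#upload#">

		<!--- Unscoped variables in here live in the thread-local scope. --->
		<cfset sourcePath = ( photosDirectory & attributes.upload.serverFile ) />
		<cfset filePath50 = (
			photosDirectory & attributes.upload.serverFileName &
			"_50." & attributes.upload.serverFileExt
			) />
		<cfset filePath10 = (
			photosDirectory & attributes.upload.serverFileName &
			"_10." & attributes.upload.serverFileExt
			) />

		<!--- Resize to 50% and 10% of the original dimensions. --->
		<cfimage action="resize" source="#sourcePath#"
			destination="#filePath50#" width="50%" height="50%"
			overwrite="true" />
		<cfimage action="resize" source="#sourcePath#"
			destination="#filePath10#" width="10%" height="10%"
			overwrite="true" />

		<!--- The thread body runs top-down, so this is sent last. --->
		<cfmail
			to="#attributes.email#"
			from="photos@example.com"
			subject="Your Photo Has Been Processed"
			type="text">
Your photo has been processed. See attached.
			<cfmailparam file="#filePath50#" />
			<cfmailparam file="#filePath10#" />
		</cfmail>

	</cfthread>

	<!--- The page does not wait for the thread; redirect right away. --->
	<cflocation url="confirmation.cfm" addtoken="false" />

</cfif>
```

Only the CFFile upload and the CFLocation happen on the page's own time; everything between the CFThread tags can run whenever the thread gets its turn.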
When it comes to the use of CFThread, perhaps the most complicated aspect of it is figuring out how to most appropriately pass data into the thread context. The CFThread body can pull data from two primary places: the main page's variables scope and the CFThread attribute collection. All data passed into the thread via the CFThread attributes are passed by deep-copy. This is true for all types of data including complex objects like structs and ColdFusion components. Figuring out which approach to use boils down to the dynamic nature of the data in question. If the data you are referring to won't change, using the main page's variables scope is probably a fine way to go. If, however, the data you are referring to will change, potentially before your queued thread executes, then you'll probably want to pass the data via deep-copy as a CFThread tag attribute. This way, your CFThread tag body gets a static copy of the data value regardless of what happens to that data value during the rest of the primary page processing.
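Here is a small sketch of the difference between the two data sources. The struct, its keys, and the thread name are hypothetical; the join is only there so that the page can inspect the outcome.

```cfml
<!--- A page-level struct in the VARIABLES scope. --->
<cfset settings = { label = "original" } />

<!--- Pass the struct as an attribute AND leave it visible on the page. --->
<cfthread name="copyDemo" action="run" settingsCopy="#settings#">

	<!--- This touches the thread's private deep copy only. --->
	<cfset attributes.settingsCopy.label = "changed in thread" />

	<!--- This touches the shared, page-level struct. --->
	<cfset variables.settings.touchedByThread = true />

</cfthread>

<!--- Wait for the thread so we can look at what it did. --->
<cfthread action="join" name="copyDemo" />

<cfoutput>
	Label: #settings.label#<br />
	Touched: #structKeyExists( settings, "touchedByThread" )#
</cfoutput>
```

If the attributes are deep-copied as described, the page-level label should be unaffected by the first CFSet, while the second CFSet should be visible to the page since it goes through the shared variables scope.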
ColdFusion's CFThread tag is very powerful when used properly. I hope that this demo has shed some light on the ins and outs of its usage.
How great that you posted this just today, as I was going to start on some new functionality for a webshop making use of cfthread. The outline is like so:
User has ordered a couple of items from the webshop, payment processing has gone great and now we're headed to the confirmation page. But whilst showing this to the end user I need to do a couple more things:
1. Create multiple PDFs with data from the shoppingbag. I group the results of the items in the shoppingbag per producer. Hence I might get three producers per shoppingbag content. Each producer needs to get his/her specific PDF with items out of the shoppingbag that he/she is the producer of.
2. After having created (or during in a cfloop) the PDFs, the PDF needs to be added as an attachment to an email that is being sent to the producers. So each producer gets a personal email with the items from the shoppingbag of which he/she is the producer as well as the PDF (with the invoice and Parcelslip) attached.
3. The PDFs need only to be created in memory, or if necessary on disk, but deleted afterwards when mail has been sent (as I don't want these files to be present on the webserver).
Do I do all this in one CFTHREAD with a CFLOOP in it? Or multiple CFTHREADS?
I'm on Railo by the way, on my own server.
Thanx up front for helping me out with this ;-) I hope you have the time to do some of your magic to help me out, or at least point me in the right direction.
If all the PDF generation can happen separately from the main page, then I would say you can use a single CFThread tag with a loop in it. The only reason you might want to launch multiple CFThread tags is if performance is more of an issue - meaning, you need to generate the PDFs faster.
Of course, PDF creation using CFDocument might have throttling all on its own. I am not sure how Railo works, but in standard CF, I think all documents on a single server get queued together (Enterprise opens this performance up). As such, even if you generated parallel CFThreads in CF Standard, the CFDocument might cause a bit of a bottleneck. All to say, I think a single thread with a loop would be sufficient for your case.
As far as creating PDFs as email attachments, I actually just wrote about this recently as well. If Railo supports this, you can actually create PDFs without writing anything to disk:
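As a rough sketch of the single-thread-with-a-loop idea, something like the following might work. The email addresses, file name, and thread name are made up, and I am assuming that the "content" attribute of CFMailParam is available (Adobe added it in ColdFusion 8.0.1; I have not checked it on Railo).

```cfml
<!--- ASSUMPTION: a hypothetical list of producer email addresses. --->
<cfset producerEmails = "producer-a@example.com,producer-b@example.com" />

<!--- One thread handles all of the producers, one after another. --->
<cfthread name="sendInvoices" action="run" emails="#producerEmails#">

	<cfloop list="#attributes.emails#" index="producerEmail">

		<!--- Build the PDF in memory; nothing is written to disk. --->
		<cfdocument format="PDF" name="pdfBinary">
			<cfoutput><h1>Invoice for #producerEmail#</h1></cfoutput>
		</cfdocument>

		<!--- Attach the in-memory PDF; "file" is only the display name. --->
		<cfmail
			to="#producerEmail#"
			from="shop@example.com"
			subject="New Order"
			type="text">
Please find the invoice attached.
			<cfmailparam
				file="invoice.pdf"
				type="application/pdf"
				content="#pdfBinary#"
				/>
		</cfmail>

	</cfloop>

</cfthread>
```

Since the PDF only ever lives in a variable, there is nothing to clean up on the webserver afterwards.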
Does that help at all?
Hi Ben, I was looking at your post. I just wanted to know one thing: is the intent to increase performance with parallel processing, or to get a response back to the client sooner? And if there is an error in processing the request after the page has already returned "Sent successfully", how do you notify the client about it? Maybe something like this would be useful: "Processing request ... check back on the status later." Congratulations on the post!
That's a really good question. Because the threads execute in parallel, the primary page is not "Aware" of any errors that happen in the threads. What you would have to do is either handle errors within each thread, or, Join the threads and check their status.
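Here is a quick sketch of both approaches. The thread names, the error type, and the log file name are hypothetical.

```cfml
<!--- Approach 1: handle the error inside a fire-and-forget thread. --->
<cfthread name="selfHandling" action="run">
	<cftry>
		<cfthrow type="Demo.Failure" message="Resize failed." />
		<cfcatch>
			<!--- Log it (or send an email) from within the thread. --->
			<cflog file="threadErrors" text="#cfcatch.message#" />
		</cfcatch>
	</cftry>
</cfthread>

<!--- Approach 2: let the thread fail, then join it and inspect it. --->
<cfthread name="riskyWork" action="run">
	<cfthrow type="Demo.Failure" message="Resize failed." />
</cfthread>

<cfthread action="join" name="riskyWork" />

<cfset threadInfo = cfthread.riskyWork />

<cfif (
	( threadInfo.status eq "TERMINATED" ) &&
	structKeyExists( threadInfo, "error" )
	)>
	<cfoutput>Thread failed: #threadInfo.error.message#</cfoutput>
<cfelse>
	<cfoutput>Thread status: #threadInfo.status#</cfoutput>
</cfif>
```

The first approach keeps the quick response time; the second makes the page wait, which only makes sense if the page actually needs to report the outcome.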
Let me see if I can play around with a blog post that demonstrates some of this stuff.
Take a look at this:
I hope that helps a bit.
In the example you link to (http://www.bennadel.com/blog/1700-Ask-Ben-Creating-A-PDF-And-Attaching-It-To-An-Email-Using-ColdFusion.htm) do you actually store the created PDF in memory or how do you go about it?
I don't understand it quite, is it equal to storing it in the ram:// ?
How does the thread "know" about the variable photosDirectory which you created outside of the CFTHREAD?
All the threads generated in the same context (request + parent scope) share the same "Variables" scope. As such, the CFThread tag body can read from the variables scope of the main page, which is where the photosDirectory lives.
You could have also passed the value into the CFThread if you wanted to:
... and then referred to it:
But just be aware that passing it via the attributes passes it by deep-copy. For string values, this point is moot; but for CFCs and structs, this can cause very unexpected results if you aren't aware that it is going on.
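As a minimal sketch of that attribute-based approach (the thread name, the "photos/" folder, and the file name are made up for illustration):

```cfml
<!--- ASSUMPTION: a "photos/" folder relative to the current template. --->
<cfset photosDirectory = expandPath( "./photos/" ) />

<!--- Pass the value in as a custom CFThread attribute. --->
<cfthread
	name="usesAttribute"
	action="run"
	photosDirectory="#photosDirectory#">

	<!--- Refer to the thread's own copy via the ATTRIBUTES scope. --->
	<cfset thread.targetPath = (
		attributes.photosDirectory & "example.jpg"
		) />

</cfthread>
```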
Also, as a note of caution, if you are trying to save something from the CFThread into the main page's variables scope, you *must* use the variables scope explicitly:
<cfset variables.foo = "Bar" />
If you don't use the "variables." prefix, you'll actually end up saving the "Bar" value in the thread local scope, which can be super confusing.
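A small sketch to see the difference for yourself; the thread and variable names are hypothetical:

```cfml
<cfthread name="scopeDemo" action="run">

	<!--- Unscoped: ends up in the thread-local scope. --->
	<cfset foo = "thread-local" />

	<!--- Explicitly scoped: ends up in the page's VARIABLES scope. --->
	<cfset variables.bar = "page-level" />

</cfthread>

<!--- Wait for the thread so that the page can inspect the result. --->
<cfthread action="join" name="scopeDemo" />

<cfoutput>
	foo defined on page: #structKeyExists( variables, "foo" )#<br />
	bar defined on page: #structKeyExists( variables, "bar" )#
</cfoutput>
```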
OK, I think I understand. It means that all application-scoped variables and all query-results need to be passed into the cfthread for it to be aware of them and be able to do something about it?
Am I correct in stating that?
I am not 100% sure on all the best practices on this. Part of me wants to think of CFThread as something that needs more encapsulation; part of me doesn't.
I think it comes down to two things:
1. Am I updating any values INSIDE the CFThread?
2. Is there a chance the value will be changed outside the CFThread *before* my thread executes (in parallel)?
If you are gonna be updating values that live outside of the CFThread (1), then I would say do NOT pass them in via the attributes. Since Attributes will copy the values by VALUE, any update to said values will only be done locally to the CFThread and not to the original variable.
This is especially tricky if you pass in a CFC and then call something like:
myCFC.setValue( x )
... This will only affect the thread-local copy of the CFC, not the original CFC (banged my head against a wall for a solid day on this one!!).
As far as (2), we have to remember that threads don't execute immediately - they are queued. And, they are not even guaranteed to execute in the *order* in which they were defined. So, if you need to reference a variable that might be changed by the main page OR by another thread by the time *this* thread uses it, then you should probably pass it in via attributes such that you receive a local copy.
As far as query records are concerned, that's a very interesting question! I don't know how that will resolve at the time the thread executes.
I'll give that test right now.
As I suspected, you need to pass the query column values into the CFThread as tag attributes otherwise the late-binding will mess you up:
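A minimal sketch of that approach, using a made-up query (the column names, values, and thread names are all hypothetical):

```cfml
<!--- Build a small, hypothetical query to loop over. --->
<cfset users = queryNew( "id,email" ) />
<cfset queryAddRow( users, 2 ) />
<cfset querySetCell( users, "id", 1, 1 ) />
<cfset querySetCell( users, "email", "one@example.com", 1 ) />
<cfset querySetCell( users, "id", 2, 2 ) />
<cfset querySetCell( users, "email", "two@example.com", 2 ) />

<cfloop query="users">

	<!---
		Copy the CURRENT row's values into the thread as attributes.
		By the time a queued thread runs, the loop may have moved on.
	--->
	<cfthread
		name="notify_#users.id#"
		action="run"
		userID="#users.id#"
		userEmail="#users.email#">

		<!--- Only touch the thread's own copies in here. --->
		<cfset thread.summary = "#attributes.userID#: #attributes.userEmail#" />

	</cfthread>

</cfloop>
```

Each thread gets a unique name and its own snapshot of the row, so it doesn't matter when (or in what order) the threads actually execute.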
Bloody good video Ben. Very informative.
P.S. Was that a printer in the background churning out pages ?