Skip to main content
Ben Nadel at Scotch On The Rock (SOTR) 2010 (London) with: Andy Allan and Leanne Allan
Ben Nadel at Scotch On The Rock (SOTR) 2010 (London) with: Andy Allan ( @fuzzyorange ) Leanne Allan ( @MrsFuzzyOrange )

Exploring The Possibility Of Parallelizing Queries In ColdFusion Using CFThread

By on
Tags:

I recently worked on a data-migration system built in Node.js. One of the really nice features of Node.js is that, by default, every asynchronous task has the ability to run in parallel. So, when it comes to data-migration, it's actually easier to run queries in parallel than it is to run them in serial. This can make the overall workflow incredibly fast. In ColdFusion, the CFQuery tag is a blocking operation. Theoretically, you could parallelize queries using the CFThread tag; but, the overhead of threading is probably too great. Or is it? I had absolutely no basis for that assumption; so, I wanted to try and run some sanity checks to see if my assumption held any water.

NOTE: I know that there is really just a single event-loop in Node.js; and, that everything in a single Node.js process runs on a single CPU. So, the idea of "parallelization" in Node.js is used very loosely. I know there's probably people who would love to argue that I'm actually talking about "concurrency" in Node.js, not "parallelization". But, to be frank, I'm just not smart enough to make such distinctions.

The test that I'm about to perform is in absolutely no way comprehensive. What I'm looking for here is not a proof-of-concept but rather, I'm looking to see if there are any immediately obvious failures-of-concept. I'm looking to see if using CFThread to run queries in parallel is blatantly slower or problematic. But, even if it doesn't appear to be slower on my local machine, again to be clear, that doesn't really prove anything.

The primary reasons that this test is not comprehensive:

  • My local development environment is not under any load.
  • There is no possibility of CFThread exhaustion due to competing requests.
  • The entire MySQL database likely fits in my machine's RAM.
  • ColdFusion can't spawn threads within threads, so this approach wouldn't be universally applicable (like it is in Node.js).

Ok, with the tiny-print out of the way, let's do some experimenting. First, to create a control case, I'm going to run a few queries that I might run in order to assemble a blog post (it's the largest data-set I have locally). And, to get a sense of trending, I'm going to do this in a loop, each time with a different primary-key value:

<!--- Run the timing test in a loop. --->
<cfloop index="i" from="1000" to="1050">

	<cfset startedAt = getTickCount() />

	<!--- -------------------------------------------------------------------------- --->
	<!--- -------------------------------------------------------------------------- --->

	<cfquery name="entry">
		SELECT
			e.*
		FROM
			blog_entry e
		WHERE
			e.id = <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
	</cfquery>


	<cfquery name="comments">
		SELECT
			c.*
		FROM
			blog_comment c
		WHERE
			c.blog_entry_id = <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
	</cfquery>


	<cfquery name="tags">
		SELECT
			t.*
		FROM
			blog_entry_tag_jn jn
		INNER JOIN
			tag t
		ON
			(
					jn.blog_entry_id = <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
				AND
					t.id = jn.tag_id
			)
		ORDER BY
			t.name ASC
	</cfquery>


	<cfquery name="related">
		SELECT
			e.id,
			e.name
		FROM
			blog_entry_blog_entry_jn jn
		INNER JOIN
			blog_entry e
		ON
			(
					(
							jn.blog_entry_id1 = <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
						OR
							jn.blog_entry_id2 = <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
					)
				AND
					(
							e.id = jn.blog_entry_id1
						OR
							e.id = jn.blog_entry_id2
					)
				AND
					e.id != <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
			)
		ORDER BY
			e.id ASC
	</cfquery>

	<!--- -------------------------------------------------------------------------- --->
	<!--- -------------------------------------------------------------------------- --->

	<cfoutput>
		Done running queries in serial in #numberFormat( getTickCount() - startedAt )# ms.
		<br />
	</cfoutput>

</cfloop>

As you can see, nothing special is going on here. I'm simply keeping track of how long, in milliseconds, each iteration takes. Then, I'm outputting the time at the end of each iteration. And, when we run the above code, we get the following output:

Done running queries in serial in 4 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 5 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 1 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 3 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 1 ms.
Done running queries in serial in 6 ms.
Done running queries in serial in 3 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 1 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 3 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 3 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 3 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 15 ms.
Done running queries in serial in 7 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 10 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 6 ms.
Done running queries in serial in 7 ms.
Done running queries in serial in 6 ms.
Done running queries in serial in 5 ms.
Done running queries in serial in 4 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 6 ms.
Done running queries in serial in 6 ms.
Done running queries in serial in 5 ms.
Done running queries in serial in 2 ms.
Done running queries in serial in 4 ms.

Running these few queries in serial is consistently very fast. When I refresh the page a few times, I do get the occasional outliers:

Done running queries in serial in 13 ms.
Done running queries in serial in 20 ms.
Done running queries in serial in 26 ms.
Done running queries in serial in 19 ms.
Done running queries in serial in 10 ms.
Done running queries in serial in 16 ms.
Done running queries in serial in 17 ms.
Done running queries in serial in 11 ms.

... but, in the vast majority of cases, each iteration runs in low single-digit millisecond time.

Now, let's take the same exact concept and try to run these queries in parallel inside of individual CFThread blocks. And, since each query is going to be wrapped in its own CFThread body, we need to pass-in the iteration primary-key and, when we join the threads back to the page, we need to pull the query results out of the thread scope:

<!--- Run the timing test in a loop. --->
<cfloop index="i" from="1000" to="1050">

	<cfset startedAt = getTickCount() />

	<!--- -------------------------------------------------------------------------- --->
	<!--- -------------------------------------------------------------------------- --->

	<!---
		Since every thread in a given request has to have a unique name, we have to
		append a UUID to each thread so that it can run in a loop.
	--->
	<cfset threadIteration = createUUID() />


	<cfthread name="entry-thread-#threadIteration#" i="#i#">
		<cfquery name="thread.entry">
			SELECT
				e.*
			FROM
				blog_entry e
			WHERE
				e.id = <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
		</cfquery>
	</cfthread>


	<cfthread name="comments-thread-#threadIteration#" i="#i#">
		<cfquery name="thread.comments">
			SELECT
				c.*
			FROM
				blog_comment c
			WHERE
				c.blog_entry_id = <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
		</cfquery>
	</cfthread>


	<cfthread name="tags-thread-#threadIteration#" i="#i#">
		<cfquery name="thread.tags">
			SELECT
				t.*
			FROM
				blog_entry_tag_jn jn
			INNER JOIN
				tag t
			ON
				(
						jn.blog_entry_id = <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
					AND
						t.id = jn.tag_id
				)
			ORDER BY
				t.name ASC
		</cfquery>
	</cfthread>


	<cfthread name="related-thread-#threadIteration#" i="#i#">
		<cfquery name="thread.related">
			SELECT
				e.id,
				e.name
			FROM
				blog_entry_blog_entry_jn jn
			INNER JOIN
				blog_entry e
			ON
				(
						(
								jn.blog_entry_id1 = <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
							OR
								jn.blog_entry_id2 = <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
						)
					AND
						(
								e.id = jn.blog_entry_id1
							OR
								e.id = jn.blog_entry_id2
						)
					AND
						e.id != <cfqueryparam value="#i#" cfsqltype="cf_sql_integer" />
				)
			ORDER BY
				e.id ASC
		</cfquery>
	</cfthread>


	<cfthread action="join" />

	<!--- Pull the query results out of the re-joined threads. --->
	<cfset entry = cfthread[ "entry-thread-#threadIteration#" ].entry />
	<cfset comments = cfthread[ "comments-thread-#threadIteration#" ].comments />
	<cfset tags = cfthread[ "tags-thread-#threadIteration#" ].tags />
	<cfset related = cfthread[ "related-thread-#threadIteration#" ].related />

	<!--- -------------------------------------------------------------------------- --->
	<!--- -------------------------------------------------------------------------- --->

	<cfoutput>
		Done running queries in parallel in #numberFormat( getTickCount() - startedAt )# ms.
		<br />
	</cfoutput>

</cfloop>

Since each CFThread needs to be uniquely named within the confines of a single web-request, I'm tacking a UUID onto the end of each thread name. This is only needed because we're looping over the thread-spawning. You wouldn't need to do this in a "standard" web request.

Now, my gut instinct - the assumptions that we're testing here - is that this approach will obviously be slower simply because of all the mechanics that go into spawning and running threads in ColdFusion. However, when we run the above code, we get the following output:

Done running queries in parallel in 7 ms.
Done running queries in parallel in 7 ms.
Done running queries in parallel in 5 ms.
Done running queries in parallel in 2 ms.
Done running queries in parallel in 5 ms.
Done running queries in parallel in 3 ms.
Done running queries in parallel in 4 ms.
Done running queries in parallel in 6 ms.
Done running queries in parallel in 8 ms.
Done running queries in parallel in 2 ms.
Done running queries in parallel in 1 ms.
Done running queries in parallel in 1 ms.
Done running queries in parallel in 7 ms.
Done running queries in parallel in 3 ms.
Done running queries in parallel in 2 ms.
Done running queries in parallel in 1 ms.
Done running queries in parallel in 1 ms.
Done running queries in parallel in 4 ms.
Done running queries in parallel in 3 ms.
Done running queries in parallel in 1 ms.
Done running queries in parallel in 2 ms.
Done running queries in parallel in 1 ms.
Done running queries in parallel in 4 ms.
Done running queries in parallel in 2 ms.
Done running queries in parallel in 2 ms.
Done running queries in parallel in 2 ms.
Done running queries in parallel in 10 ms.
Done running queries in parallel in 8 ms.
Done running queries in parallel in 5 ms.
Done running queries in parallel in 2 ms.
Done running queries in parallel in 4 ms.
Done running queries in parallel in 3 ms.
Done running queries in parallel in 3 ms.
Done running queries in parallel in 1 ms.
Done running queries in parallel in 6 ms.
Done running queries in parallel in 1 ms.
Done running queries in parallel in 1 ms.
Done running queries in parallel in 4 ms.
Done running queries in parallel in 4 ms.
Done running queries in parallel in 6 ms.
Done running queries in parallel in 8 ms.
Done running queries in parallel in 13 ms.
Done running queries in parallel in 17 ms.
Done running queries in parallel in 5 ms.
Done running queries in parallel in 7 ms.
Done running queries in parallel in 2 ms.
Done running queries in parallel in 6 ms.
Done running queries in parallel in 6 ms.
Done running queries in parallel in 4 ms.
Done running queries in parallel in 2 ms.
Done running queries in parallel in 4 ms.

There are a few more outliers in the parallel version; and, the "serial" test looks to possibly trend one or two milliseconds faster; but, overall, this is way more performant than I expected it to be. Honestly, I was expecting a CFThread-based approach to be laughably slower. But, it looks like that assumption wasn't really based in reality.

So, what we see here is that the idea of parallelizing queries in ColdFusion using CFThread isn't immediately dead-in-the-water. However, that's not to say that this is truly a viable approach. After all, my local development environment is nothing like my production environment. No one is competing for thread resources. No one is competing for CPU resources. There's no reason for me to assume that local performance is any indicator of production performance.

The biggest fear that I still have in my head hinges on the fact that each production instance can only have 10 parallel CFThreads at any time (yay for Standard license limitations!). According to the documentation:

Standard Edition: Only the Async CFML, DirectoryWatcher, Socket, and user written event gateways are enabled. The maximum number of threads in the pool for the cfthread tag is limited to 10. Only the CFM page request limit is adjustable.

This number is much lower that the limit on concurrent page requests. As such, I'm much more likely to bump into a CFThread queuing issue before I run into a load problem on the application server. And that worries me way more than the performance of any individual CFThread block.

I'm actually a little angry that ColdFusion Standard makes the CFThread limit so low. Really Adobe? Really? This saddens me. "Performance" should never be a licensing concern. Only features should be.

To be honest, I am not sure where to go from here. On the one hand, I am curious to continue experimenting because running parallel queries was more performant than I had expected it to be. But, on the other hand, I have such FUD (Fear, Uncertainty, Doubt) about how likely I am to exhaust the CFThread pool. What a nail-biter!

Want to use code from this post? Check out the license.

Reader Comments

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel