Skip to main content
Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Considering MySQL, CFQueryParam, Prepared Statements, And SQL Queries That Use IN Clauses In ColdFusion

By Ben Nadel on
Tags: ColdFusion, SQL

I've been using ColdFusion's CFQueryParam for longer than I can remember. It seems like it's just always been there, protecting us against SQL-Injection attacks and helping to speed up query parsing and performance. But, I have to admit that my mental model for CFQueryParam has always been a little bit fuzzy, especially when it comes to the list attribute. Over the weekend, however, I was doing a lot of thinking about a particularly SQL query that I wrote; and, about its sub-par performance. I wanted to take a moment to think more deeply about prepared statements and open-ended IN clauses in MySQL and ColdFusion.

To set the stage for this post, consider that I have a SQL query that gathers records based on an open-ended list of ID values. Something like this:

SELECT
	u.id,
	u.name,
	u.email
FROM
	user u
WHERE
	u.id IN ( ?????? )
;

The list of IDs using in the IN () clause is generated based on the results of a previous SQL query. A such, the list of IDs may contain a handful of values; it may contain hundreds of values; it's completely open-ended. Currently, this list of IDs is passed-through as raw SQL. Meaning, the parent CFQuery tag is not producing a prepared statement - it does not use CFQueryParam.

To quote Adobe's article on the Hidden Power of CFQueryParam, using dynamic queries without prepared statements is problematic:

When the variable in our query is rendered by ColdFusion and the query is processed, SQL Server sees each version of the query as a completely different query and therefore has to create a new Execution Plan for it. Having an Execution Plan generated for every version of this query is expensive, in both processing overhead and resource utilization. If you run that query 1,000 times per day with a different ID each time, you have 1,000 copies of that same execution plan sitting in RAM on your SQL Server. Plus, the processing overhead of actually generating an Execution Plan is expensive, that's why SQL Server wants to cache it so that it doesn't have to do it every time. Repeatedly generating an Execution Plan can lead to higher CPU usage, and slower queries.

As such, in order to speed this SQL query up, I was considering converting the list of IDs to a list-based CFQueryParam tag. But, I know that some databases have a limit to the number of parameters that can be used in a single prepared statement. As such, I wanted to double-check my understanding of these limitations.

As it turns out, my understanding of prepared statements is very old. When I started my career, I was working on systems that used Microsoft SQL Server (MSSQL). I remember running into limits of around 2,000 parameters in MSSQL Server; and, I just assumed at the time that all database had the same kind of limit.

Over the weekend, however, as I was reading up on prepared statements, I came across this StackOverflow post that outlined the maximum number of parameters per database vendor. And, I was shocked to learn that MySQL - my current database technology of choice - has no maximum number of parameters (by default).

Given this new insight about prepared statements in MySQL, it means that I should be able to safely update the IN clause to use a CFQueryParam tag. However, it's not quite that simple. And, as I was reading various articles on the matter, I came across another StackOverflow post about using prepared statements with IN clauses that really gave me pause.

Because ColdFusion's CFQueryParam tag can accept a list attribute, it's easy to fall into the trap of thinking that the following tag represents a single bound parameter:

<cfqueryparam value="#idList#" sqltype="integer" list="true" />

Under the hood, however, ColdFusion is generating a unique parameter for every value in the list. In fact, if we look at the Lucee CFML implementation on GitHub, we can see this taking place:

// ..... truncated ......
if (list || (Decision.isArray(value) && ARRAY_TYPES.contains(item.getType()))) {

	Array arr;

	if (Decision.isArray(value)) {
		arr = Caster.toArray(value);
	}
	else {
		String v = Caster.toString(value);
		if (StringUtil.isEmpty(v)) {
			arr = new ArrayImpl();
			arr.append("");
		}
		else arr = ListUtil.listToArrayRemoveEmpty(v, separator);
	}

	int len = arr.size();
	StringBuffer sb = new StringBuffer();
	for (int i = 1; i <= len; i++) {
		query.setParam(item.clone(check(arr.getE(i), item.getType())));
		if (i > 1) sb.append(',');
		sb.append('?');
	}
	write(sb.toString());
}
// ..... truncated ......

Notice that in the Java code, Lucee CFML is looping over the list of values passed to the CFQueryParam tag and is, for each value, calling:

query.setParam( ... )

It's binding a unique parameter for each list-item value.

What this means is that when we use the list attribute with CFQueryParam, we are creating a different prepared statement for every unique length of the list. So, if I execute a SQL query with a CFQueryParam that comes through with the following five lists:

  • <cfqueryparam value="1" list="true" />
  • <cfqueryparam value="1,2" list="true" />
  • <cfqueryparam value="1,2,3" list="true" />
  • <cfqueryparam value="1,2,3,4" list="true" />
  • <cfqueryparam value="1,2,3,4,5" list="true" />

... ColdFusion is creating five unique prepared statements, one for each list length.

In the aforementioned StackOverflow post, one suggestion to get around this limitation of using IN clauses within prepared statements is to augment the list of values such that it gets bucketed into a smaller set of possible combinations. For example, if we reworked the above list of CFQueryParam values to be this:

  • <cfqueryparam value="1,1,1,1,1" list="true" />
  • <cfqueryparam value="1,2,2,2,2" list="true" />
  • <cfqueryparam value="1,2,3,3,3" list="true" />
  • <cfqueryparam value="1,2,3,4,4" list="true" />
  • <cfqueryparam value="1,2,3,4,5" list="true" />

... we'd end-up generating a single preprepared statement (with five bound parameters) while still querying for the same unique ID lists.

This kind of approach would reduce the number of prepared statements that the Client and Server have to store; and it would greatly increase the chances of being able to actually reuse the prepared statement since list-values of various lengths will all end up getting chunked into the same bucket.

To explore this approach in ColdFusion, I put together a simple Lucee CFML page that queries for a list of users based on a given ID list. The ID list is normalized to the closest multiple of an arbitrary block-size:

<cfscript>
	
	// Get the list of users for the given ID-list.
	dump( getUsers( "1,2,3,4" ) );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I get the users for the given IDs.
	* 
	* @idList I am the list of ids to fetch.
	*/
	public query function getUsers( required string idList ) {

		var users = queryExecute(
			sql = "
				SELECT
					u.id,
					u.name,
					u.email
				FROM
					user u
				WHERE
					u.id IN ( :idList )
				;
			",
			// For performance reasons, we want to the MySQL server to cache the parsing
			// and execution of this query. As such, we want to create a prepared query
			// statement. However, since the list of IDs may be open-ended, we don't want
			// to create a new prepared statement for every unique LENGTH of IDs. To
			// limit the number of prepared statements that can be generated, we are
			// going to "pad" the list of IDs such that the number of Query Parameters is
			// a predictable size.
			params = {
				idList: {
					value: normalizeIdBlockSize( idList ),
					sqlType: "integer",
					list: true
				}
			}
		);

		return( users );

	}


	/**
	* I ensure that the given ID list is normalized to be a multiple of the given
	* blocksize. This allows us to prepare SQL queries with a finite set of parameters.
	* 
	* NOTE: I expect the ID list to be non-empty.
	* 
	* @idList I am the ID list being normalized.
	* @blockSize I am the multiple to which the ID list should be clamped.
	*/
	public string function normalizeIdBlockSize(
		required string idList,
		numeric blockSize = 10
		) {

		var remainder = ( idList.listLen() % blockSize );

		// If the list of IDs is already evenly divided into the blocksize, then we can 
		// use the idList as is.
		if ( ! remainder ) {

			return( idList );

		}

		// If the list of IDs does NOT divide evenly into the blocksize, we have to "pad"
		// the list of IDs, repeating the last value enough times to fill up the block.
		var padding = idList.listLast();

		return( idList & ",#padding#".repeatString( ( blockSize - remainder ) ) );

	}

</cfscript>

As you can see, the list of IDs being bound within the getUsers() query is first getting normalized into a list that has a length that is a multiple of 10. To be clear, 10 is not some magic length - it's just the length that I happened to choose for the demo. At this time, I don't have an instinct for what bucket sizes would make sense.

Now, if we run this ColdFusion code, we get the following output:

A CFQuery being executed with a normalized set of CFQueryParam list values in Lucee CFML.

As you can see from the CFDump meta-data, our initial list of IDs:

1,2,3,4

... is ultimately bound to the prepared SQL statement as:

1,2,3,4,4,4,4,4,4,4

The last list value - 4 - has been used to pad the list of IDs such that it ends up being 10-items. Now, every combination of user IDs that is passed to this query will get lumped into a more predictable set of prepared statements.

Again, to be crystal clear, I have never tried this in production. This is an approach to prepared statements that I literally learned about yesterday. That said, I fully intend to experiment with this approach using LaunchDarkly and FusionReactor such that I can slowly roll it out and see if it makes any significant difference in the performance graphs.

Time To Revisit Existing Uses of CFQueryParam and list

I know for a fact that my code has existing instances of the CFQueryParam tag that use the list attribute. Given this new perspective and clarity on prepared statements, I definitely need to go back and look at how these queries are potentially exploding the number of prepared statements that are being generated.



Reader Comments

I think the first thing that would come to mind with a huge in clause would be to check if there was some data already in the database that would help such as "where foo_id in (select foo_id from lookup_table where category = 'something') .

I'd also consider the use of # temp or ## global temp tables. You can do some pretty amazing things with CTE's and temp tables.

Reply to this Comment

@Christopher,

Part of where I want to use this is inside a recent refactoring. I took a massive SQL query, in which there were a lot of LEFT OUTER JOIN tables and sub-selects and tried to break those things out into their own smaller, more easily-understood queries. So, essentially, I am taking the Joins and sub-selects from the query and performing those in the app logic now.

I'm trying to find a good balance of performance and readability. I'm still tinkering (hence this post).

That said, I don't know very much about using temporary tables. I'll poke at that a bit more and see what I can find.

Reply to this Comment

Interesting. Haven't used CF in a long time, but just knowledge on prepared statements. Definitely means want to evaluate execution plans when there are variable lengths.

Reply to this Comment

@Kevin,

I'm starting to experiment already. Hopefully I'll have some numbers to share by end-of-week.

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Blog
Live in the Now
Oops!
NEW: Some basic markdown formatting is now supported: bold, italic, blockquotes, lists, fenced code-blocks. Read more about markdown syntax »
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.