Skip to main content
Ben Nadel at cf.Objective() 2009 (Minneapolis, MN) with: Samer Sadek
Ben Nadel at cf.Objective() 2009 (Minneapolis, MN) with: Samer Sadek ( @samer_sadek )

Considering MySQL, CFQueryParam, Prepared Statements, And SQL Queries That Use IN Clauses In ColdFusion

By on
Tags: ,

I've been using ColdFusion's CFQueryParam for longer than I can remember. It seems like it's just always been there, protecting us against SQL-Injection attacks and helping to speed up query parsing and performance. But, I have to admit that my mental model for CFQueryParam has always been a little bit fuzzy, especially when it comes to the list attribute. Over the weekend, however, I was doing a lot of thinking about a particularly SQL query that I wrote; and, about its sub-par performance. I wanted to take a moment to think more deeply about prepared statements and open-ended IN clauses in MySQL and ColdFusion.

To set the stage for this post, consider that I have a SQL query that gathers records based on an open-ended list of ID values. Something like this:

SELECT
	u.id,
	u.name,
	u.email
FROM
	user u
WHERE
	u.id IN ( ?????? )
;

The list of IDs using in the IN () clause is generated based on the results of a previous SQL query. A such, the list of IDs may contain a handful of values; it may contain hundreds of values; it's completely open-ended. Currently, this list of IDs is passed-through as raw SQL. Meaning, the parent CFQuery tag is not producing a prepared statement - it does not use CFQueryParam.

To quote Adobe's article on the Hidden Power of CFQueryParam, using dynamic queries without prepared statements is problematic:

When the variable in our query is rendered by ColdFusion and the query is processed, SQL Server sees each version of the query as a completely different query and therefore has to create a new Execution Plan for it. Having an Execution Plan generated for every version of this query is expensive, in both processing overhead and resource utilization. If you run that query 1,000 times per day with a different ID each time, you have 1,000 copies of that same execution plan sitting in RAM on your SQL Server. Plus, the processing overhead of actually generating an Execution Plan is expensive, that's why SQL Server wants to cache it so that it doesn't have to do it every time. Repeatedly generating an Execution Plan can lead to higher CPU usage, and slower queries.

As such, in order to speed this SQL query up, I was considering converting the list of IDs to a list-based CFQueryParam tag. But, I know that some databases have a limit to the number of parameters that can be used in a single prepared statement. As such, I wanted to double-check my understanding of these limitations.

As it turns out, my understanding of prepared statements is very old. When I started my career, I was working on systems that used Microsoft SQL Server (MSSQL). I remember running into limits of around 2,000 parameters in MSSQL Server; and, I just assumed at the time that all database had the same kind of limit.

Over the weekend, however, as I was reading up on prepared statements, I came across this StackOverflow post that outlined the maximum number of parameters per database vendor. And, I was shocked to learn that MySQL - my current database technology of choice - has no maximum number of parameters (by default).

Given this new insight about prepared statements in MySQL, it means that I should be able to safely update the IN clause to use a CFQueryParam tag. However, it's not quite that simple. And, as I was reading various articles on the matter, I came across another StackOverflow post about using prepared statements with IN clauses that really gave me pause.

Because ColdFusion's CFQueryParam tag can accept a list attribute, it's easy to fall into the trap of thinking that the following tag represents a single bound parameter:

<cfqueryparam value="#idList#" sqltype="integer" list="true" />

Under the hood, however, ColdFusion is generating a unique parameter for every value in the list. In fact, if we look at the Lucee CFML implementation on GitHub, we can see this taking place:

// ..... truncated ......
if (list || (Decision.isArray(value) && ARRAY_TYPES.contains(item.getType()))) {

	Array arr;

	if (Decision.isArray(value)) {
		arr = Caster.toArray(value);
	}
	else {
		String v = Caster.toString(value);
		if (StringUtil.isEmpty(v)) {
			arr = new ArrayImpl();
			arr.append("");
		}
		else arr = ListUtil.listToArrayRemoveEmpty(v, separator);
	}

	int len = arr.size();
	StringBuffer sb = new StringBuffer();
	for (int i = 1; i <= len; i++) {
		query.setParam(item.clone(check(arr.getE(i), item.getType())));
		if (i > 1) sb.append(',');
		sb.append('?');
	}
	write(sb.toString());
}
// ..... truncated ......

Notice that in the Java code, Lucee CFML is looping over the list of values passed to the CFQueryParam tag and is, for each value, calling:

query.setParam( ... )

It's binding a unique parameter for each list-item value.

What this means is that when we use the list attribute with CFQueryParam, we are creating a different prepared statement for every unique length of the list. So, if I execute a SQL query with a CFQueryParam that comes through with the following five lists:

  • <cfqueryparam value="1" list="true" />
  • <cfqueryparam value="1,2" list="true" />
  • <cfqueryparam value="1,2,3" list="true" />
  • <cfqueryparam value="1,2,3,4" list="true" />
  • <cfqueryparam value="1,2,3,4,5" list="true" />

... ColdFusion is creating five unique prepared statements, one for each list length.

In the aforementioned StackOverflow post, one suggestion to get around this limitation of using IN clauses within prepared statements is to augment the list of values such that it gets bucketed into a smaller set of possible combinations. For example, if we reworked the above list of CFQueryParam values to be this:

  • <cfqueryparam value="1,1,1,1,1" list="true" />
  • <cfqueryparam value="1,2,2,2,2" list="true" />
  • <cfqueryparam value="1,2,3,3,3" list="true" />
  • <cfqueryparam value="1,2,3,4,4" list="true" />
  • <cfqueryparam value="1,2,3,4,5" list="true" />

... we'd end-up generating a single preprepared statement (with five bound parameters) while still querying for the same unique ID lists.

This kind of approach would reduce the number of prepared statements that the Client and Server have to store; and it would greatly increase the chances of being able to actually reuse the prepared statement since list-values of various lengths will all end up getting chunked into the same bucket.

To explore this approach in ColdFusion, I put together a simple Lucee CFML page that queries for a list of users based on a given ID list. The ID list is normalized to the closest multiple of an arbitrary block-size:

<cfscript>
	
	// Get the list of users for the given ID-list.
	dump( getUsers( "1,2,3,4" ) );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I get the users for the given IDs.
	* 
	* @idList I am the list of ids to fetch.
	*/
	public query function getUsers( required string idList ) {

		var users = queryExecute(
			sql = "
				SELECT
					u.id,
					u.name,
					u.email
				FROM
					user u
				WHERE
					u.id IN ( :idList )
				;
			",
			// For performance reasons, we want to the MySQL server to cache the parsing
			// and execution of this query. As such, we want to create a prepared query
			// statement. However, since the list of IDs may be open-ended, we don't want
			// to create a new prepared statement for every unique LENGTH of IDs. To
			// limit the number of prepared statements that can be generated, we are
			// going to "pad" the list of IDs such that the number of Query Parameters is
			// a predictable size.
			params = {
				idList: {
					value: normalizeIdBlockSize( idList ),
					sqlType: "integer",
					list: true
				}
			}
		);

		return( users );

	}


	/**
	* I ensure that the given ID list is normalized to be a multiple of the given
	* blocksize. This allows us to prepare SQL queries with a finite set of parameters.
	* 
	* NOTE: I expect the ID list to be non-empty.
	* 
	* @idList I am the ID list being normalized.
	* @blockSize I am the multiple to which the ID list should be clamped.
	*/
	public string function normalizeIdBlockSize(
		required string idList,
		numeric blockSize = 10
		) {

		var remainder = ( idList.listLen() % blockSize );

		// If the list of IDs is already evenly divided into the blocksize, then we can 
		// use the idList as is.
		if ( ! remainder ) {

			return( idList );

		}

		// If the list of IDs does NOT divide evenly into the blocksize, we have to "pad"
		// the list of IDs, repeating the last value enough times to fill up the block.
		var padding = idList.listLast();

		return( idList & ",#padding#".repeatString( ( blockSize - remainder ) ) );

	}

</cfscript>

As you can see, the list of IDs being bound within the getUsers() query is first getting normalized into a list that has a length that is a multiple of 10. To be clear, 10 is not some magic length - it's just the length that I happened to choose for the demo. At this time, I don't have an instinct for what bucket sizes would make sense.

Now, if we run this ColdFusion code, we get the following output:

A CFQuery being executed with a normalized set of CFQueryParam list values in Lucee CFML.

As you can see from the CFDump meta-data, our initial list of IDs:

1,2,3,4

... is ultimately bound to the prepared SQL statement as:

1,2,3,4,4,4,4,4,4,4

The last list value - 4 - has been used to pad the list of IDs such that it ends up being 10-items. Now, every combination of user IDs that is passed to this query will get lumped into a more predictable set of prepared statements.

Again, to be crystal clear, I have never tried this in production. This is an approach to prepared statements that I literally learned about yesterday. That said, I fully intend to experiment with this approach using LaunchDarkly and FusionReactor such that I can slowly roll it out and see if it makes any significant difference in the performance graphs.

Time To Revisit Existing Uses of CFQueryParam and list

I know for a fact that my code has existing instances of the CFQueryParam tag that use the list attribute. Given this new perspective and clarity on prepared statements, I definitely need to go back and look at how these queries are potentially exploding the number of prepared statements that are being generated.

Want to use code from this post? Check out the license.

Reader Comments

14 Comments

I think the first thing that would come to mind with a huge in clause would be to check if there was some data already in the database that would help such as "where foo_id in (select foo_id from lookup_table where category = 'something') .

I'd also consider the use of # temp or ## global temp tables. You can do some pretty amazing things with CTE's and temp tables.

15,674 Comments

@Christopher,

Part of where I want to use this is inside a recent refactoring. I took a massive SQL query, in which there were a lot of LEFT OUTER JOIN tables and sub-selects and tried to break those things out into their own smaller, more easily-understood queries. So, essentially, I am taking the Joins and sub-selects from the query and performing those in the app logic now.

I'm trying to find a good balance of performance and readability. I'm still tinkering (hence this post).

That said, I don't know very much about using temporary tables. I'll poke at that a bit more and see what I can find.

3 Comments

Interesting. Haven't used CF in a long time, but just knowledge on prepared statements. Definitely means want to evaluate execution plans when there are variable lengths.

15,674 Comments

@All,

So, I've continued to noodle on this topic and just recently came to the realization that I've fundamentally misunderstood how the MySQL JDBC driver manages, cached, and executes prepared statements:

www.bennadel.com/blog/3844-ive-had-fundamental-misunderstandings-of-how-cfqueryparam-works-within-mysql-queries-and-coldfusion-lucee-cfml.htm

I believe this is specific to MySQL; and, is clearly driven by our JDBC / J-Connector settings; but, the take-away is that my reality has been very far-off from my perception of how prepared statements are working; and, how we are using them in our ColdFusion / Lucee / CFML applications.

I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel