Preventing Unbounded Full-Table Scans In My ColdFusion Database Access Layer

By Ben Nadel

Published 2022-11-01 in ColdFusion, SQL — Comments (2)

As I've continued to evolve my approach to building ColdFusion applications, one pattern that I've begun to embrace consistently in my data access layer / Data Access Object (DAO) is to block the developer from running a SQL query that performs a full-table scan. This is really only necessary in DAO methods that provide dynamic, parameterized SQL queries; but, it offers me a great deal of comfort. The pattern works by requiring each query to include at least one indexed column in the dynamically generated SQL statement.

ASIDE: A few years ago, I looked at using the SQL_SAFE_UPDATES setting in MySQL, which blocks the execution of UDPATE and DELETE queries without a WHERE clause. But, as much as possible, I like to have all my logic in the application code so that everyone can find it and understand it.

To explore this concept, let's look at the structure of a database table that I use at InVision - the semantic meaning of this table isn't important for this conversation, we just care about which columns are indexed. Or, more specifically, which columns can be used as an index prefix (ie, the leading column within a filtering clause):

CREATE TABLE `share_disclaimer_snapshot` (
	`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
	`prototypeID` int(10) unsigned NOT NULL,
	`disclaimerID` int(10) unsigned NOT NULL,
	`key` char(6) NOT NULL,
	`title` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci NOT NULL,
	`content` varchar(3000) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_520_ci NOT NULL,
	`createdAt` datetime NOT NULL,
	PRIMARY KEY (`id`),
	UNIQUE KEY `IX_byDisclaimer` (`disclaimerID`,`key`),
	KEY `IX_byPrototype` (`prototypeID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

As you can see, this table has three columns that can act as an index prefix:

id - the primary key.
prototypeID
disclaimerID

Now, in my data access layer, each dynamic query will enforce that one-or-more of these values is provided. Here's my getSnapshotsByFilter() method. Note that all of the arguments are optional but, we have an if-block that will throw() an error if certain arguments aren't provided.

component {

	/**
	* I get the share disclaimer snapshots that match the given filtering.
	*/
	public query function getSnapshotsByFilter(
		numeric id,
		numeric prototypeID,
		numeric disclaimerID,
		string key
		) {

		// One of the index-based constraints must be provided in order to prevent the
		// accidental scanning of the entire table.
		if (
			isNull( id ) &&
			isNull( prototypeID ) &&
			isNull( disclaimerID )
			) {

			throw(
				type = "ShareDisclaimerSnapshotGateway.ForbiddenSelect",
				message = "Open-ended select not allowed."
			);

		}

		```
		<cfquery name="local.results" result="local.metaResults">
			/* DEBUG: shareDisclaimerSnapshotGateway.getSnapshotsByFilter(). */
			SELECT
				s.id,
				s.prototypeID,
				s.disclaimerID,
				s.key,
				s.title,
				s.content,
				s.createdAt
			FROM
				share_disclaimer_snapshot s
			WHERE
				TRUE

			<cfif arguments.keyExists( "id" )>
				AND
					s.id = <cfqueryparam value="#id#" sqltype="bigint" />
			</cfif>

			<cfif arguments.keyExists( "prototypeID" )>
				AND
					s.prototypeID = <cfqueryparam value="#prototypeID#" sqltype="bigint" />
			</cfif>

			<cfif arguments.keyExists( "disclaimerID" )>
				AND
					s.disclaimerID = <cfqueryparam value="#disclaimerID#" sqltype="bigint" />
			</cfif>

			<cfif arguments.keyExists( "key" )>
				AND
					s.key = <cfqueryparam value="#key#" sqltype="varchar" />
			</cfif>

			ORDER BY
				s.id ASC
		</cfquery>
		```

		return( results );

	}

}

As you can see, despite all of the arguments being marked as optional, the code within the method ensures that at least one of the index-based arguments is being provided. The method accepts more than just indexed-based arguments (see key); but, it means that a developer can never accidentally invoke this method in a way that will execute a SQL statement without an argument that drastically filters-down the result-set using one of the indices.

This same pattern gets used in the delete queries as well. Here's my deleteSnapshotsByFilter() method in the same component:

component {

	/**
	* I delete the share disclaimer snapshots that match the given filtering.
	*/
	public void function deleteSnapshotsByFilter(
		numeric id,
		numeric prototypeID,
		numeric disclaimerID,
		string key
		) {

		// One of the index-based constraints must be provided in order to prevent the
		// accidental deletion of the entire table.
		if (
			isNull( id ) &&
			isNull( prototypeID ) &&
			isNull( disclaimerID )
			) {

			throw(
				type = "ShareDisclaimerSnapshotGateway.ForbiddenDelete",
				message = "Open-ended delete not allowed."
			);

		}

		```
		<cfquery name="local.results" result="local.metaResults">
			/* DEBUG: shareDisclaimerSnapshotGateway.deleteSnapshotsByFilter(). */
			DELETE FROM
				share_disclaimer_snapshot
			WHERE
				TRUE

			<cfif arguments.keyExists( "id" )>
				AND
					id = <cfqueryparam value="#id#" sqltype="bigint" />
			</cfif>

			<cfif arguments.keyExists( "prototypeID" )>
				AND
					prototypeID = <cfqueryparam value="#prototypeID#" sqltype="bigint" />
			</cfif>

			<cfif arguments.keyExists( "disclaimerID" )>
				AND
					disclaimerID = <cfqueryparam value="#disclaimerID#" sqltype="bigint" />
			</cfif>

			<cfif arguments.keyExists( "key" )>
				AND
					`key` = <cfqueryparam value="#key#" sqltype="varchar" />
			</cfif>
		</cfquery>
		```

	}

}

Again, we can see that all of the dynamic method arguments are optional; but, at least one of the index-prefix columns is required. And, just as with the get method, the delete method will throw() an error in order to prevent any accidental full-table scans (or deletes in this case!!).

This technique is less relevant for my UPDATE SQL statements since I usually only allow one update to be executed at a time based on the id (primary key) of the table. As such, my update statements are usually much less dynamic.

That said, if I do need to create a multi-record UPDATE statement, I usually create a special DAO method for a particular use-case (such as updating all records owned by a "parent" object). And, in that case, the SQL is always locked-down by an indexed column (such as the "foreign key" reference for the "parent" object).

None of this is technically necessary. In a perfect world, every developer would look at the database structure to see what makes sense (from a performance standpoint); and then, invoke data access methods in accordance with such information. But, I'm pretty sure many of us have deployed a SQL query that accidentally locks-up the database. And, this is just a baby-step towards helping to prevent that from happening in your next ColdFusion data access layer.

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/4348

Reader Comments

Will Belden Nov 7, 2022 at 2:37 PM

4 Comments

What would also be interesting, I guess in a pre-compile sort of way, is to go to the database (I'm more of an MSSQL guy, so INFORMATION_SCHEMA, etc.) and pre-determine which indexes exist. Maybe on a reinit operation, refresh those into local files, unsure. But have a list of those columns you can use rather than hard-coding. Just a thought from a quick scan over (been off machine for like 10 days, playing RSS catch-up.)

Ben Nadel Nov 7, 2022 at 2:56 PM

16,161 Comments

@Will,

That's a really clever idea. You can definitely read the existing indexes from the performance schema. I actually do something like that at work because I attach StatsD metrics to the primary-key space utilization in our database:

www.bennadel.com/blog/4165-inspecting-primary-and-secondary-index-key-utilization-for-mysql-5-7-32-in-lucee-cfml-5-3-7-47.htm

Doing something like that on init of the app would definitely be possible. I'll do some noodling on it for fun.

Oh my chickens, this post is old!

Hit me up on LinkedIn if you want to discuss it further.