Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
Ben Nadel at the New York ColdFusion User Group (Nov. 2009) with: Ruslan Sivak
Ben Nadel at the New York ColdFusion User Group (Nov. 2009) with: Ruslan Sivak

SQL Queries That Look The Same Are Not Violating The DRY Principle

By Ben Nadel on
Tags: ColdFusion, SQL, Work

A couple of months ago, I tried to list out several lessons that I wish I had learned much earlier in my web development career. One of the more important lessons in that list is the proper application of the DRY principle; otherwise known as the, "Don't Repeat Yourself", principle. After writing that post, it occurred to me that it would be worth exploring SQL queries in the context of DRY. At first, it can feel like two SQL queries represent duplicate code if they contain the same query. But, this does not violate DRY; not unless the queries contain "business logic."

Many Michael Keaton's in Multiplicity movie: You know how when you make a copy of a copy it's not as sharp as... well... the original.

Imagine a SQL query in your application that looks like this:

SELECT
	id,
	name,
	email
FROM
	account
WHERE
	id = 1

Now, imagine a second query in the same application that looks like this:

SELECT
	id,
	name,
	email,
	passwordHash -- This field is the only difference.
FROM
	account
WHERE
	id = 1

To the newer developer, these two SQL queries are essentially the same. Which is why this scenario feels so much like a violation of the DRY principle. This is especially true when you are first learning about Design Patterns and are very eager to refactor everything you see into the clean abstractions that you just read about.

Trust me, I know how that feels. I've been there. I've used Design Patterns that I didn't fully understand. I've made my life much harder than it has to be.

The trap here is that the developer inappropriately applies the DRY principle to "syntax". Meaning, the developer sees the same (or similar) series of tokens, references, and operators and they believe that this represents "duplication". But, the DRY principle applies to the duplication of business logic, not the duplication of "code".

These queries have no inherent business logic. They are just data-access.

If we zoom-out from the queries a little bit, the picture starts to come into focus. Let's step up one level in the data-access context:

<!--- Our first query. --->
<cfquery name="local.results" cachedwithin="#one_hour#" datasource="#read_replica#">
	SELECT
		id,
		name,
		email
	FROM
		account
	WHERE
		id = 1
</cfquery>

<!--- Our second query. --->
<cfquery name="local.results">
	SELECT
		id,
		name,
		email,
		passwordHash -- This field is the only difference.
	FROM
		account
	WHERE
		id = 1
</cfquery>

Now that we see a bit more of the context, albeit a heavily truncated view, we can see that these two queries are executing under very different constraints. The first query is running against a read replica and is also using some low-level ColdFusion caching. The second query, on the other hand, is running against the master database.

What we begin to understand is that these two queries change for different reasons, which is a great indication that "duplication" is not taking place. The first query is clearly being used in a context in which replica lag and stale data is an "acceptable" trade-off. The second query, on the other hand, is being run in a context where only immediately-consistent data is acceptable for consumption.

Now, to be clear, I am not saying that you should be continually duplicating query logic all over your application. If you have query logic that can be wrapped-up and reused, by all means do it:

<cffunction name="getByFilter" returntype="query">
	
	<cfargument name="id" type="numeric" required="false" />

	<cfquery name="local.results">
		SELECT
			id,
			name,
			email,
			passwordHash
		FROM
			account
		WHERE
			1 = 1

		<cfif structKeyExists( arguments, "id" )>

			AND
				id = <cfqueryparam value="#id#" cfsqltype="cf_sql_integer" />

		</cfif>
	</cfquery>

	<cfreturn results />

</cffunction>

<!--- .... then, elsewhere in your application .... --->

<cfscript>
	
	var account1 = someAbstraction.getByFilter( id = 1 );
	var account2 = someAbstraction.getByFilter( id = 2 );
	var account3 = someAbstraction.getByFilter( id = 3 );

</cfscript>

Here, we've created an abstraction that allows us to simplify some of our calling code. But, this is not an exercise in "DRYing the code out"; nor does this fundamentally change the situation - it has only changed the "syntax" of the data-access. No "logic" has been encapsulated - we've only created a "short-hand" for a query that lends itself to re-use.

You can think of this change more along the lines of the "Extract Method" refactoring pattern, not an exercise in removing duplication. It is an implementation detail only; granted, one that happens to reduce the total lines-of-code (LOC) and will likely lead to greater developer velocity.

The DRY principle is about "business logic". If a SQL query contains business logic, it should not be duplicated. Take, for example, the following deletion query:

INSERT IGNORE INTO 
	account_archive
SELECT
	id,
	name,
	email,
	( '[sanitized]' ) AS passwordHash
FROM
	account
WHERE
	id = 1
;

DELETE FROM
	account
WHERE
	id = 1
;

This query is not just data-access. This query contains business logic that defines how two concepts - "account" and "account archive" - have to change in lock-step. It also defines the very act of deletion as a "soft action"; meaning, the data is not removed from the database, only moved around.

This kind of logic should be in one place and once place only. Duplicating this particular query would be a duplication of business logic; it would violate the DRY principle; and, it would almost certainly lead to bugs and inconsistent data over time.

Once you realize that duplicating SQL is not, inherently, a violation of the DRY principle, it frees you up to embrace "performance as a feature". It grants you permission to write queries that are optimized for a particular use-case; queries that change for different reasons; and, queries that are easy to delete. As long as you are not duplicating "business logic", you should feel free to duplicate as much SQL as you feel is necessary.

Epilogue: What if my SQL queries contain lots of conditions?

If a SQL query contains a lot of conditions, it can be tempting to view those conditions as "business logic". For example:

SELECT
	id,
	name,
	createdAt,
	updatedAt
FROM
	project
WHERE
	type = 'simple'
AND
	isLocked = 0
AND
	isArchived = 0
AND
	isSample = 0
;

If this set of WHERE clauses were to show up in two different queries in the same application, it's easy to feel like some sort of duplication is taking place. But, it is not. This is not "business logic" - this is just "data-access". You happen to be querying for data that adheres to these conditions in this particular context. And, later on, you might need to query for data in a different context that is slightly different.

The trick it to ask yourself the question: Is it possible that one query will ever need to change while the other one doesn't. If so, then these two queries change for different reasons and are not violations of DRY.

Epilogue: What if I make a fundamental change and have to update a lot of SQL queries?

Dovetailing with the query above, applications evolve over time. That's natural. And, as an application evolves, so too will its queries. Imagine creating an application where the concept of "archiving" an entity didn't get introduced until much later in the application life-cycle. There's no doubt that you would have to add an isArchived check to many of the existing SQL queries.

Having to make a change in a lot of places may feel like "duplication"; or rather, a byproduct of duplicated logic. But, it is not. A fundamental change to an application is never easy. And, often involves touching a lot of code. That's what makes the change, "fundamental".

When making such a change to an application, each interface has to be reviewed and evaluated uniquely in the context of said change. For example, an isArchived check may need to be added to a "user's view" of the application; but, probably not to an "Support administrator's view". As such, fundamental changes can rarely be applied in as a sweeping update; instead, fundamental changes demand a holistic view of the application landscape.

Epilogue: What if I forget to add a necessary condition to a SQL query?

That's a bug. Bugs happen. If you can use the "Extract Method" refactoring pattern, it may help reduce the likelihood of bugs in the future. But, if you can't, just fix the bug and move on. Bugs are not inherently an indication of duplicated business logic.



Reader Comments

If the same query is used for multiple customers and one customer has a new requirement, you also have to take into consideration if the change will cause an unavoidable performance hit ( no matter how small) if it is worth it to make a version just for that customer. Now in the case of business logic, a flexible rules engine can be a lifesaver.

Reply to this Comment

@Christoper,

Exactly! At work, we actually had a somewhat similar thing happen recently. One team had to add a query for an edge-case; but, there was no index that made the query fast. And, while it only ran a few times, it had to do a full-collection scan each time. The burden on the Database was so great that it ended up crashing the service, even tough most of the service didn't even care about that query.

Reply to this Comment

In the example below:

<cffunction name="getByFilter" returntype="query">
 
    <cfargument name="id" type="numeric" required="false" />

    <cfquery name="local.results">
        SELECT
            id,
            name,
            email,
            passwordHash
        FROM
            account
        WHERE
            1 = 1
        <cfif structKeyExists( arguments, "id" )>
            AND
                id = <cfqueryparam value="#id#" cfsqltype="cf_sql_integer" />
        </cfif>
    </cfquery>

    <cfreturn results />

</cffunction>

Just keep in mind if you rely heavily on queries with multiple dynamic criteria, the database server may cache each rendered version of the passed sql separately, adding to the amount of cached queries on your server causing them to rotate much faster than anticipated.

Cached sql:

SELECT id, name, email, passwordHash, FROM account WHERE 1 = 1

SELECT id, name, email, passwordHash, FROM account WHERE 1 = 1 AND id = ?

One approach that adds a bit of verbose code but helps, is to write the sql in a way that the criteria passed determines what is filtered but the passed sql statement remains the same:

public Query function searchAccount(Struct criteria={})
{
   	return queryExecute("
		select
			id,
			firstName,
			lastName,
			email
		from
			Account
		where
			0 = 0
			and (:filterById != 1 or (id = :id))
			and (:filterByIsActive != 1 or (isActive = :isActive))
	", {
		"filterById":{"value":arguments.criteria.keyExists("id") ? 1 : 0, "cfsqltype":"cf_sql_tinyint"},
		"id":{"value":arguments.criteria.keyExists("id") ? arguments.criteria["id"] : javacast("null", ""), "cfsqltype":"cf_sql_integer", "null":!arguments.criteria.keyExists("id") ? true : false,
		"filterByIsActive":{"value":arguments.criteria.keyExists("isActive") ? 1 : 0, "cfsqltype":"cf_sql_tinyint"},
		"isActive":{"value":arguments.criteria.keyExists("isActive") ? (arguments.criteria.isActive ? 1 : 0) : javacast("null", ""), "cfsqltype":"cf_sql_tinyint", "null":!arguments.criteria.keyExists("isActive") ? true : false}
	});
			
}

Cached sql:

select id, firstName, lastName, email from Account where 0 = 0 and (? != 1 or (id = ?)) and (? != 1 or (isActive = ?))
Reply to this Comment

Very useful information. I would usually add an 'isArchived' check right from the start. So, these kind of issues should be resolved during the application blue printing phase. I think the trick is to imagine what sort of features, an application is likely to require, at the beginning of an application build. This is where experienced developers become invaluable, because they are able to draw on decades of knowledge, wisdom, failure & success.

However, I do acknowledge that it is virtually impossible to predict absolutely every feature that might or might not be required during the lifetime of an application, so understanding things like the DRY principle, is definitely an added benefit when constructing scalable systems.

Reply to this Comment

@Phill,

That's super interesting! I know that one of the benefits of using the parameterized queries in the caching; but, I don't think I ever considered the fact that dynamic WHERE clauses actually create multiple versions of the query that is passed over-the-wire and therefore different queries that are cached. Outstanding point!

And, the fact that you are setting the value to null when the argument doesn't exist -- so well thought-out. Because that is something that I've actually run into before (realizing that <cfqueryparam> "executes" even if it is inside a SQL conditional because, obviously it does that, in retrospect).

I still write the vast majority of my SQL in tag-based components using <cfquery> and <cfqueryparam>, so I am gonna noodle on how I might want to format this in that context. You've given me much to think about.

Reply to this Comment

@Charles,

So much this:

I think the trick is to imagine what sort of features, an application is likely to require, at the beginning of an application build. This is where experienced developers become invaluable, because they are able to draw on decades of knowledge, wisdom, failure & success.

Having more experience people involved in the discussion of a project is so helpful. Actually, it's helpful to have both experienced and novice people. Having that dichotomy can be great for sparking discussions. The advanced devs can share insights and the novice devs can ask for more information and explanation. This forces everyone to really think through an idea rather than taking anything at face-value.

And, hopefully, with more experience, you can get better at wearing both hats - pulling on your past experience, but then challenging yourself to really confirm that your ideas are worth adding.

Reply to this Comment

@Phill,

I am fascinated by your approach. I had to sit down and try it out:

https://www.bennadel.com/blog/3637-keeping-prepared-statements-consistent-even-with-dynamic-parameterized-queries-in-mysql-and-coldfusion.htm

One byproduct of this approach is that it would really open up the possibility for me to move to an all CFScript syntax. And that, in and of itself, might be enough to sell me on this concept.

I also feel like, in a way I can't really articulate, makes the dynamic query easier to reason about. Since the query is always the same, it feels like there's less mental overhead.

Reply to this Comment

@Ben,

I ran into this approach once we realized that our database was caching too many prepared statements needlessly, some search queries had multiple possible criteria for custom user dashboards, for example.

I always have been a big proponent of letting the database do what is best for, especially large record set retrievals and our search queries were meant to help expedite performance and not bog down the server.

Glad to be able to contribute to all the great discussions on your blog buddy, keep them coming!

Cheers!

Phill

Reply to this Comment

A few years ago, I needed to build parameterised queries dynamically.
It came about when I needed a routine to add over 250 global setting column values to a MySQL table. But, the routine would never know ahead of time, how many of the 250, would need updating.

People kept telling me that it couldn't be done, which is like a red rag to a bull, for me. So, I built a little library that allows one to pass in an unlimited amount of WHERE clause parameters.

It uses:

new Query()

Syntax.

But, the reason I wanted to join the discussion, is that I built these functions, using CF tags rather than CFScript. So, it is possible to do this kind of thing, in either. Ironically, I use CFScript much more, nowadays, but at the time, CFScript was like a foreign language to me:)

Anyway, I was going to post one of the functions that converts the WHERE clause params, but decided against it, because it is quite lengthy.

The only thing, I didn't build in, was the ability to handle sub queries, but it handles INNER JOINS etc.

I am just wondering whether this kind of approach would be an example of DRY programming?

Reply to this Comment

@Charles,

That sounds like an intense piece of SQL :D If you want to share it, I'd suggest putting it in a GitHub Gist and then sharing the Gist URL here -- that way, you can also edit it if you need to in the gist.

.... People kept telling me that it couldn't be done, which is like a red rag to a bull, for me.

Ha ha, I 100% understand this mentality :D In some ways, it's also why I change technologies very slowly - I am always convinced I can solve the problems using the technologies I already know.

As far as "DRY", I am not sure it relates to your query mechanics. It sounds like you had to create something complex to solve a complex problem. In and of itself, though, I wouldn't be able to say of that speaks to DRY.

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Comments
Live in the Now
Oops!
NEW: Some basic markdown formatting is now supported: bold, italic, blockquotes, lists, fenced code-blocks. Read more about markdown syntax »
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.