A couple of months ago, I tried to list out several lessons that I wish I had learned much earlier in my web development career. One of the more important lessons in that list is the proper application of the DRY principle; otherwise known as the, "Don't Repeat Yourself", principle. After writing that post, it occurred to me that it would be worth exploring SQL queries in the context of DRY. At first, it can feel like two SQL queries represent duplicate code if they contain the same query. But, this does not violate DRY; not unless the queries contain "business logic."
Imagine a SQL query in your application that looks like this:
SELECT id, name, email FROM account WHERE id = 1
Now, imagine a second query in the same application that looks like this:
SELECT id, name, email, passwordHash -- This field is the only difference. FROM account WHERE id = 1
To the newer developer, these two SQL queries are essentially the same. Which is why this scenario feels so much like a violation of the DRY principle. This is especially true when you are first learning about Design Patterns and are very eager to refactor everything you see into the clean abstractions that you just read about.
Trust me, I know how that feels. I've been there. I've used Design Patterns that I didn't fully understand. I've made my life much harder than it has to be.
The trap here is that the developer inappropriately applies the DRY principle to "syntax". Meaning, the developer sees the same (or similar) series of tokens, references, and operators and they believe that this represents "duplication". But, the DRY principle applies to the duplication of business logic, not the duplication of "code".
These queries have no inherent business logic. They are just data-access.
If we zoom-out from the queries a little bit, the picture starts to come into focus. Let's step up one level in the data-access context:
<!--- Our first query. ---> <cfquery name="local.results" cachedwithin="#one_hour#" datasource="#read_replica#"> SELECT id, name, email FROM account WHERE id = 1 </cfquery> <!--- Our second query. ---> <cfquery name="local.results"> SELECT id, name, email, passwordHash -- This field is the only difference. FROM account WHERE id = 1 </cfquery>
Now that we see a bit more of the context, albeit a heavily truncated view, we can see that these two queries are executing under very different constraints. The first query is running against a read replica and is also using some low-level ColdFusion caching. The second query, on the other hand, is running against the master database.
What we begin to understand is that these two queries change for different reasons, which is a great indication that "duplication" is not taking place. The first query is clearly being used in a context in which replica lag and stale data is an "acceptable" trade-off. The second query, on the other hand, is being run in a context where only immediately-consistent data is acceptable for consumption.
Now, to be clear, I am not saying that you should be continually duplicating query logic all over your application. If you have query logic that can be wrapped-up and reused, by all means do it:
<cffunction name="getByFilter" returntype="query"> <cfargument name="id" type="numeric" required="false" /> <cfquery name="local.results"> SELECT id, name, email, passwordHash FROM account WHERE 1 = 1 <cfif structKeyExists( arguments, "id" )> AND id = <cfqueryparam value="#id#" cfsqltype="cf_sql_integer" /> </cfif> </cfquery> <cfreturn results /> </cffunction> <!--- .... then, elsewhere in your application .... ---> <cfscript> var account1 = someAbstraction.getByFilter( id = 1 ); var account2 = someAbstraction.getByFilter( id = 2 ); var account3 = someAbstraction.getByFilter( id = 3 ); </cfscript>
Here, we've created an abstraction that allows us to simplify some of our calling code. But, this is not an exercise in "DRYing the code out"; nor does this fundamentally change the situation - it has only changed the "syntax" of the data-access. No "logic" has been encapsulated - we've only created a "short-hand" for a query that lends itself to re-use.
You can think of this change more along the lines of the "Extract Method" refactoring pattern, not an exercise in removing duplication. It is an implementation detail only; granted, one that happens to reduce the total lines-of-code (LOC) and will likely lead to greater developer velocity.
The DRY principle is about "business logic". If a SQL query contains business logic, it should not be duplicated. Take, for example, the following deletion query:
INSERT IGNORE INTO account_archive SELECT id, name, email, ( '[sanitized]' ) AS passwordHash FROM account WHERE id = 1 ; DELETE FROM account WHERE id = 1 ;
This query is not just data-access. This query contains business logic that defines how two concepts - "account" and "account archive" - have to change in lock-step. It also defines the very act of deletion as a "soft action"; meaning, the data is not removed from the database, only moved around.
This kind of logic should be in one place and once place only. Duplicating this particular query would be a duplication of business logic; it would violate the DRY principle; and, it would almost certainly lead to bugs and inconsistent data over time.
Once you realize that duplicating SQL is not, inherently, a violation of the DRY principle, it frees you up to embrace "performance as a feature". It grants you permission to write queries that are optimized for a particular use-case; queries that change for different reasons; and, queries that are easy to delete. As long as you are not duplicating "business logic", you should feel free to duplicate as much SQL as you feel is necessary.
Epilogue: What if my SQL queries contain lots of conditions?
If a SQL query contains a lot of conditions, it can be tempting to view those conditions as "business logic". For example:
SELECT id, name, createdAt, updatedAt FROM project WHERE type = 'simple' AND isLocked = 0 AND isArchived = 0 AND isSample = 0 ;
If this set of
WHERE clauses were to show up in two different queries in the same application, it's easy to feel like some sort of duplication is taking place. But, it is not. This is not "business logic" - this is just "data-access". You happen to be querying for data that adheres to these conditions in this particular context. And, later on, you might need to query for data in a different context that is slightly different.
The trick it to ask yourself the question: Is it possible that one query will ever need to change while the other one doesn't. If so, then these two queries change for different reasons and are not violations of DRY.
Epilogue: What if I make a fundamental change and have to update a lot of SQL queries?
Dovetailing with the query above, applications evolve over time. That's natural. And, as an application evolves, so too will its queries. Imagine creating an application where the concept of "archiving" an entity didn't get introduced until much later in the application life-cycle. There's no doubt that you would have to add an
isArchived check to many of the existing SQL queries.
Having to make a change in a lot of places may feel like "duplication"; or rather, a byproduct of duplicated logic. But, it is not. A fundamental change to an application is never easy. And, often involves touching a lot of code. That's what makes the change, "fundamental".
When making such a change to an application, each interface has to be reviewed and evaluated uniquely in the context of said change. For example, an
isArchived check may need to be added to a "user's view" of the application; but, probably not to an "Support administrator's view". As such, fundamental changes can rarely be applied in as a sweeping update; instead, fundamental changes demand a holistic view of the application landscape.
Epilogue: What if I forget to add a necessary condition to a SQL query?
That's a bug. Bugs happen. If you can use the "Extract Method" refactoring pattern, it may help reduce the likelihood of bugs in the future. But, if you can't, just fix the bug and move on. Bugs are not inherently an indication of duplicated business logic.
Want to use code from this post? Check out the license.