Ben Nadel's Easy Tips For Writing Better, More Optimized SQL

By Ben Nadel on March 27, 2008

I was just reading over on Andrew Powell's ColdFusion blog about the problem that server-side developers have with writing SQL. I don't quite agree with what Andy was saying; I think since SQL is such an integral part of web development, all server-side developers should have at least a decent understanding of SQL. They don't need to be experts, but there are some easy tips that can be followed to help make sure you don't make unneeded mistakes. To take Matt Williams' queue, I thought I would quickly share with you the easier, yet hugely crutial tips for writing better SQL (all based on mistakes I used to make):

Don't Ever Use SELECT *

This is, hands-down, the biggest, baddest SQL optimization violation out there. If you considered "SELECT *" to be off limits in your applications, I bet that would clear up a sizable chunk of the SQL problems that applications have. SELECT * causes problems in a few different ways:

It can lead to the transfer of a tremendous amount of excess data. As efficient as SQL servers are, there is still a lot of overhead to move the result set from the SQL server to the ColdFusion application space.
When using CFQueryParams (which you should), "SELECT *" causes caching issues that can throw hard to debug errors if you change your database structure (move or add columns).
"SELECT *" provides no insight into what is actually being returned to the ColdFusion application server. Is it one column? Is it 50 columns? Providing the column names will give all developers (even yourself) a much better grasp on the intent of the code being written and maintained. The only time I would ever consider "SELECT *" to be OK would be in a ColdFusion query of queries; however, even then, I like seeing the column names so that contextual information is always up front and in your face.

Don't EVER Use SELECT *

This one is so freaking important that I had to put it here twice. This should probably be the 11th commandment. Thou shalt not query thy database using SELECT *.

Only Return Data That Will Be Used

This goes hand-in-hand with the "SELECT *" point above. Only return data that you are going to use. Every time you return a column or row of data that won't be used in the subsequent page request, you are incurring the unneeded overhead of reading that data, transfering it to the ColdFusion application server, and converting it into a record set. Plus, be explicit in your code; let your code mean something - if you return a column, let that loudly telegraph the fact that it will be used in that request.

Use COUNT() Instead of Query.RecordCount

I used to be very guilty of this one: I would do something like [SELECT id FROM contact] to get the number of contacts in the database. This is no good. Use the COUNT() aggregate to find the number of records in a table with the given criteria. Again, this goes back to the idea that you don't want to return data that will not be used. There will be times when you want to get both the records and know how many were returned; this of course, is a totally valid use of QUERY.RecordCount.

User Query Param Binding

This one is ColdFusion specific, but you should use the CFQueryParam tag for parameter binding. This allows the structure of the query to be cached while using dynamic data. CFQueryParam causes some magic to happen such that a query that uses at least one CFQueryParam tag will almost always be faster that the equivalent query that doesn't use CFQueryParam. In fact, even if you have a query that does NOT have any dynamic filtering, throwing in a (1 = <cfqueryparam value="1" />) in the WHERE clause will actually create a performance gain.

Never Join Tables Using Commas

Granted, this is not always an option, but with more old school applications, you see two tables being joined using commas [FROM tablea, tableb]. This can/does create a very large cross product of the two tables before any filtering gets applied.

When Possible, Move WHERE Clauses Into JOIN/ON Clauses

This goes hand in hand with the point above - most people don't know about the magic of the compount JOIN/ON clause. Just as with the WHERE clause, you can have multiple conditions in the ON clause of a table JOIN. By moving all relevant filtering to this ON clause (from the WHERE clause), you create smaller, intermediary cross products. This is a bit more advanced than anything else there, but if you just sit down and learn the JOIN/ON clauses, you will benefit quite easily from this.

Don't EVER Use A Query Builder To Write Your SELECT Statement

Let's face it, if you need to return 50 columns, it's a pain in the butt to write them all out. There is the temptation to let your DBMS write that query for you. DO NOT DO THIS. Write that stuff out by hand; once you let the DBMS (database management system) write your query, you are mentally giving up control of that query and will become lazy about making sure you don't return extra data. Be diligent in your queries and they will naturally be more optimized.

There are a ton more ways to optimize your SQL queries, but I feel like these ones are the low hanging fruit; these optimizations can be applied by pretty much anyone who can do SQL with very little effort. Half of this is just being more explicit with your code. I hope that helps.

Short link: https://bennadel.com/1189

Reader Comments

SCOTT BENNETT Mar 27, 2008 at 1:26 PM

37 Comments

Hey Ben, these are all great tips!

But I am not really clear on what you are saying with the statement:

"In fact, even if you have a query that does NOT have any dynamic filtering, throwing in a (1 = <cfqueryparam value="1" />) in the WHERE clause will actually create a performance gain."

does that mean if i have a query like:

SELECT FirstName, LastName
FROM People

it would perform faster if I wrote it like:

SELECT FirstName, LastName
FROM People
WHERE 1 = <cfqueryparam value="1" />

Or is there more to what you are saying then that?

Jeff Mar 27, 2008 at 1:35 PM

20 Comments

Don't Use Select * should be the number one rule for all programmers in any language that interacts with databases. :)

<CFQUERYPARAM> is a little tricky as some older versions of databases don't support the performance gains, but they are still a good idea to help guard against SQL-injection Attacks. On thing to be careful of you can't cache a query that has queryparams for those of you who are just starting to use them.

I'll have to check out 1 = <cfqueryparam> it must be a weird JDBC feature that speeds up a query with just that statement in it.

Hansjoerg Mar 27, 2008 at 1:40 PM

1 Comments

I think it's important for a good Web Developer to write good SQL - as there is hardly any system out there without a database in the background.

For simple tasks, I can just recommend the usage of an ORM system like transfer (http://www.transfer-orm.com/), however. It's the very same to me as using jQuery (http://www.jQuery.com) instead of hardcoding all JS stuff.

Complex queries (as complex JS tasks) still need to be done by hand, otherwise monkeys could do the coding ;-)

Tom Mollerus Mar 27, 2008 at 1:46 PM

28 Comments

@Scott- I think what Ben means with the technique of using "1 = <cfqueryparam value="1" /> is not that you'd add a WHERE clause to a query that didn't have one before... I think he means that you should use query params even for values that don't ever change in your query. For instance, I often give db records an "active" flag to indicate whether or not the record should be pulled in customer-facing queries (e.g., "WHERE products.active = 1"). So what I think Ben means is that you should instead write "WHERE product.active = <cfqueryparam value="1" />".

@Ben- This whole cfqueryparam technique for otherwise static values is an interesting tip. Just how do you know that substituting a param for a number is faster? Do you have a reference?

Quan Tran Mar 27, 2008 at 1:50 PM

1 Comments

I think what ben meant was that the performance gain from 1=<cfqueryparam> is the same if it was compared to 1=1 which is equivalent in performance to having no where clause. Basically forcing the DB to cache the query structure.

Ben Nadel Mar 27, 2008 at 1:54 PM

15,688 Comments

@Tom,

I have not tested this personally, but I read these results on a blog that I trust a long time ago (I can never find the link though). Basically, what the blog entry said was that this query:

SELECT id
FROM contact

Will run slower than:

SELECT id
FROM contact
WHERE 1 = <cfqueryparam value="1" />

There is no *logical* difference between the two queries. However, since the latter one has a programmatic binding, the ColdFusion / SQL server will end running it faster. I believe it has to do with the caching mechanism, but am not sure of the internal reasoning.

As far as binding static values like Active, I used to do that; however, I have diverged away from that methdology as long as there is at least one dynamic clause being used, Ex:

WHERE
name LIKE <cfqueryparam value="%#q#%" />
AND
active = 1

Since the "name" is bound, I don't feel that the "active" needs to be bound as well.

Jeff Mar 27, 2008 at 1:55 PM

20 Comments

If you have no options in the where clause the database should cache the query plan anyway. Shouldn't it?

If we are just talking about t.status = <cfqueryparam cfsqltype="cf_sql_char" value="A">

then that makes more sense to me then just using it in a where 1 = 1 situation,

SCOTT BENNETT Mar 27, 2008 at 2:03 PM

37 Comments

@Tom,

That makes sense, and is how I use cfqueryparam.

If I have the query:

SELECT FirstName, LastName
FROM People
WHERE Active = 1

I would usually write it like:

SELECT FirstName, LastName
FROM People
WHERE Active = <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="1"/>

But that quote wasn't really clear to me, and I just wanted to hear from Ben if he had somehow discovered some special trick that I had never heard of before.

Tom Mollerus Mar 27, 2008 at 2:05 PM

28 Comments

Jeff, I agree with you-- I think that the database will cache just about any query that it comes across, regardless of whether or not it uses parameters, pushing the least-recently used item out of cache as it needs to. Parameter binding just makes more queries identical and therefore makes the planning phase go faster and keeps the query plan in cache longer.

I have a feeling that the speed gain found with using a WHERE clause is a different issue apart from parameter binding.

Nathan Mische Mar 27, 2008 at 2:27 PM

30 Comments

As for the "When Possible, Move WHERE Clauses Into JOIN/ON Clauses," while it is a good tip, I've found that most of the time the query optimizer is smart enough to figure this out on its own.

For example

SELECT
somefield
FROM
tablea
INNER JOIN
tableb ON
tablea.someid= tableb.someid
WHERE
tableb.active = 1

and

SELECT
somefield
FROM
tablea
INNER JOIN
tableb ON
tablea.someid = tableb.someid
AND tableb.active = 1

usually result in the same query plan, at least on SQL Server 2000. I'd be interested to hear what experiences people have had on other database platforms.

Phil Duba Mar 27, 2008 at 2:39 PM

7 Comments

@Nathan: I have found that if that column, tableb.active in this case, is not indexed then it is actually a little slower to put the tableb.active = 1 in the ON versus the WHERE. I noticed that in SQL Server 2005 and Oracle 9. Of course, that could have just been our schemas too, so I generally don't do non-index equalities in the ON statement.

Jeff Mar 27, 2008 at 2:39 PM

20 Comments

@Nathan

MySQL 4.1 seems to speed up the queries if you put the where in the join on large tables. At least in my experience.

MySQL 5 doesn't seem to care either way.

Nathan Mische Mar 27, 2008 at 2:44 PM

30 Comments

@Phil - good to know.

@Jeff - sounds like the query optimizer may have gotten a little smarter between MySQL 4.1 and 5.

Andrew Powell Mar 27, 2008 at 2:57 PM

5 Comments

Now don't go misquoting me... I said developers should not be writing SQL, not that they shouldn't know it. :)

Tony Petruzzi Mar 27, 2008 at 2:57 PM

44 Comments

The real reason for the where 1= 1 trick is if you are going to be creating a conditional where clause and don't want to write in the logic to check for the opening statement. so for instance, instead of doing this:

<cfif not arrayisemtpy(wherestatements)>
where
<cfloop from="1" to="#arraylen(wherestatements)#" index="i">
<cfif i neq 1>and </cfif>#wherestatements[i]#
</cfloop>
</cfif>

you can do this:

where
1=1
<cfloop from="1" to="#arraylen(wherestatements)#" index="i">
and #wherestatements[i]#
</cfloop>

much easier to read and code isn't it. The whole it increasing performance doesn't hold. I never really seen anyone do proper loading testing on the myth and until someone does and shows me the results, that what it is, a myth.

THE BIGGEST tip that you are forgetting and that nearly EVERYONE forgets is:

1) Make sure you place an index on foreign key columns your tables might have.

Do you know how many times I've looked at a slow performing database and seen that there was no index on the column being used as the foreign key? By just adding the index, you can increase you database performance dramatically.

William Bowen Mar 27, 2008 at 3:00 PM

5 Comments

I think when you say "Never Join Tables Using Commas" you're really advocating two things:

1. Don't use a FULL OUTER JOIN (this is what happens if you take the full cross-product with no WHERE or ON). I can't ever remember a time where I even needed one.
2. Use ANSI-standard SQL joins (e.g., FROM table1 a LEFT JOIN table2 b ON a.col1 = b.col2) instead of short-hand (e.g. FROM table1 a, table2 b WHERE a.col1 = a.col2 (+))

I thought your language was a little confusing initially. Also, if you are truly advocating the second point, keep in mind some DBMS do not support ANSI standard syntax. For instance, I ran across this the other day working on an older program running on Oracle 8i where Oracle uses its shorthand syntax and uses (+) for LEFT or RIGHT joins and does not support the explicit LEFT JOIN. I am generally a proponent of portable SQL as much as possible, but some times you have to use the

Here's a few more quickies I would add to my personal list:

Never ever use any DDL statements in your SQL. This includes CREATE, DROP, ALTER, etc. Generally these should be disabled for the DSN too, but I've seen programs that were creating temporary tables on the fly in Access databases and later dropping them (which was unsafe and broken on so many levels). This is a no-brainer for most people, but you never know.

Don't be afraid to use subqueries in a FROM or JOIN statement. I see a lot of queries written very inefficiently that could be effectively rewritten using a well-placed subquery or too. Of course, going crazy is not advised as they can hurt performance, but it's better than some silly solutions I've seen using 2 or 3 separate queries and then tying the results together via ColdFusion processing.

Just like above, don't be afraid to use SQL's aggregate functions. So many times I see people return a dataset and then use ColdFusion to loop through and get a SUM or COUNT or perform additional processing they either could've done much faster in the SQL. Some times there are extenuating circumstances or people don't understand how to write the GROUP BY properly, but many times it could be done easily via SQL.

Following on the previous point, when you have a calculated column, give it a name (e.g., SELECT nvl(col1,0) AS col1 FROM table1). I don't want to have to reference it by position later.

I'm a big fan of capitalizing SQL keywords. I personally hate seeing queries written like "select col1 from table," but I suppose that's really a preference.

Probably lots more but those are some of my bigger pet peeves.

Tom Mollerus Mar 27, 2008 at 3:19 PM

28 Comments

I'd go one step farther than Tony's good suggestion on creating indexes on foreign keys. I'd suggest creating indexes on ANY column that you use in a WHERE statement, no matter whether it's a foreign key. I've found it really speeds up queries on large tables:
http://www.mollerus.net/tom/blog/2007/09/creating_database_indexes_on_single_columns.html

Lincoln Milner Mar 27, 2008 at 3:30 PM

1 Comments

@Tom,

Be careful when you say "creating indexes on ANY column that you use in a WHERE statement, no matter whether it's a foreign key". I've seen times where having too many indexes on a table can actually hurt overall DB/query performance (at least for Oracle).

My 2¢ would be (at least this is what I do with my Oracle stuff):
1. Use an explain plan
and
2. If the database query optimizer engine isn't doing the best thing (thinking it knows best when sometimes it doesn't, see explain plan above), use hints to force usage of indexes and such

I wish I had a third point, but I don't have any more pennies...

Steve Bryant Mar 27, 2008 at 3:38 PM

59 Comments

If you find yourself looping over a query (i.e., you have a query in a <cfloop> or <cfoutput query="...">), rewrite the query to avoid that. It can be done.

Oh yeah, don't use SELECT *!

Ben Nadel Mar 27, 2008 at 3:42 PM

15,688 Comments

Thanks for all the great discussion :) I feel like its only going to make this post all the more of a resource.

Jeff Mar 27, 2008 at 3:46 PM

20 Comments

@Tony

I couldn't agree more. That's how I use the 1 = 1 or if I want to fail the query 1 = 2. And Actually that should be on the list.

Start conditional Where's with 1 = 1 :)

eric belair Mar 27, 2008 at 4:18 PM

1 Comments

I have to agree that creating indexes on every column used in your WHERE clauses is not a good thing. I tried this in a reporting application at my last job, and the indexes ending up using more disk space than all of the data in our database. Then one day, I tried removing those indexes - the queries performed better in some cases. I wish there were more clear-cut rules/standards on when to create indexes (other than the obvious FK and PK columns).

Tom Mollerus Mar 27, 2008 at 4:36 PM

28 Comments

Eric, it's a good point that you don't want to create indexes *everywhere*, but I don't know of any rule of thumb to help guide you... other than do it if it makes your queries perform faster. On certain tables of mine (where we store millions of records of click-tracking data, for instance), I'm sure that the indexes take up a lot of hard drive space, but they also make our lookups about x100 faster.

Tony Petruzzi Mar 27, 2008 at 4:42 PM

44 Comments

continuing with the indexes....

if you are using sql server and don't know what indexes to put on a table, just use the index tuning wizard that comes with it. the index tuning wizard can also help with a table that has too many indexes by finding the ones the table needs and dropping the others. alot of people never even know that this valuable tool comes with sql server.

David Buhler Mar 27, 2008 at 5:02 PM

45 Comments

To further Tony's post, the index is going to be one of the absolute biggest performance gains you'll ever experience. If you need to include multiple indexes, you'll need to ensure you have not forgotten to create an index for each column that is part of a where clause.

And to that end, I encourage everyone using MySQL to use the "Explain" declaration in your SQL to find missing indexes in a SQL statement.

David Buhler Mar 27, 2008 at 5:05 PM

45 Comments

@ Andrew

In the past, I've gone back to each SQL statement at the end of a project, copied the SQL, and added the 'Explain' declaration in EMS Manager 2007 for MySQL to find each and every SQL statement that is missing an index.

Jeffrey Bouley Mar 27, 2008 at 5:09 PM

3 Comments

Ben, I'd also like to add, always use stored procedures if possible. If you apply "separation of concerns" here, it is never a good idea to put sql in application code.

Security and performance are an after thought regarding ColdFusion as well as it's integrated and forces data typing which safeguards against sql injection. With regard to performance, if your db design is in order with indexes in the right places, your queries will run faster with sql being in the database.

I've heard the queryparam argument with bind variable and I agree performance gain may be negligible but you're still passing more text over the wire with inline and if it's not an encrypted connection you can get sniffed. Who likes looking at a cfc method that has a 200 line query anyway? Good luck.

Ben Nadel Mar 27, 2008 at 6:05 PM

15,688 Comments

@Jeffrey,

I, personally, have never dealt with a stored procedure. But, even so, I view these as a fairly high-level feature of a database. That's not so much about writing better SQL as it is about having better "application architecture"? After all, you can still write really bad SQL in a stored procedure - there is nothing stop someone from doing that :)

Zac Spitzer Mar 27, 2008 at 8:08 PM

55 Comments

for people using oracle, there is a good discussion on the pro's and con's about binding or not binding with a static value

http://www.oracle.com/technology/oramag/oracle/08-jan/o18asktom.html

as for the stored procedures, it's a lot easier to tune a CF application which doesn't use stored procedure and it exposes the sql in the debug for easy profiling and tweaking.

It also makes working with multiple versions of an application much easier as you don't have to worry about changing the stored procedures when you point a different code base at the same database

Stored procedures are great for doing a lot of intensive processing within the database. Wrapping a simple select up in a stored procedure is often overkill

Jeffrey Bouley Mar 27, 2008 at 9:52 PM

3 Comments

@Ben, totally agree on the bad sql in stored procs statement, my point was better to be bad and in the database than bad and in ColdFusion.

@Zac, I couldn't disagree with you more. What is the benefit of seeing the actual sql in the debug output when you know where it is whether in CF or in a proc? Debug shows execution times for both which can help you tune, etc.

Not sure of your point regarding pointing a new CF application at an existing db with procs either; they should and would work just like in-line would.

A list of points on stored proc benefits:

1. One of the major benefits of procs is that they allow you to maintain a single set of code that can be utilized by all current and future applications that integrate with the database.

2. If changes to the data model occur, the stored procedure is updated to reflect the changes, but the application (CF) calling that stored procedure will work without changes.

3. Typing is enforced via procedure params thwarting sql injection and it makes code reviews much easier with regard to checking for cfqueryparam on your dev teams sql calls.

The only time I would possibly recommend against procs would be if your app. server solution was engineered to be db portable, usually not the norm with CF apps and most web shops pick their db poison for an application out of the gate.

So leverage procs in them. If stored procs were only good for intensive processing in the database then all the enterprise solutions I've ever worked on in my career with and as a dba are overengineered Zac 8-) and the Allaire brothers were wrong for adding a tag to support it.

I recommend you check out Using SQL by rafe colburn discussing the benefits of procs, it's a great book on oracle and sql server approaches to stored procedures. A bit dated, but a helpful reference.

I have much experience in this area, point being stored procs are your friend, don't be afraid to jump in the deep end every once in awhile.

Christoph Schmitz Mar 28, 2008 at 6:19 AM

26 Comments

Never use SELECT * should actually read 'Never use SELECT * unless in an EXISTS Clause'. It is OK to write
SELECT ID, name
FROM tblA a
WHERE EXISTS
(
SELECT *
FROM tblB
WHERE fkField=a.field
)

According to Joe Celko, this makes the query optimizer choose the best index and will result in better performance.

As for indexes: although I agree that indexes usually speed up performance, I have had cases where indexes on FK columns would actually slow down query execution considerably. So always take a look at the query execution plan and test your indexes.

Chris

Francois Levesque Mar 28, 2008 at 6:54 AM

25 Comments

Subscribed! Great discussion you've started here, Ben :).

Tony Petruzzi Mar 28, 2008 at 7:49 AM

44 Comments

last thing about indexes any then I'll stop....

make sure that when you are examining the execution plan of a query, you look to see if the indexes used are doing a scan or a seek.

the difference is that same as it would be for a table. a scan looks through the entire index for the matches, where as a seek knows where the matches are.

the goal is to optimize your statement and indexes so that all of the indexes used within the statement are performing seeks.

remember that a poorly designed index is just as bad and sometimes even worst then no index.

Ben Nadel Mar 28, 2008 at 7:51 AM

15,688 Comments

@Christoph,

I got a great tip about the EXISTS clause one time. I think it was from Joe Rinehart (Joe, sorry if I am misquoting you). I was told to use a "SELECT 1" when using the EXISTS:

WHERE
. . . EXISTS
. . . (
. . . . . . SELECT 1
. . . . . . FROM [table]
. . . )

The reason for this is that you never actually use the data that comes back from the sub-select. So, but using a "1" (one) instead of any real data point, it kind of drives home the point that that sub-query is not doing anything but checking existence.

In the end, I think this does the same exact thing; but, using the "1" rather than the "*" gives, in my opinion, a better cue to the programmer that return of that query should not be taken into account.

Ben Nadel Mar 28, 2008 at 7:57 AM

15,688 Comments

@Tony,

Got a question for you about indexes. When I create them, I am trying to visualize how they get used as the cross product of the tables is built. So, let's say that I have a table like this for discussion sake:

[contact]
id
name
address_id

[address]
id
stree1
(etc.)

Now, let's say that more often than not, I want to return these in a single query like:

SELECT
. . . c.id,
. . . c.name,
. . . a.street1
FROM
. . . contact c
INNER JOIN
. . . address a
ON
. . . c.address_id = a.id
WHERE
. . . c.id = #id#

In this case, obviously the [c.id] and [a.id] columns would be uniquely indexed as they would be the primary keys or their respective tables. But, then I think about [c.address_id]. It is used in the join, however, I never use that column without already having a the given row. Meaning, I never scan the table looking for values in that column, but rather, I get a given row, and join to another table using a column value in the given row.

So, in a situation above, would you suggest indexing [c.address_id]? Or would that serve no purpose?

Jeff Mar 28, 2008 at 9:33 AM

20 Comments

@Jeffrey

Take this with a grain of salt since I work with mostly mySQL 4.1 and before. I have had extensive experience with Oracle and MSSQL servers and programs :).

I don't really see what stored procedures gain you. Much like the "you need indexes everywhere" it's not always the case. The projects I used to do with all Stored Procedure (SP) back ends (I used too do them :) ) Were always a bear to maintain. They add a level of complexity to your application that isn't always needed.

I usually reserve SP's for large things that need to be improved (reports). These days with all my contract work it's much better for me to keep the SQL in my CF code and then "upgrade" it to a stored procedure if it's need.

But I understand why people would want to use lots of SP's.

David Buhler Mar 28, 2008 at 9:34 AM

45 Comments

@Ben,

My understanding is that each table that has a "Joined On" ID needs to have an index. Each table with a where clause needs to have an index on the "where" value.

If you have a table with two different "Where " clauses, such as :

Select
kittens.name
From
kittens
Where kittens.cute=1 and kittens.furry=1

You would need an index for "cute" and "furry" together.

If you have:

Select
kittens.name
From
kittens
Where kittens.cute=1

You would need an index for JUST "cute", too.

With MySQL, the query optimizer will tell you if an index exists for your SQL statement (for MS SQL, it's SET SHOWPLAN_ALL). If I ran the following query in MySQL:

EXPLAIN
Select
kittens.name
From
kittens
Where kittens.cute=1 and kittens.furry=1

And I did not have an index that included BOTH "cute" and "furry" as a single index, the query optimizer, using EXPLAIN, would tell me what index the DB thought about using, and which one it did use, if any.

My basic understanding is this: Each time you write a SQL query, you have to run it through the query optimizer with EXPLAIN to ensure there is a valid index.

I have tables with up to 10 indexes (I think it's indices, but whatever). A great example is a table where a user may or may include additional sorting information. My index names are usually something like the following:

isFurry_isCute_indx
isFurry_indx
isCute_indx

For MySQL users: http://dev.mysql.com/doc/refman/5.0/en/using-explain.html

For MS SQL users:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/tsqlref/ts_set-set_365o.asp

William Bowen Mar 28, 2008 at 9:36 AM

5 Comments

@Jeffrey Bouley - One good reason I've found to support application SQL versus stored procedures is the fact that I get to leverage my constant variables in the SQL I wrote within the application, versus having to define them again in my DBMS or hard code those values.

That can be very handy for transparency or the fact that if one of those values changes I only have to update it in truly one place versus DB and the application. The whole point of defining global constants is so that if those variables change I don't have to change them multiple places.

I also find that dealing with stored procedures can be a team problem in one of two ways - typically I am the only one who has stored proc experience so I end up being the only one who can maintain them. However, if the SQL goes in the application, most of my team can still see the SQL and update it effectively without worrying about screwing up the DB.

The other side of that is some times I've worked with clients where the DBA wants developers to have no access to the database whatsoever and insists on writing all the procedures behind the scenes. Some times this works out fine, but other times they end up not being so good and we get in a war of words in which he accuses the application of not being coded properly, etc. Most (but not all) DBAs are usually not web applications guys that fully understand that side (just like most web application guys are passable but not fully qualified DBAs). I need to be able to see the SQL and make sure it's doing what it needs to.

I've also found it becomes a problem during deployment when upgrading the application. I'd rather be able to just push all the code up and not worry about also synchronizing the database stored procedures at the same time. Some times this is an issue regardless because the structure changes. But usually I can change code without having to modify the database at the same time, which is nice. I can just push all the new code up, clear the CF cache and it's done with no down-time. Now if you're talking about making sure the new code and the DB code it depends on go up at the same and are synchronized in a live environment, it becomes much more a pain in the ass.

Finally, particularly with the increasing decline of the stateless web in favor of more interactivity like AJAX, I find it is often necessary to intermix database interactions with interface communication to the end user or other application functions. When I'm abstracting everything into stored procedures, I lose most of that modularity that allows me to do things between queries when necessary. Unless of course I break those out into more stored procedures - which kind of defeats the point for me in the first place of being able to invoke a single DB procedure and forget about it.

Security has generally not been an issue for me, since as long as you write queries intelligently and use the appropriate safeguards (including <cfqueryparam) I've generally found my code to be safe from SQL injection (which has mostly fallen by the wayside these days as an attack vector). Vulnerabilities such as XSS or hacking hidden input fields which don't get re-validated are much more common these days, so as long as you treat ALL data coming from the client as hostile until proven otherwise, you're usually fine. If your connection to the database is being packet sniffed I'd say you've got bigger problems on your hands than whether your SQL is in the app or DB ;)

Jeffrey Bouley Mar 28, 2008 at 10:49 AM

3 Comments

@William, so many points I don't know where to start 8-)... I'll try quoting what you said with my replies.

"I get to leverage my constant variables in the SQL I wrote within the application"
Who says you can't pass these app. constants as arguments to your proc?

"The whole point of defining global constants is so that if those variables change I don't have to change them multiple places"
See last comment.

"typically I am the only one who has stored proc experience"
A good time for your team to broaden their knowledge, procs aren't rocket science.

"worrying about screwing up the DB"
No different than screwing up the db via CF.

"DBAs are usually not web applications guys"
This is an interoffice political struggle that I've experienced in the past, if your resume speaks to this skill set have a manager get involved, things can always be worked out here in most cases.

"synchronizing the database stored procedures at the same time"
Create a build script and QA it prior to pushing to production, very straightforward process.

"clear the CF cache and it's done with no down-time"
You can cache procs in CF 8 now and could cache then in mem scope vars in the past (really not clear on the points you are making in this scenario)

"When I'm abstracting everything into stored procedures, I lose most of that modularity that allows me to do things between queries when necessary"
How so? A function call is a function call, it's actually more modular and with the added benefit of multiple resultset/cursor returns it makes your cf code that much more manageable.

"Unless of course I break those out into more stored procedures"
I think you fall short here with regard to a procs abilities, no need to make multiple calls if written correctly.

"Security has generally not been an issue"
Agree with "bigger problems", as your network admins have to bear the burden of network sec. but if you've ever worked for a gov't. agency this is a very important initiative.

Ben Nadel Mar 28, 2008 at 10:54 AM

15,688 Comments

@David,

Interesting. My indexing skills are not that good. In SQL Server, I have only ever set up single-column indexes. I have been doing some work in MySQL lately and I see that Navicat allows me to set up an index that has multiple keys (which I have been leveraging).

Thanks for the tips.

David Buhler Mar 28, 2008 at 11:08 AM

45 Comments

@ Ben,

I use EMS SQL Management Studio which I absolutely love. I'm not familiar with other tools, so maybe I am oblivious, but EMS lets me view all of the indexes for a table, their name, and then run the query to see visually in a GUI what indexes MySQL thought about using, and which one it eventually did use (or didn't). It's also very stable.

http://www.sqlmanager.net/en/products/mysql/query

EMS also has a MS SQL server GUI, but I have never used it.

Ben Nadel Mar 28, 2008 at 11:37 AM

15,688 Comments

@David,

Thanks man, I will look into that.

William Bowen Mar 28, 2008 at 1:53 PM

5 Comments

@Jeffrey,

Fair points, reasonable rebuttals, I probably won't address them all (a lot of them get more into matters of opinion/personal experience).

A note about the constants. I don't want to have to pass them different places (including the database). That's why they get created in an easily available scope (often the application scope where they persist) or sometimes the Request scope or where ever. I create it one time and it's always available to everything. That's my philosophy on those variables. I don't want to pass bits and pieces of them piecemeal to a stored proc.

For the expertise issue, of course I'd like my team to learn and grow as much as possible. The reality is the client frequently comes to me with an unreasonable deadline that will force my hand into making sure everyone does what they do best to get the job done as efficiently as possible. Learning opportunities can be rare in such cases. Pushing back is always an option, but there's only so much you can push back without incurring wrath.

Naturally they always agree they'd like the team to get better, but when the s**t hits the fan, they always want their "best people" on it doing what they do best. Most of the learning opportunities I've had have come as a result of personal exploration.

As for the security, the majority of my work has been for the government (EPA, FAA, DoS, GSA) and we've never had issues. While they are very concerned about security, a lot of time there are numerous holes I'm able to point out of which they weren't even aware. People packet sniffing the connection between my DB and app server, which is usually on a complete internal network, is usually the last of my concern and not a compelling enough reason for me to put all SQL inside my DB code. Even if they can see my SQL, what good does it do them? They might see some data, but they'd be seeing that data even if I was passing it to a stored proc. They might glean a little about my DB structure, but if they're packet sniffing my network they probably already saw the ERD on our shared drive anyway!

To me, the most compelling argument for putting SQL in the DB is performance, which in most cases is not enough to really warrant it. A couple of milliseconds won' t make a difference to most apps unless they are extremely high volume (hundreds of thousands of simultaneous users). I can always make a procedure here or there when it makes sense without putting all of my SQL into the DB exclusively. If I'm working on an app where it will make a big difference, then it's certainly something to consider.

I respect your point of view and the way you stated it, we're simply on the two sides of the issue.

Thanks,

William

Allen Mar 28, 2008 at 3:56 PM

8 Comments

" One good reason I've found to support application SQL versus stored procedures is the fact that I get to leverage my constant variables in the SQL I wrote within the application, versus having to define them again in my DBMS or hard code those values.

I don't understand why this is an issue. Seems like these global variables are actually part of the business logic which shouldn't be sitting in the database anyway. Where you do need those values, you just pass them into the stored proc.

Roy Apr 2, 2008 at 9:44 AM

1 Comments

I haven't done any benchmarking, but one day while in a particularly anal mood, I replaced
(1=1)
not with
(1 = <cfqueryparam value="1" />)
but
(<cfqueryparam value="1" /> = <cfqueryparam value="1" />)

This was yesterday, so I found this article amusing.

rinu Apr 14, 2008 at 8:20 AM

1 Comments

This article is very good.It helped me a lot...Thanx

opakay Nov 17, 2009 at 1:05 PM

1 Comments

Dear Ben, I am grateful for finding this blog.
Please somebody talk about flushing the cf cache. In my office, each time our coldfusion application is not spooling fast. All we do is restart the server. I believe what we are doing is flushing the cf cache. If i am wrong pls correct me but if I am right, is there not a better way of doing this? Thank you.

Oh my chickens, this post is old!

Hit me up on Twitter if you want to discuss it further.