I have been under the impression that putting "criteria" in the ON clauses of a SQL statement will create a faster query when compared to the same SELECT statement if the same criteria was placed in the WHERE clause. The ON clause is executed before the WHERE clause so I figured criteria there would perform better. When you join two tables together it creates a cross-product then it applies the WHERE clause to the resultant table join. At least this is what I think happens.
Anyway, that is a horrible explanation, but I went ahead and did some tests to see what kind of speed differences there are. To test, I queried a web statistics database, joining several tables together.
Here is the query with all the criteria in the WHERE clause:
Launch code in new window » Download code as text file »
Then, in this test, I have moved all the WHERE clause criteria to be part of the ON clauses:
Launch code in new window » Download code as text file »
In the second test, there is no WHERE clause at all. It is not needed. I thought this would show noticeable performance gains. To my surprise, it did not. Both queries operated at the same speed. In fact, the WHERE clause test was generally a few milliseconds faster! This shocked me a bit. Either my beliefs are wrong or the SQL Server does a wonderful job of optimizing the statements prior to execution. Or, maybe my test query is not a good example. Perhaps there are situations where one is much faster than the other. Anyone have any feedback on that sort of thing?
And, this is no small set of data. The "hits" table as over 52,000 records in it.
So, I guess it comes down to which one is more readable / maintainable for you. The WHERE clause looks a bit cleaner I suppose.
Download Code Snippet ZIP File
Comments (4) | Post Comment | Ask Ben | Permalink | Other Searches | Print Page
Caution: ColdFusion Zero Date vs. SQL Zero Date
ColdFusion ExpandPath() And GetCurrentTemplatePath()
You said: "Either my beliefs are wrong or the SQL Server does a wonderful job of optimizing the statements prior to execution."
How about both? ;-)
SQL Server has a pretty decent optimizer. I would guess that it is probably compiling both queries down to the exact same execution plan. However, not all DBMSes have anything where near what I would call "decent". We have an AS/400 here running DB2 v5 and its optimizer is significantly less intelligent. In fact, not an hour ago I moved a boolean expression from the WHERE clause to an ON clause and sped up my query from 10 minutes to 3 minutes.
And yet, when we were on DB2 v4 last year, things were even worse. (No concept of a LIMIT/TOP function, if that gives you any idea.)
Working with the primary-school-level DB2 (at least, the versions I have experience with) has taught me soooo much about query optimization and the real ins-and-outs of SQL. I realize that even just 2 years ago I could write queries that worked, but probably not how I thought they worked, and certainly not as well as they could have worked.
Posted by Rick O on Sep 21, 2006 at 9:18 PM
Rick,
So, I am not sure that I understand you. Are you saying that the ON clause is faster than the WHERE clause (given your example), or are you saying that my assumption about it is false?
Posted by Ben Nadel on Sep 22, 2006 at 7:33 AM
Sorry, I did ramble a bit, eh?
The ON clause can be faster than the WHERE clause, but isn't necessarily always so. As you've seen, the MS SQL optimizer is pretty smart, but I was suggesting that you try a few more DBMSes and see how they handle moving statements around.
Different DBMSes are going to respond differently to different "optimizations". You would think, SQL having been a fairly stable standard for decades at this point, that the same rearrangement of SQL would produce the same results, but it won't. I was just using DB2 (sorry, UDB, whatever) as an example, as it always seems to me to have the least in common with MS SQL.
Posted by Rick O on Sep 22, 2006 at 10:54 AM
Rick,
I figured as much (at what you were driving at). Right now I work in a fairly MS SQL server-centric universe, but I can understand that every DB is gonna have it's own optmization routines. I guess that is why some are free and some are expensive ;)
Posted by Ben Nadel on Sep 22, 2006 at 11:03 AM