SQL Optimization And ON Clause vs WHERE Clause

Posted September 21, 2006 at 6:18 PM

Tags: ColdFusion, SQL

I have been under the impression that putting "criteria" in the ON clauses of a SQL statement will create a faster query when compared to the same SELECT statement if the same criteria was placed in the WHERE clause. The ON clause is executed before the WHERE clause so I figured criteria there would perform better. When you join two tables together it creates a cross-product then it applies the WHERE clause to the resultant table join. At least this is what I think happens.

Anyway, that is a horrible explanation, but I went ahead and did some tests to see what kind of speed differences there are. To test, I queried a web statistics database, joining several tables together.

Here is the query with all the criteria in the WHERE clause:

 Launch code in new window » Download code as text file »

  • <cfquery name="qWhereTest" datasource="...">
  • SELECT
  • COUNT( * )
  • FROM
  • web_stats_hit h
  • INNER JOIN
  • web_stats_url u
  • ON
  • h.web_stats_url_id = u.id
  • INNER JOIN
  • web_stats_session s
  • ON
  • h.web_stats_session_id = s.id
  • INNER JOIN
  • web_stats_user_agent a
  • ON
  • s.web_stats_user_agent_id = a.id
  • WHERE
  • <!--- Get urls that were in blogs or snippets. --->
  • (
  • u.name LIKE
  • <cfqueryparam
  • value="%blog%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • OR
  • u.name LIKE
  • <cfqueryparam
  • value="%snippets%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • )
  •  
  • <!---
  • Get urls what were in the DB. This has no purpose
  • other than to make the DB work harder.
  • --->
  • AND
  • u.id BETWEEN
  • (
  • SELECT
  • MIN( u2.id )
  • FROM
  • web_stats_url u2
  • )
  • AND
  • (
  • SELECT
  • MAX( u2.id )
  • FROM
  • web_stats_url u2
  • )
  •  
  • <!--- Get sessions started from september. --->
  • AND
  • s.date_created BETWEEN '2006-09-01' AND '2006-10-01'
  •  
  • <!--- Get sessions user Mozilla browsers. --->
  • AND
  • a.name LIKE
  • <cfqueryparam
  • value="%Mozilla%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • </cfquery>

Then, in this test, I have moved all the WHERE clause criteria to be part of the ON clauses:

 Launch code in new window » Download code as text file »

  • <cfquery name="qOnTest" datasource="...">
  • SELECT
  • COUNT( * )
  • FROM
  • web_stats_hit h
  • INNER JOIN
  • web_stats_url u
  • ON
  • (
  • h.web_stats_url_id = u.id
  •  
  • <!--- Get urls that were in blogs or snippets. --->
  • AND
  • (
  • u.name LIKE
  • <cfqueryparam
  • value="%blog%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • OR
  • u.name LIKE
  • <cfqueryparam
  • value="%snippets%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • )
  •  
  • <!---
  • Get urls what were in the DB. This has no purpose
  • other than to make the DB work harder.
  • --->
  • AND
  • u.id BETWEEN
  • (
  • SELECT
  • MIN( u2.id )
  • FROM
  • web_stats_url u2
  • )
  • AND
  • (
  • SELECT
  • MAX( u2.id )
  • FROM
  • web_stats_url u2
  • )
  • )
  • INNER JOIN
  • web_stats_session s
  • ON
  • (
  • h.web_stats_session_id = s.id
  •  
  • <!--- Get sessions started from september. --->
  • AND
  • s.date_created BETWEEN '2006-09-01' AND '2006-10-01'
  • )
  • INNER JOIN
  • web_stats_user_agent a
  • ON
  • (
  • s.web_stats_user_agent_id = a.id
  •  
  • <!--- Get sessions user Mozilla browsers. --->
  • AND
  • a.name LIKE
  • <cfqueryparam
  • value="%Mozilla%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • )
  • </cfquery>

In the second test, there is no WHERE clause at all. It is not needed. I thought this would show noticeable performance gains. To my surprise, it did not. Both queries operated at the same speed. In fact, the WHERE clause test was generally a few milliseconds faster! This shocked me a bit. Either my beliefs are wrong or the SQL Server does a wonderful job of optimizing the statements prior to execution. Or, maybe my test query is not a good example. Perhaps there are situations where one is much faster than the other. Anyone have any feedback on that sort of thing?

And, this is no small set of data. The "hits" table as over 52,000 records in it.

So, I guess it comes down to which one is more readable / maintainable for you. The WHERE clause looks a bit cleaner I suppose.

Download Code Snippet ZIP File

Post Comment  |  Ask Ben  |  Permalink  |  Other Searches  |  Print Page



Learning ColdFusion 9 - ColdFusion 9 tutorials, samples, examples, demos

Reader Comments

Sep 21, 2006 at 9:18 PM // reply »
153 Comments

You said: "Either my beliefs are wrong or the SQL Server does a wonderful job of optimizing the statements prior to execution."

How about both? ;-)

SQL Server has a pretty decent optimizer. I would guess that it is probably compiling both queries down to the exact same execution plan. However, not all DBMSes have anything where near what I would call "decent". We have an AS/400 here running DB2 v5 and its optimizer is significantly less intelligent. In fact, not an hour ago I moved a boolean expression from the WHERE clause to an ON clause and sped up my query from 10 minutes to 3 minutes.

And yet, when we were on DB2 v4 last year, things were even worse. (No concept of a LIMIT/TOP function, if that gives you any idea.)

Working with the primary-school-level DB2 (at least, the versions I have experience with) has taught me soooo much about query optimization and the real ins-and-outs of SQL. I realize that even just 2 years ago I could write queries that worked, but probably not how I thought they worked, and certainly not as well as they could have worked.


Sep 22, 2006 at 7:33 AM // reply »
6,516 Comments

Rick,

So, I am not sure that I understand you. Are you saying that the ON clause is faster than the WHERE clause (given your example), or are you saying that my assumption about it is false?


Sep 22, 2006 at 10:54 AM // reply »
153 Comments

Sorry, I did ramble a bit, eh?

The ON clause can be faster than the WHERE clause, but isn't necessarily always so. As you've seen, the MS SQL optimizer is pretty smart, but I was suggesting that you try a few more DBMSes and see how they handle moving statements around.

Different DBMSes are going to respond differently to different "optimizations". You would think, SQL having been a fairly stable standard for decades at this point, that the same rearrangement of SQL would produce the same results, but it won't. I was just using DB2 (sorry, UDB, whatever) as an example, as it always seems to me to have the least in common with MS SQL.


Sep 22, 2006 at 11:03 AM // reply »
6,516 Comments

Rick,

I figured as much (at what you were driving at). Right now I work in a fairly MS SQL server-centric universe, but I can understand that every DB is gonna have it's own optmization routines. I guess that is why some are free and some are expensive ;)


dd
Feb 26, 2009 at 1:25 PM // reply »
1 Comments

Can you let me know the difference between the following 2 sql's

Sel acct_id ,
….
...
From <table1> a join <table2> b
on a.sol_id = b.sol_id
and a.tc_id = b.tc_id
and a.msn_num = b.msn_num
______________________________________s

Sel acct_id ,
….
...
From <table1> a join <table2> b
on a.sol_id = b.sol_id
where a.tc_id = b.tc_id
and a.msn_num = b.msn_num


Feb 26, 2009 at 1:46 PM // reply »
6,516 Comments

@DD,

It has to do with the record set that is available when the WHERE clause executes. When you move more conditions into the ON clause, you have less records (potentially) when the WHERE clause executes.


dd
Feb 26, 2009 at 4:28 PM // reply »
2 Comments

@Ben Nadel,

is ther any change in the number of records in the result set, when we execute in ON and WHERE clause?


Feb 26, 2009 at 5:06 PM // reply »
6,516 Comments

@DD,

No - in the end, the same exact result set should be returned. The only difference is that the version with no WHERE clause might be somewhat more efficient (depending on the query optimizer in the database engine). However, as they return the same thing, go with which ever is more comfortable.


dd
Feb 26, 2009 at 6:09 PM // reply »
2 Comments

table1 table2

solid tcid msnid solid tcid msnid
3 4 44 4 2 12
3 2 12 5 3 11
4 1 55 6 7 13
5 8 88 3 4 44


Apr 14, 2009 at 4:23 AM // reply »
1 Comments

Stop using INNER JOIN in the FROM clause for your first test.

That is the whole point of what you are trying to test and confirm.

Your FROM clause in the first instance should be:

FROM
web_stats_hit h,
web_stats_url u,
web_stats_session s,
web_stats_user_agent a

... move the ON statements into the WHERE clause. Now you will see a MASSIVE difference in the performance of the queries.

The second query should keep the INNER JOINs in the FROM clause with the ON statements.

The queries as you have them now are practically equivalent.


Jul 16, 2009 at 7:01 PM // reply »
1 Comments

i think the inner join is better. I have working in sql server and oracle and inner join is faster than where in this context. INNER JOIN clause reduce intermediate results,issue of query optimization.


Aug 10, 2009 at 8:12 AM // reply »
1 Comments

@McNicholl,

So what would you recommend in this case which will be faster the one with ON or the one with WHERE?


Sep 10, 2009 at 5:21 AM // reply »
1 Comments

Although performance seems the same, the query with the older SQL 89 syntax is more costly. So it's more about I/O (table access). If you have a fast server, differences can be minor. On a busy or slow server, use the SQL 92 syntax though.


Sep 12, 2009 at 10:20 PM // reply »
6,516 Comments

@Dave,

I like moving the conditions into the ON clause, if for no other reason, it makes me more cognoscente of the result sets I am trying to create. I find it helps keep my more focused, which I hope gives me better queries over-all.


Post Comment  |  Ask Ben

Recent Blog Comments
Nov 22, 2009 at 4:30 AM
jQuery Live() Method And Event Bubbling
dasegtezr ... read »
Nov 22, 2009 at 4:03 AM
jQuery Live() Method And Event Bubbling
C_fieri ... read »
Nov 22, 2009 at 1:56 AM
Learning ColdFusion 9: Using CFQuery In CFScript Can Enable SQL Injection Attacks
Why adobe would give you script equivalent of cfquery is beyond me. I love cfquery tag because it helps me wriite clean sql, and get away from the horrible jdbc queries If I wanted to write javali ... read »
Nov 22, 2009 at 1:45 AM
Streaming Text Using ColdFusion's CFContent Tag And The Variable Attribute
The reason you would want to do this is to stream. Ack json/xml files to ria clients I used thus technique before because putting json in response stream causes debugging info to come thru As well a ... read »
Nov 21, 2009 at 6:47 PM
Hal Helms - Real World Object Oriented Development, Sarasota - Day Five
@charlie griefer, Thank you.. ... read »
Nov 21, 2009 at 5:15 PM
Using ColdFusion Structures To Remove Duplicate List Values
@Jose Galdamez, Oh heh yeah I didn't paste the whole code. I should have defined the vars -- my bad. It's fixed thou. Thanks. ... read »
Nov 21, 2009 at 4:49 PM
Styling The ColdFusion 8 WriteToBrowser CFImage Output
Great work yet again Ben! Whilst I didn't use this whole code, I copied some of your regex code for a similar problem with the lack of an alt attribute and unescaped ampersands in CFIMAGE for Railo 3 ... read »
Nov 21, 2009 at 1:13 PM
My First ColdFusion Builder Extension - Encrypting And Decrypting CFM / CFC Files
@Ben, Because I am pedantic, I just want to make sure that everyone knows there is absolutely no encryption going on. There is only encoding and obfuscation. The cfencode tool only obfuscates your C ... read »