SQL Optimization And ON Clause vs WHERE Clause

Posted September 21, 2006 at 6:18 PM by Ben Nadel

Tags: ColdFusion, SQL

I have been under the impression that putting "criteria" in the ON clauses of a SQL statement will create a faster query when compared to the same SELECT statement if the same criteria was placed in the WHERE clause. The ON clause is executed before the WHERE clause so I figured criteria there would perform better. When you join two tables together it creates a cross-product then it applies the WHERE clause to the resultant table join. At least this is what I think happens.

Anyway, that is a horrible explanation, but I went ahead and did some tests to see what kind of speed differences there are. To test, I queried a web statistics database, joining several tables together.

Here is the query with all the criteria in the WHERE clause:

  • <cfquery name="qWhereTest" datasource="...">
  • SELECT
  • COUNT( * )
  • FROM
  • web_stats_hit h
  • INNER JOIN
  • web_stats_url u
  • ON
  • h.web_stats_url_id = u.id
  • INNER JOIN
  • web_stats_session s
  • ON
  • h.web_stats_session_id = s.id
  • INNER JOIN
  • web_stats_user_agent a
  • ON
  • s.web_stats_user_agent_id = a.id
  • WHERE
  • <!--- Get urls that were in blogs or snippets. --->
  • (
  • u.name LIKE
  • <cfqueryparam
  • value="%blog%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • OR
  • u.name LIKE
  • <cfqueryparam
  • value="%snippets%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • )
  •  
  • <!---
  • Get urls what were in the DB. This has no purpose
  • other than to make the DB work harder.
  • --->
  • AND
  • u.id BETWEEN
  • (
  • SELECT
  • MIN( u2.id )
  • FROM
  • web_stats_url u2
  • )
  • AND
  • (
  • SELECT
  • MAX( u2.id )
  • FROM
  • web_stats_url u2
  • )
  •  
  • <!--- Get sessions started from september. --->
  • AND
  • s.date_created BETWEEN '2006-09-01' AND '2006-10-01'
  •  
  • <!--- Get sessions user Mozilla browsers. --->
  • AND
  • a.name LIKE
  • <cfqueryparam
  • value="%Mozilla%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • </cfquery>

Then, in this test, I have moved all the WHERE clause criteria to be part of the ON clauses:

  • <cfquery name="qOnTest" datasource="...">
  • SELECT
  • COUNT( * )
  • FROM
  • web_stats_hit h
  • INNER JOIN
  • web_stats_url u
  • ON
  • (
  • h.web_stats_url_id = u.id
  •  
  • <!--- Get urls that were in blogs or snippets. --->
  • AND
  • (
  • u.name LIKE
  • <cfqueryparam
  • value="%blog%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • OR
  • u.name LIKE
  • <cfqueryparam
  • value="%snippets%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • )
  •  
  • <!---
  • Get urls what were in the DB. This has no purpose
  • other than to make the DB work harder.
  • --->
  • AND
  • u.id BETWEEN
  • (
  • SELECT
  • MIN( u2.id )
  • FROM
  • web_stats_url u2
  • )
  • AND
  • (
  • SELECT
  • MAX( u2.id )
  • FROM
  • web_stats_url u2
  • )
  • )
  • INNER JOIN
  • web_stats_session s
  • ON
  • (
  • h.web_stats_session_id = s.id
  •  
  • <!--- Get sessions started from september. --->
  • AND
  • s.date_created BETWEEN '2006-09-01' AND '2006-10-01'
  • )
  • INNER JOIN
  • web_stats_user_agent a
  • ON
  • (
  • s.web_stats_user_agent_id = a.id
  •  
  • <!--- Get sessions user Mozilla browsers. --->
  • AND
  • a.name LIKE
  • <cfqueryparam
  • value="%Mozilla%"
  • cfsqltype="CF_SQL_VARCHAR"
  • />
  • )
  • </cfquery>

In the second test, there is no WHERE clause at all. It is not needed. I thought this would show noticeable performance gains. To my surprise, it did not. Both queries operated at the same speed. In fact, the WHERE clause test was generally a few milliseconds faster! This shocked me a bit. Either my beliefs are wrong or the SQL Server does a wonderful job of optimizing the statements prior to execution. Or, maybe my test query is not a good example. Perhaps there are situations where one is much faster than the other. Anyone have any feedback on that sort of thing?

And, this is no small set of data. The "hits" table as over 52,000 records in it.

So, I guess it comes down to which one is more readable / maintainable for you. The WHERE clause looks a bit cleaner I suppose.



Reader Comments

Sep 21, 2006 at 9:18 PM // reply »
153 Comments

You said: "Either my beliefs are wrong or the SQL Server does a wonderful job of optimizing the statements prior to execution."

How about both? ;-)

SQL Server has a pretty decent optimizer. I would guess that it is probably compiling both queries down to the exact same execution plan. However, not all DBMSes have anything where near what I would call "decent". We have an AS/400 here running DB2 v5 and its optimizer is significantly less intelligent. In fact, not an hour ago I moved a boolean expression from the WHERE clause to an ON clause and sped up my query from 10 minutes to 3 minutes.

And yet, when we were on DB2 v4 last year, things were even worse. (No concept of a LIMIT/TOP function, if that gives you any idea.)

Working with the primary-school-level DB2 (at least, the versions I have experience with) has taught me soooo much about query optimization and the real ins-and-outs of SQL. I realize that even just 2 years ago I could write queries that worked, but probably not how I thought they worked, and certainly not as well as they could have worked.


Sep 22, 2006 at 7:33 AM // reply »
11,241 Comments

Rick,

So, I am not sure that I understand you. Are you saying that the ON clause is faster than the WHERE clause (given your example), or are you saying that my assumption about it is false?


Sep 22, 2006 at 10:54 AM // reply »
153 Comments

Sorry, I did ramble a bit, eh?

The ON clause can be faster than the WHERE clause, but isn't necessarily always so. As you've seen, the MS SQL optimizer is pretty smart, but I was suggesting that you try a few more DBMSes and see how they handle moving statements around.

Different DBMSes are going to respond differently to different "optimizations". You would think, SQL having been a fairly stable standard for decades at this point, that the same rearrangement of SQL would produce the same results, but it won't. I was just using DB2 (sorry, UDB, whatever) as an example, as it always seems to me to have the least in common with MS SQL.


Sep 22, 2006 at 11:03 AM // reply »
11,241 Comments

Rick,

I figured as much (at what you were driving at). Right now I work in a fairly MS SQL server-centric universe, but I can understand that every DB is gonna have it's own optmization routines. I guess that is why some are free and some are expensive ;)


dd
Feb 26, 2009 at 1:25 PM // reply »
1 Comments

Can you let me know the difference between the following 2 sql's

Sel acct_id ,
?.
...
From <table1> a join <table2> b
on a.sol_id = b.sol_id
and a.tc_id = b.tc_id
and a.msn_num = b.msn_num
______________________________________s

Sel acct_id ,
?.
...
From <table1> a join <table2> b
on a.sol_id = b.sol_id
where a.tc_id = b.tc_id
and a.msn_num = b.msn_num


Feb 26, 2009 at 1:46 PM // reply »
11,241 Comments

@DD,

It has to do with the record set that is available when the WHERE clause executes. When you move more conditions into the ON clause, you have less records (potentially) when the WHERE clause executes.


dd
Feb 26, 2009 at 4:28 PM // reply »
2 Comments

@Ben Nadel,

is ther any change in the number of records in the result set, when we execute in ON and WHERE clause?


Feb 26, 2009 at 5:06 PM // reply »
11,241 Comments

@DD,

No - in the end, the same exact result set should be returned. The only difference is that the version with no WHERE clause might be somewhat more efficient (depending on the query optimizer in the database engine). However, as they return the same thing, go with which ever is more comfortable.


dd
Feb 26, 2009 at 6:09 PM // reply »
2 Comments

table1 table2

solid tcid msnid solid tcid msnid
3 4 44 4 2 12
3 2 12 5 3 11
4 1 55 6 7 13
5 8 88 3 4 44


Apr 14, 2009 at 4:23 AM // reply »
1 Comments

Stop using INNER JOIN in the FROM clause for your first test.

That is the whole point of what you are trying to test and confirm.

Your FROM clause in the first instance should be:

FROM
web_stats_hit h,
web_stats_url u,
web_stats_session s,
web_stats_user_agent a

... move the ON statements into the WHERE clause. Now you will see a MASSIVE difference in the performance of the queries.

The second query should keep the INNER JOINs in the FROM clause with the ON statements.

The queries as you have them now are practically equivalent.


Jul 16, 2009 at 7:01 PM // reply »
1 Comments

i think the inner join is better. I have working in sql server and oracle and inner join is faster than where in this context. INNER JOIN clause reduce intermediate results,issue of query optimization.


Aug 10, 2009 at 8:12 AM // reply »
1 Comments

@McNicholl,

So what would you recommend in this case which will be faster the one with ON or the one with WHERE?


Sep 10, 2009 at 5:21 AM // reply »
1 Comments

Although performance seems the same, the query with the older SQL 89 syntax is more costly. So it's more about I/O (table access). If you have a fast server, differences can be minor. On a busy or slow server, use the SQL 92 syntax though.


Sep 12, 2009 at 10:20 PM // reply »
11,241 Comments

@Dave,

I like moving the conditions into the ON clause, if for no other reason, it makes me more cognoscente of the result sets I am trying to create. I find it helps keep my more focused, which I hope gives me better queries over-all.


May 17, 2010 at 5:10 PM // reply »
1 Comments

My applogies if there is a better site for this, but I am stumped & Google-the-great led me here. Not to get off the specific example here, but the query of mine below seems to take rediculously long (over 10 minutes so far). Am I doing something wrong?

SET NOCOUNT ON
DECLARE @StartDate DateTime
DECLARE @EndDate DateTime
SET @StartDate = convert (varchar, getdate() - 3, 23)
SET @EndDate = convert(varchar, @StartDate + 1, 23)
SET NOCOUNT OFF

SELECT
pt.TagName, Datetime = Convert (varchar, DateTime, 20), Value
FROM
t_mod_pretrip_tags pt
INNER JOIN
v_DiscreteHistory dh
ON
pt.TagName = dh.TagName
/*INNER JOIN
QualityMap qm
ON
dh.QualityMap = qm.QualityMap*/

WHERE Value = 1
AND DateTime >= @StartDate
AND DateTime < @EndDate


May 17, 2010 at 5:18 PM // reply »
11,241 Comments

@Jeff,

A query running that long can probably be most helped with better table indexes. It is unlikely that ON vs. WHERE join conditions is causing such a performance problem.


Jul 20, 2010 at 3:58 PM // reply »
1 Comments

Recently I took a query running 7 minutes down to 2 minutes by moving filter items in the WHERE clause to their corresponding ON clauses. In my case, there are 6 inner joins with 2 of them being to derived tables. Personally I prefer to keep JOIN logic in the ON clause and filter logic the WHERE clause, but in this case, I had to move filter criteria to the ON clause for better performance.


Jul 20, 2010 at 8:38 PM // reply »
11,241 Comments

@Benjamin,

It's cool to hear that you found such a large improvement of speed with movement of filtering. I think it for the most part it comes down to personal preference. For me, once I started using the ON clause, I felt like it was just nicer.


Aug 3, 2010 at 4:56 AM // reply »
1 Comments

Its taking couple of hours to execute the below query. Both tables have millions of entries and index is on col1, col3 and col4 of tableB.
What can I do to make it run fast?

update tableA,tableB
set tableA.col2 = 1
where tableA.col1 = tableB.col1 and (tableA.col3 < 100 or tableA.col4 >200);


Aug 3, 2010 at 9:41 PM // reply »
11,241 Comments

@Akansha,

I am not sure what to suggest. You'll probably want to talk to someone with more in-depth SQL execution knowledge.


Aug 25, 2010 at 10:48 AM // reply »
1 Comments

I personally put any filters on the joined tables in the ON instead of the WHERE, that way I can change from INNER to OUTER joins without needing to adjust the rest of the code.


Aug 25, 2010 at 10:58 PM // reply »
11,241 Comments

@Nick,

Oh that's interesting - I had never considered switching JOIN types before. Personally, I just like to put in the ON clause because that's the point where I think about the relationship between tables... and, most of the time, it also works out for performance.


Oct 6, 2010 at 8:32 PM // reply »
14 Comments

I think the speed depends on what you are trying to compare. If it's simple comparison like 'age < 25' then I dont think it makes much difference between WHERE and ON but when those comparisons require complicated calculations like resulting in a subquery, it does matter.

I'm pulling this out of my experience here, but WHERE is executed/compared for every record while ON is done just once. So if the comparison is slow, that will be multiplied by the number of records you are trying to compare against.

For example, the following 2 are significantly different:

1)
SELECT ...
FROM A INNER JOIN B ON ...
WHERE
age <
(
--- this subquery is run for every row ---
SELECT MAX(age)
FROM C
)

2)
SELECT ...
FROM A INNER JOIN B ON ...
INNER JOIN
(
--- this subquery is run just once ---
SELECT MAX(age) AS maxAge
FROM C
) C ON age < C.maxAge

They are not real examples but I hope you get the idea. The key difference is in the first case, the subquery will be run against every single row while in the second case, it is run just once.

It happened in my work codes before and it used to hand almost completely until I changed them to use ON.

With the impression that WHERE is run for every row while ON is run once, I try to use ON more now. But I agree there are times WHERE is more readable, especially when we need to join multiple tables.


Oct 10, 2010 at 4:26 PM // reply »
11,241 Comments

@Khoa,

I have to say that I love to JOIN to intermediary tables, as you are doing in the latter example. I have seen some pretty awesome performance gains from these types of transient tables since, as you are saying, they only need to be calculated once, rather than for each row.

... plus, I think they are just cool too :)

However, as far as ON vs. WHERE in terms of the number of comparisons, I think that each of them will have to be executed on each row. Your example is a little different in that your transient table derived by the inner SELECT only results in one row. However, the actual JOIN comparison:

age < c.maxAge

... is still going to be run on every row in table B.

That said, derived tables are awesome!


Oct 10, 2010 at 5:21 PM // reply »
14 Comments

@Ben,

I think my point is more about the number of times the subquery is executed which is where the performance gain is. And you are right the final comparison still needs to be run for each row.

Same here. I also love using ON as I think it makes the query more organised and beautiful :-)


Oct 10, 2010 at 9:29 PM // reply »
11,241 Comments

@Khoa,

100%, I think we're on the same page. And, I know this might sounds ultra geeky, but when I can create a derived table and then JOIN to it... it actually makes me smile :)


Apr 1, 2011 at 6:03 AM // reply »
3 Comments

Hi Ben,
I read all the above comments, hot topic to discuss and thatswhy I looked into google for more facts about this, I got some interesting info about LEFT JOIN... as we should take care while we use all conditions on ON clause. see the below link for details.

http://www.gplivna.eu/papers/sql_join_types.htm#p6.4

hope this will help you...
Thanks for the post...actually I came to your blog post from google search.
Thanks for the contribution...


Jun 2, 2011 at 3:37 PM // reply »
1 Comments

Did not have time to read all the responses, so not sure if this was added. In SQL Server the query engine always executes where clauses first, so don't think it would make much diff. in an on clause. However, if you use a sub query with a where clause to substiute for the table with the where filter, then this executes before the joins, which reduces redundant joins, i.e. less records to join. This has a dramatic boost on performance!!


Jun 3, 2011 at 11:02 PM // reply »
11,241 Comments

@ACFFan,

Good link! I've been bitten by that problem before. In general, I try to stick with INNER JOINs as much as possible; but, when I do have to use a LEFT OUTER JOIN, your advice is right on the money!

@Ron,

I haven't used MS SQL in a while (at my current company we use MySQL); but, that's really interesting to hear about the WHERE clause's relative execution in the order of operations. I had no idea. This is perhaps a recent update (last couple of years) to the engine?


Jun 6, 2011 at 2:05 AM // reply »
3 Comments

Hi Ben,

Very Glad To see the above link helped you...
Thanks...


Jul 3, 2011 at 10:54 AM // reply »
3 Comments

What would you suggest to speed up this query?
I plan to put the optimized version in a stored procedure. What can I do to the query below to optimize it so that I don't get query timeout errors over our network? Thanks for your help.

select count(*) As SCount from SM_Sales_Master where (SM_Sales_Date >= @BeginDate And SM_Sales_Date <= @EndDate) And SM_Sales_Region = 'A'

select Top 50 SCount, SM_Rep_Name, SM_Rep_ID, ST_Item_ID, ST_Item_Desc from
SM_Sales_Master INNER JOIN ST_Sales_Type ON
SM_Rep_ID = ST_Rep_ID
Where (SM_Sales_Date >= @BeginDate And SM_Sales_Date <= @EndDate) And SM_Sales_Region = 'A' And ST_Level = '1'


Jul 3, 2011 at 10:56 AM // reply »
3 Comments

Type above query is in MS SQL 2005. I forgot to add that. Thanks for your help.


Jul 3, 2011 at 10:57 AM // reply »
3 Comments

The above query is in MS SQL 2005. I forgot to add that. Thanks for your help. Sorry for the typeo above.


Sep 2, 2011 at 10:13 AM // reply »
1 Comments

Writing the code with 4 Equi-Joins followed immediately by only the WHERE clause statements applicable to the current Equi-Join was palpably
slower than the optimized alternative: Creating 4 INNER JOINS, eliminating most of the WHERE clauses and listing the remaining ones at the bottom of the INNER JOIN statements. Creating the equivalent of one long extended JOINED row with the minimum number of WHERE clauses has a remarkable and blatantly visible simplifying effect on the Query Plan. It went from unintelligible garbage to an overtly easy to read and simple to follow truncated and compressed query plan. Speed improvement was only about 15-20% but the resource consumption spikes no longer occurred for RAM, CPU or Disk I/Os which is even more critical. How someone else could report an almost identical query plan after making these types of changes is beyond me. And any query with LIKE clauses is doomed to slowdown performance regardless of what else you do to optimize the remainder of the code. Clean INNER JOINS with minimized WHERE clauses and the addition of Indexes on every column on either the LEFT or the RIGHT half of the INNER JOIN are the real secrets to performance improvements.


Sep 2, 2011 at 11:17 AM // reply »
369 Comments

Very interesting read. I have just recently started working a whole lot more with databases. I will have to bookmark this for a later in-depth read of all of the comments. Too busy with work now. :-/


Sep 28, 2011 at 10:48 AM // reply »
1 Comments

In my opinion you should test running the query right into the database rather than bringing into the picture another factor like a web application, it can be misleading if you ask me. The web application for simple it could be it can introduce complexity to the final result.


Sep 28, 2011 at 11:05 AM // reply »
369 Comments

@Vasco, I agree that at least some testing should be done running the query right into the database, especially if you have highly complex queries. The current project I am working has a very complex combination of computations being done both at the database level in the queries, and within the programming language itself. I know it is ideal to take most mathematical calculations to the database, but there are times when it just works better to do some of them in the code. There's also the issue of working to modify code that has already been written and being given the time and/or authority to re-write the whole thing from scratch. And I have found as well that when running queries and testing on the database level only...that is, just running them in the database, there are also other factors as well that comes into play. I guess other users affect execution time, as well as how busy the server is (which also ties into other users). That isn't my specialty, so I'm not sure what other factors there are, but I do know sometimes there's a huge difference between the tests I run on the test database and what happens on live. And I don't always have the permissions to run test queries on the live database...


Oct 29, 2011 at 10:03 PM // reply »
1 Comments

Where clause is cleaner? OMG dude it's quite the opposite. I just refactored a horrible long query that I could not read because it was a bunch of where clauses at the end instead of nice clean join syntax.

How do you figure where is cleaner....dude common, get some sense.


Oct 30, 2011 at 10:51 AM // reply »
11,241 Comments

@Dave,

This post is about 5 years old. I can assure you that I now move as much stuff into my ON clauses as possible. I definitely find it easier to read and understand as it limits the tables AS they are joined, rather than at the end.


Dec 23, 2011 at 9:34 AM // reply »
1 Comments

This is best one article so far I have read online. I would like to appreciate you for making it very simple and easy. I have found another nice post related to this post over the internet which also explained very well. For more details you may check it by visiting this url.
http://mindstick.com/Articles/da828000-bb36-4ed5-aee6-ea90596154cd/?SQL%20Clause

Its also helped me lot.
Thanks everyone!!


Mar 23, 2012 at 6:38 AM // reply »
1 Comments

can u please tell me the difference between this 2 stmt.

1.SELECT
o.*
FROM
Customer c
LEFT JOIN
[Order] o ON o.CustomerID = c.CustomerID AND o.OrderType = 'Cash'
WHERE
c.Country = 'USA'

2.SELECT
o.*
FROM
Customer c
LEFT JOIN
[Order] o ON o.CustomerID = c.CustomerID
WHERE
c.Country = 'USA'
AND
o.OrderType = 'Cash'


Oct 11, 2012 at 9:11 AM // reply »
1 Comments

Thanks it was helpful



Post A Comment

Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.

Please review the following issues:

Author Name:


Author Email:

Author Website:

Comment:

Supported HTML tags for formatting: <strong>bold</strong>   <em>italic</em>   <code>code</code>







  • Help Wanted - Find Your Next ColdFusion Job
Ben Nadel's Company - Epicenter Consulting Recent Blog Comments
May 22, 2013 at 7:52 AM
Nested Views, Routing, And Deep Linking With AngularJS
Hi, Just a quick thank you. As it happens, for my own purposes, the pending ui-router work being done in native angular is likely the one I'll adopt, but your exploration, code and documentation of ... read »
May 22, 2013 at 4:43 AM
How Do You Use The ColdFusion CFParam Tag?
'<cfparam>' or 'isDefined()and <cfset>' performs the same task.Is there any difference? ... read »
May 21, 2013 at 7:46 PM
Using Plupload For Drag & Drop File Uploads In ColdFusion
No luck. At least I have uncovered the cause, URLScan 3.1. Here is what I see in the IIS log when a file is over 30mb. 2013-05-21 23:29:05 10.105.45.128 GET /plupload/assets/jquery/jquery-1.8. ... read »
May 21, 2013 at 6:12 PM
Using Plupload For Drag & Drop File Uploads In ColdFusion
Ben, I did not see you after Pete Freitag's Lockdown session at cfObjective but he said that IIS sets file size limits at 30MB by default which just happened to be the threshold for file size when ... read »
May 21, 2013 at 11:51 AM
Ask Ben: Parsing Very Large XML Documents In ColdFusion
Looking at my first ever XML document that I have to parse and put into MS SQL 2000 with CF8. I get it to list the desired Field name, many times over, and have a long list of this field name displa ... read »
May 21, 2013 at 9:25 AM
Turning Off and On Identity Column in SQL Server
you are awesome..i am lucky to get this blog between such a garbage one....Thanks, Prashant ... read »
May 20, 2013 at 4:38 PM
Using A Dynamic Column Name With ValueList() In ColdFusion
@Dana, Your confusion is well founded, since this is a very confusing features. In fact, it ONLY works if you use array notation. Meaning, that this: arrayToList( query[ "columnName" ] ) ... read »
May 20, 2013 at 4:34 PM
Using A Dynamic Column Name With ValueList() In ColdFusion
I was thinking chicken and the egg, I wouldn't have expected it to work in the valuelist going in I guess. Maybe I just need a beer, long day :) ... read »
InVision App - Prototyping Made Beautiful With Prototyping Tools