SQL Query Order of Operations

By Ben Nadel

Published 2006-06-05 in SQL — Comments (35)

Lately, I have been looking into SQL query optimization. We recently installed SeeFusion on our server and I can see where my long-running tasks are causing the server to slow down. Turns out, not surprisingly, that the slow pages are very query-intensive. Granted, a lot of these pages were written years ago, before I knew what nice code looked like, but the good news is that there is lots of room for optimization and cleanup.

To start out, I thought it would be good to look up the order in which the clauses of a SQL query are (logically) evaluated, as this will change the way I approach optimization:

FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
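To make the list concrete, here is a minimal sketch that runs one query containing all six clauses, with each clause annotated by its position in the order above. It uses Python's built-in sqlite3 module purely as a convenient way to run the SQL; the orders table and its data are made up for illustration and are not from my application.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the clause order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('ann', 10), ('ann', 30), ('bob', 5), ('bob', 2), ('cat', 50);
""")

rows = conn.execute("""
    SELECT customer, SUM(amount) AS total   -- 5. build the output columns
    FROM orders                             -- 1. pick the source rows
    WHERE amount > 3                        -- 2. filter individual rows
    GROUP BY customer                       -- 3. collapse rows into groups
    HAVING SUM(amount) > 20                 -- 4. filter whole groups
    ORDER BY total DESC                     -- 6. sort the final result
""").fetchall()

print(rows)  # [('cat', 50), ('ann', 40)]
```

Note that bob's 2 is dropped by WHERE before grouping, and bob's remaining group (total 5) is then dropped by HAVING.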

This order holds some very interesting pros/cons:

FROM Clause

Since this clause executes first, it is our first opportunity to narrow down possible record set sizes. This is why I put as many of my ON rules (for joins) as possible in this area as opposed to in the WHERE clause:

FROM
	contact c
INNER JOIN
	display_status d
ON
	(
			c.display_status_id = d.id
		AND
			d.is_active = 1
		AND
			d.is_viewable = 1
	)

This way, by the time we get to the WHERE clause, we will have already excluded rows where is_active and is_viewable do not equal 1.
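As a rough check of this idea, the sketch below runs the join both ways, once with the filters in the ON clause and once with them in the WHERE clause, and compares the output. It uses Python's built-in sqlite3 module only as a quick way to run SQL; the name column and the sample rows are assumptions made up for the example. For an INNER JOIN the two forms return the same rows, so the choice affects how the query reads and, depending on the engine, possibly how it performs.

```python
import sqlite3

# Hypothetical schema and data; only the column names used in the ON
# clause above come from the post.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE display_status (
        id INTEGER PRIMARY KEY, is_active INTEGER, is_viewable INTEGER
    );
    CREATE TABLE contact (
        id INTEGER PRIMARY KEY, name TEXT, display_status_id INTEGER
    );
    INSERT INTO display_status VALUES (1, 1, 1), (2, 1, 0), (3, 0, 1);
    INSERT INTO contact VALUES
        (1, 'Ann', 1), (2, 'Bob', 2), (3, 'Cat', 3), (4, 'Dee', 1);
""")

# Filters placed in the ON clause, as in the snippet above.
on_rows = conn.execute("""
    SELECT c.name
    FROM contact c
    INNER JOIN display_status d
    ON ( c.display_status_id = d.id AND d.is_active = 1 AND d.is_viewable = 1 )
    ORDER BY c.name
""").fetchall()

# Same filters moved down into the WHERE clause.
where_rows = conn.execute("""
    SELECT c.name
    FROM contact c
    INNER JOIN display_status d
    ON c.display_status_id = d.id
    WHERE d.is_active = 1 AND d.is_viewable = 1
    ORDER BY c.name
""").fetchall()

print(on_rows)                # [('Ann',), ('Dee',)]
print(on_rows == where_rows)  # True
```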

WHERE Clause

With the WHERE clause coming second, it becomes clear why calculated SELECT columns cannot be referenced in the WHERE clause, something that confuses a lot of people. If you create a column in the SELECT directive:

SELECT
	( 'foo' ) AS bar

It will not be available in the WHERE clause because the SELECT clause has not even been executed at the time the WHERE clause is being run.
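One common workaround, sketched here under made-up table and column names, is to calculate the column inside a derived table (a subquery in the FROM clause). Since FROM is evaluated first, the calculated column already exists by the time the outer WHERE runs. The sketch uses Python's built-in sqlite3 module just to run the SQL; some engines relax the alias rule as an extension, so the example sticks to the subquery form, which does not rely on that.

```python
import sqlite3

# Hypothetical table and data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (first_name TEXT, last_name TEXT);
    INSERT INTO contact VALUES ('Ann', 'Lee'), ('Bob', 'Ray');
""")

rows = conn.execute("""
    SELECT t.full_name
    FROM (
        -- The inner SELECT runs as part of the outer FROM clause,
        -- so full_name is a real column for the outer WHERE.
        SELECT ( first_name || ' ' || last_name ) AS full_name
        FROM contact
    ) AS t
    WHERE t.full_name = 'Ann Lee'
""").fetchall()

print(rows)  # [('Ann Lee',)]
```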

ORDER BY Clause

It might confuse people that their calculated SELECT columns (see above) are not available in the WHERE clause but ARE available in the ORDER BY clause. This makes perfect sense, though: because the SELECT clause executes right beforehand, everything from the SELECT is available by the time ORDER BY runs.
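A small sketch of sorting by a calculated column, again using Python's built-in sqlite3 module to run the SQL. The item table, its columns, and the line_total alias are hypothetical names chosen for the example.

```python
import sqlite3

# Hypothetical table and data for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (name TEXT, price INTEGER, quantity INTEGER);
    INSERT INTO item VALUES ('pen', 2, 10), ('ink', 5, 1), ('pad', 3, 4);
""")

rows = conn.execute("""
    SELECT name, ( price * quantity ) AS line_total
    FROM item
    ORDER BY line_total DESC   -- the alias exists because SELECT ran first
""").fetchall()

print(rows)  # [('pen', 20), ('pad', 12), ('ink', 5)]
```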

I am sure there are other implications based on the SQL clause order of operations, but these are the most obvious to me and can help people really figure out where to tweak their code.

Short link: https://bennadel.com/70

Reader Comments

Tim McCormack Mar 8, 2008 at 11:58 AM

I suppose an actual RDBMS (MySQL, Oracle, etc.) will perform optimizations out of this order. For example, MySQL probably looks for things like "WHERE a.b=c.d" and moves that filtering up into the FROM processing.

Greg Bulmash Nov 12, 2008 at 5:47 PM

Good job on this entry, Ben. I actually have it bookmarked.

I was looking at it again as I prepared the schema for a new database and started to wonder about optimizing the "where" clause of an inner join.

If you read most tutorials on inner joins, they'll use some syntax like "select * FROM flintstones, bedrock WHERE flintstones.name = bedrock.resident and bedrock.gender = 'male'".

If we imagine that these are both large tables with thousands of members each, I'd imagine that the inner join can be optimized for faster execution and less memory usage since we only want data from "flintstones" when it corresponds with a record from "bedrock" that meets our criteria.

But what's the best way to optimize that query for maximum performance / minimum resource usage?

I have two ideas.

1: Put the "flintstones.name = bedrock.resident" at the end of the WHERE clause, so the cross-matching isn't done until the result set has already been narrowed by the "bedrock.gender = 'male'" part. Problem is that I haven't been able to find anything on the execution order of subcomponents of a WHERE clause to know if this might make a difference.

2: Reformat the query to "SELECT * from (SELECT * from bedrock WHERE gender = 'male') as bedrock, flintstones WHERE flintstones.name = bedrock.resident". It seems that this would constrain the "bedrock" dataset prior to the join, resulting in a faster query.

Any thoughts?

Ben Nadel Nov 12, 2008 at 5:57 PM

@Greg,

That's an interesting thought, to make the intermediary result set. I wonder how that interacts with table indexing. Unfortunately, I don't really know enough about database execution to think at this level. What I have found is that good indexes almost make more of a difference than some of the low-level optimizations.

I'm still trying to find the sweet spot :)

Rufus Jul 12, 2009 at 1:24 PM

Thanks for the info, I've bookmarked this page.

Michael Scudiero Jul 16, 2009 at 9:28 PM

I appreciate this page. This is exactly the information I was looking for when I googled this afternoon. I, too, appreciate it.

@Greg -- got a kick out of this, Ben, having just started tweeting recently...anyway...

I would write the join as follows, and especially after looking at this order of operations post...

SELECT *
FROM [Flintstones] f
JOIN [Bedrock] b
ON f.[Name] = b.[Resident]
AND b.[Gender] = 'male'

...I'd explicitly list columns, have singular table names and prefix everything with schema, and key the gender, but beyond that the join as shown should be fastest and ultimately look and feel the "cleanest" once you start writing this way.

I've been structuring joins in much this manner, but was about to write one this afternoon and wanted to double check the order of operations with an eye to the performance implication of the particular statement I was working on, hence this search. I have opted to include anything relative among the entities being joined and typically isolated key conditional "business" checks in the where clause for readability. Now, in a couple of key procs, I will be moving those checks into the first join.

The lightbulb came on this afternoon, thanks again, Ben. I thank the Lord I have been as close to the mark as I've been already. It does make sense when you visualize SQL Server "building out" the recordset...the innermost join should have the biggest performance impact. So for industrial strength code, this should pay nice dividends.

Thank you again, I hope someone finds this comment interesting if not helpful.

Peace to all.

Michael

Ben Nadel Jul 18, 2009 at 1:27 PM

@Michael,

Glad to help out my man. From what I have been told, the SQL server should do some of this optimization for you after it parses the SQL. But, I like to be as explicit as possible in my thinking.

Kevin Aug 12, 2009 at 9:55 AM

Sir, I would like to know what is good to study: SQL, web design, or programming, there in New York? I plan on staying there when I graduate from college!

Ben Nadel Sep 6, 2009 at 2:11 PM

@Kevin,

These are three overlapping skills - ideally you should know *something* about design; but also, web programming and SQL go hand in hand.

Hassan Adam Jan 14, 2010 at 2:19 PM

Thanks for the hint, certainly it helps

Rahul Gupta Apr 14, 2010 at 1:18 AM

Hi Ben

This post was really helpful. It answered quite a few of my questions regarding query optimization. It also cleared up why several concepts work the way they do.

Regards
Rahul

Ben Nadel Apr 15, 2010 at 9:54 PM

@Hassan, @Rahul,

Glad to help out, guys. I think understanding the order really helps you think more effectively about SQL queries.

Pulkit Ojha Apr 28, 2010 at 4:00 PM

It is a very simple and nicely explained article. Was really useful.

Way to go BEN !!!

Ben Nadel May 16, 2010 at 10:52 PM

@Pulkit,

Thanks my man.

tehila Jul 28, 2010 at 5:49 AM

Hi, this is exactly what I needed, nicely written.

Thank you

Laxman Aug 12, 2010 at 7:38 PM

I was executing a query and was puzzled by how it executed. Then I googled for the order of SQL directives and found your post.

It is really good.
Thanks

Ben Nadel Aug 14, 2010 at 11:36 AM

@Tehila, @Laxman,

Glad you guys enjoyed this post. I hope it was helpful.

Skip Nov 8, 2010 at 2:41 PM

Great post! Found it by googling "mysql order of execution". It confirmed my assumption that the SELECT clause would execute one of the last, and now I'm absolutely clueless about the following:
I have a rather complex subquery in the SELECT portion, expecting that the main query will return only a limited number of results. BUT the funny thing is, even if there are 0 results returned, adding this subquery into the SELECT makes the script run 1000x slower. How's that possible? If SELECT is executed after the WHERE and WHERE returned 0 results, the SELECT should not even execute, right? Or am I missing something? ...and if it does not execute, why does it slow down the query? Any ideas?

Philip Bedi Nov 9, 2010 at 12:08 PM

Hi Ben,

If you had DISTINCT in your select query, where would that go in the list? My guess is that it should go just above the ORDER BY clause.

What do you say?

Ben Nadel Nov 10, 2010 at 10:00 AM

@Skip,

Hmm, that's a good question. Perhaps the SQL engine is reworking the execution flow for some sort of optimization? I don't know how SQL engines determine their most optimized work flows. What kind of database are you using?

@Philip,

I believe that DISTINCT executes as part of the SELECT statement. But, I'm not entirely sure on that - it's just a guess.

Skip Nov 11, 2010 at 3:20 PM

@Ben,

Hi Ben, thanks for your comment. It's mysql database with MyISAM tables. Don't know if it matters...
I ended up just re-doing the query altogether. Works now, that's what counts at the end!

Richard Brasier Nov 11, 2010 at 7:20 PM

@Ben,

Cool - thanks for this info, handy to know.

Just thought that something might be of interest.
I recently attended an Oracle DBA course, and the instructor mentioned that in the newer versions of Oracle, it shouldn't matter how you throw your SQL statements at it; the database should tune itself to perform the statement in the most efficient way.
SQL tuning in this case was almost always done at the database end, rather than on the application side (where the SQL was written).
I haven't looked that much into SQL Server or MySQL to see if they have a similar type of process that the database performs.

On a completely unrelated note, it's a real shame that Oracle is becoming more and more the big bad wolf of the Software/I.T. world.

Ben Nadel Nov 13, 2010 at 11:52 AM

@Skip,

It's interesting that you mention that - I was not aware that there were different types of MySQL out there until this weekend. Bob Silverberg said there's a MyISAM and an InnoDB version and that you should pretty much always use InnoDB. Of course, he was talking about this in the context of ColdFusion's new ORM settings - I'm not sure how that applies to anything else.

But, as far as flavor-specific caveats, I am not in the know about this.

@Richard,

I've definitely heard that kind of idea from a number of people - that the Database will optimize the execution regardless of how you order your operations. I really wish I knew more about how databases actually work.

As far as the Oracle thing, people freak out any time they have to pay for software.... the irony of which is that they then turn around and charge their clients :D

Alan Smith Jan 20, 2011 at 5:21 AM

I was searching for info on how the SQL Server engine prioritises logical operations, AND + OR, as in BODMAS for arithmetic operations (Brackets, Order (powers), Division, Multiplication, Addition, Subtraction).
If I write A AND B OR C it appears to evaluate A AND B, then ORs the result with C. If I write A AND (B OR C), that forces the OR to evaluate first, then ANDs the result with A. However, if the number of terms is extended beyond 3 (A, B, C, D etc.) it becomes a far more unpredictable scenario. Take A OR B AND C OR D: I expected that to execute B AND C, ORed with A, then ORed with D, but it only seems to give me that result if I explicitly insert brackets like
A OR (B AND C) OR D. Can anyone add any experience of this to the discussion?

Further, if you go in and out of the Query builder, the expression comes back out as
(A OR B) AND (A OR C) OR D ????? What is that all about? I am a digital electronics engineer by degree, and this is the reverse principle of the distributive laws: it extends rather than simplifies the expression.

Many thanks

Alan

Zach Stagers Feb 19, 2011 at 5:02 PM

Nice post, and obviously a very useful one to a lot of people - posted on June 5, 2006 and people (like myself) are still commenting! :)

I'm going to be doing a lot of optimization over the coming months, and this information is going to prove useful.

Thanks,

Zach

Atul Yadav May 23, 2011 at 12:59 AM

Hi Ben,

This post is really helpful for me. I have read your blog many times and found lots of things to learn that are helpful in my work.

I need a favour: would you please tell me about the Query Execution Architecture in SQL Server?

Well, Thanks in Advance...:)

Regards
Atul

WebManWalking May 23, 2011 at 1:27 PM

@All, hope this helps:

http://en.wikipedia.org/wiki/Query_plan

SQL_Guru Jul 1, 2011 at 2:50 PM

Hi.

The optimizer will execute the query according to the lowest cost execution path regardless of the way you write the query so it doesn't really matter about this order of precedence...

Orion Jun 29, 2012 at 11:39 AM

Hi Ben,

First of all, thank you for this post. It is of great help to me.

But I found a weird thing here. As you said, SELECT columns are not referenceable in the WHERE clause. However, they are referenceable in the GROUP BY clause in my test. I renamed some columns and referred to them in the GROUP BY clause, and it worked.

My question here is: is this related to some sort of optimization? Does SELECT have priority over GROUP BY, or are the columns in the GROUP BY clause evaluated twice, first in GROUP BY and then in SELECT?

Thanks a lot

prashant mhatre Oct 16, 2012 at 4:03 PM

@Ben,
Really good information...:)
Thanks a ton...
But I am still confused about the execution order between FROM and WHERE??
Before reading this, I heard from someone that execution starts from the WHERE clause --> FROM clause.
Now I am in two minds..:(
Please help me..

Daniel Dec 27, 2012 at 4:35 PM

Great post, thanks for the info! Googled "SQL order of operations" and this post from years ago was still #1.

@prashant,

Think about the query this way. How can you know the WHERE data that qualifies the statement without knowing the location FROM which the data is being retrieved?

You know you want an orange, but you have to go to the store where it comes FROM first :)

I know it's a very basic explanation, but according to the SQL "order of operations" that is how it works.

Of course, there are many things now in DBMSs that optimize the query for you, so the order listed on this page may or may not be used exactly as stated anymore.

Sekhar Reddy Aug 24, 2013 at 12:17 PM

Hi Ben,

Thanks for this post.

Example Query:

Select * from Dept where DeptNo in (Select DeptNo from Emp)

I would like to know how the above query will execute.

To my knowledge, the above query will execute in the following order:
1. Main Query From Clause
2. main Query Where Clause
3. Sub Query From Clause
4. Sub Query Where Clause
5. Sub Query Select
6. Main Query Select

Please let me know whether my thought process is correct or not.

Thanks in advance.

Regards,
Sekhar Reddy