SQL Query Order of Operations

Posted June 5, 2006 at 1:35 PM

Tags: SQL

Lately, I have been looking into SQL query optimization. We recently installed SeeFusion on our server and I can see where my long running tasks are causing the server to slow down. Turns out, not suprisingly, that the slow pages are very query-intense. Granted, a lot of these pages were pages years ago before I knew what nice code looked like, but the good news it, lots of room for optimization and clean up.

To start out, I thought it would be good to look up the order in which SQL directives get executed as this will change the way I can optimize:

  1. FROM clause
  2. WHERE clause
  3. GROUP BY clause
  4. HAVING clause
  5. SELECT clause
  6. ORDER BY clause

This order holds some very interesting pros/cons:

FROM Clause

Since this clause executes first, it is our first opportunity to narrow down possible record set sizes. This is why I put as many of my ON rules (for joins) as possible in this area as opposed to in the WHERE clause:

 Launch code in new window » Download code as text file »

  • FROM
  • contact c
  • INNER JOIN
  • display_status d
  • ON
  • (
  • c.display_status_id = d.id
  • AND
  • d.is_active = 1
  • AND
  • d.is_viewable = 1
  • )

This way, by the time we get to the WHERE clause, we will have already excluded rows where is_active and is_viewable do not equal 1.

WHERE Clause

With the WHERE clause coming second, it becomes obvious why so many people get confused as to why their SELECT columns are not referencable in the WHERE clause. If you create a column in the SELECT directive:

 Launch code in new window » Download code as text file »

  • SELECT
  • ( 'foo' ) AS bar

It will not be available in the WHERE clause because the SELECT clause has not even been executed at the time the WHERE clause is being run.

ORDER BY Clause

It might confuse people that their calculated SELECT columns (see above) are not available in the WHERE clause, but they ARE available in the ORDER BY clause, but this makes perfect sense. Because the SELECT clause executed right before hand, everything from the SELECT should be available at the time of ORDER BY execution.

I am sure there are other implications based on the SQL clause order of operations, but these are the most obvious to me and can help people really figure out where to tweak their code.

Download Code Snippet ZIP File

Comments (3)  |  Post Comment  |  Ask Ben  |  Permalink  |  Other Searches  |  Print Page




Reader Comments

I suppose an actual RDBMS (MySQL, Oracle, etc.) will perform optimizations out of this order. For example, MySQL probably looks for things like "WHERE a.b=c.d" and moves that filtering up into the FROM processing.

Posted by Tim McCormack on Mar 8, 2008 at 11:58 AM


Good job on this entry, Ben. I actually have it bookmarked.

I was looking at it again as I prepared the schema for a new database and started to wonder about optimizing the "where" clause of an inner join.

If you read most tutorials on inner joins, they'll use some syntax like "select * FROM flintstones, bedrock WHERE flintstones.name = bedrock.resident and bedrock.gender = 'male'".

If we imagine that these are both large tables with thousands of members each, I'd imagine that the inner join can be optimized for faster execution and less memory usage since we only want data from "flintstones" when it corresponds with a record from "bedrock" that meets our criteria.

But what's the best way to optimize that query for maximum performance / minimum resource usage?

I have two ideas.

1: Put the "flintstones.name = bedrock.resident" at the end of the WHERE clause, so the cross-matching isn't done until the result set has already been narrowed by the "bedrock.gender = 'male'" part. Problem is that I haven't been able to find anything on the execution order of subcomponents of a WHERE clause to know if this might make a difference.

2: Reformat the query to "SELECT * from (SELECT * from bedrock WHERE gender = 'male') as bedrock, flintstones WHERE flintstones.name = bedrock.resident". It seems that this would constrain the "bedrock" dataset prior to the join, resulting in a faster query.

Any thoughts?

Posted by Greg Bulmash on Nov 12, 2008 at 5:47 PM


@Greg,

That's an interesting thought to make the intermediary result set. I wonder how that interacts with table indexing. Unfortunately, I don't really know enough about database execution to think at this level. What I have found is that good indexes almost make more of a difference that some of the low-level optimizations.

I'm still trying to find the sweet spot :)

Posted by Ben Nadel on Nov 12, 2008 at 5:57 PM


Post Comment  |  Ask Ben


Home   |   Web Log   |   ColdFusion   |   Projects   |   Resume   |   Job Form   |   Search   |   Contact
Epicenter Consulting - Custom Software Solutions for Business Evolution HostMySite.com - The Leader In ColdFusion Hosting