A few months ago, Rick Osborne introduced me to the idea of a SQL database pivot table. A pivot table, in this sense, is simply a table with a single indexed ID column whose values count up from one (I'm sure there are variations on this) to some set number. For instance, the pivot table [pivot100] has an ID column with values from 1 to 100. Rick told me that he has tables that go to 100, 1,000, and 10,000, but so far I have only rocked the 100 pivot table.
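If you have never seen one of these, here is a minimal sketch of how a table like [pivot100] might be created and populated. This is just my assumption of the setup (the WHILE loop is only one way to do it), not necessarily how Rick builds his:

-- Assumed setup: a single-column table whose ID counts from 1 to 100.
CREATE TABLE pivot100 ( id INT NOT NULL PRIMARY KEY );

-- Populate the ID column with the values 1 through 100.
DECLARE @i INT;
SET @i = 1;

WHILE @i <= 100
BEGIN
	INSERT INTO pivot100 ( id ) VALUES ( @i );
	SET @i = @i + 1;
END;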
So anyway, this pivot table just cut a HUGE, NASTY, UGLY SQL query from about 90ms down to about 20ms. That is an outstanding performance increase. This query was huge and nasty and built for reporting, so it did a ton of things, but I just wanted to show you a small part of it. It is for a recycling program that has a pickups table [pickup] and a join table from users to pickups [pickup_useraccount_jn]. As PART of this massive query, I had to find the number of users whose net weight of recycling over a certain date period fell into a given 5 lb bucket (i.e. 0-5 lbs, 5-10 lbs, etc.) all the way up to 80 (which was 80+, NOT 80-85 lbs).
First, let's look at the original SQL query. You can assume that all the SQL variables have already been defined and that lstAvailableUsers is a list of user IDs to be included in the report (I know that is lame... I am overhauling everything):
<!---
	Now, we have to loop over the 5 lb increments and find the
	number of users that recycled a net pickup weight in that
	sum category.
--->
<cfloop index="i" from="0" to="80" step="5">

	SET @number_#i#_#(i + 5)# = ISNULL(
		(
			SELECT
				COUNT( * )
			FROM
				(
					SELECT
						SUM( p.net_weight ) AS net_weight
					FROM
						pickup p
					WHERE
						p.date_pickup >= @date_from
					AND
						p.date_pickup < @date_to
					AND
						p.user_id IN ( #lstAvailableUsers# )
					GROUP BY
						p.user_id
				) AS TempWeightTable#i#
			WHERE
				net_weight >= #i#

			<!--- Check for the catch-all weight bucket. --->
			<cfif (i LT 80)>
				AND
					net_weight < #(i + 5)#
			</cfif>
		),
		<!--- If above is null, return zero. --->
		0
	);

</cfloop>
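Just to be clear about what that CFML generates, here is roughly the SQL that gets produced for a single loop iteration (i = 0). The user IDs are made-up example values standing in for lstAvailableUsers:

SET @number_0_5 = ISNULL(
	(
		SELECT
			COUNT( * )
		FROM
			(
				SELECT
					SUM( p.net_weight ) AS net_weight
				FROM
					pickup p
				WHERE
					p.date_pickup >= @date_from
				AND
					p.date_pickup < @date_to
				AND
					-- Example IDs standing in for #lstAvailableUsers#.
					p.user_id IN ( 1, 2, 3 )
				GROUP BY
					p.user_id
			) AS TempWeightTable0
		WHERE
			net_weight >= 0
		AND
			net_weight < 5
	),
	0
);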
As you can see, I am basically running that inner query 17 times, stepping from 0 to 80. That is a SLOW query to be running so many times. That's where the pivot table comes into play. Well, not just the pivot table, but a pivot table combined with a temporary table variable. To clean up the query, I start out by creating a table variable for the steps (0-5, 5-10, etc.):
DECLARE @number_step_5 TABLE ( step INT, user_count INT );
Then, I go about populating that table with the help of the SQL pivot table:
<!---
	Set up the step-5 table. This will step from 0 to 80 getting
	counts of users who recycled a sum weight in that 5 lb
	weight bucket.
--->
INSERT INTO @number_step_5
(
	step,
	user_count
)(
	SELECT
		tpiv.id AS step,
		COUNT( * )
	FROM
		<!---
			Take the pivot100 table and trim it so that we are
			only returning pivot IDs that are a 5 lb increment.
			Also, we need to subtract one from our 1-based ID
			column as our weight buckets start at zero.
		--->
		(
			SELECT
				( piv.id - 1 ) AS id
			FROM
				pivot100 piv
			WHERE
				piv.id < 82
			AND
				( ( piv.id - 1 ) % 5 ) = 0
		) AS tpiv
	INNER JOIN
		<!---
			Create an in-memory temp table that has all the
			different net recycling weight sums by user. We don't
			care about the user at this point, just the total
			list of sums.
		--->
		(
			SELECT
				SUM( p.net ) AS net_sum
			FROM
				pickup p
			INNER JOIN
				pickup_useraccount_jn pujn
			ON
				(
						p.id = pujn.pickup_id
					AND
						p.date_picked_up BETWEEN @date_from AND @date_to
					AND
						pujn.useraccount_id IN ( #lstAvailableUsers# )
				)
			GROUP BY
				pujn.useraccount_id
		) AS tsum
	ON
		<!---
			When joining the sum table to the pivot table, we want
			the sum to be between the initial pivot value (our start
			value) and the increment value (5 lbs). However, the last
			bucket is a catch-all for 80 lbs - (infinity)... which we
			will get by using 999999 (good enough).
		--->
		(
				tsum.net_sum >= tpiv.id
			AND
				tsum.net_sum < (
					CASE
						WHEN tpiv.id < 80
						THEN ( tpiv.id + 5 )
						ELSE 999999
					END
				)
		)
	GROUP BY
		tpiv.id
);
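If that join is hard to picture, it might help to run the trimmed pivot sub-query on its own. Against [pivot100], it returns exactly the 17 bucket start values (0, 5, 10, ... 80):

SELECT
	( piv.id - 1 ) AS id
FROM
	pivot100 piv
WHERE
	piv.id < 82
AND
	( ( piv.id - 1 ) % 5 ) = 0;

Each of those start values then gets joined to every user sum that falls in its bucket, and the GROUP BY tpiv.id gives us the per-bucket user count.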
Now, this might not be the easiest query to follow, but let me tell you that it works, and it is quite a bit faster than the original query. Notice that the BIG chunk of the query (the INNER JOIN temp table) is only being run once (and then joined to), as opposed to the 17 times it was being run in the original query. But we are not quite done yet. Now that I have this temp table populated, I still have to set the variables that I set in the original query:
<!---
	Now, we have to loop over the increments and find the number
	of users that recycled a net weight in the given bucket.
--->
<cfloop index="i" from="0" to="80" step="5">

	SET @number_#i#_#(i + 5)# = ISNULL(
		(
			SELECT
				user_count
			FROM
				@number_step_5
			WHERE
				step = #i#
		),
		0
	);

</cfloop>
This queries the temp table to find the value for that weight group. Since there might not be any pickups in a given weight group, I must use an ISNULL() function call. The genius here, though, is that the temp table [@number_step_5] should never have more than 17 rows in it, and most likely fewer than that. That means that each of these subsequent queries that sets a variable will execute almost immediately with such a minimal amount of data to search.
Pivot tables are totally awesome! Thanks Rick, you helped me rock this SQL query hard core!