One person recently asked for my help in optimizing a page that outputted 12,000 records. To complicate things, the records were a combination result from two queries that came from two different data sources. One of the huge selling features of ColdFusion query of queries is that you can combine data from different data sources. This page, however, was running very slow. I was asked to speed it up. Here is original code (modified for kinky solutions):
<!--- Get all open orders. ---> <cfquery name="qOrder" datasource="datasource1"> SELECT o.id, o.date_ordered, o.supplier_id FROM orders o WHERE o.is_open = 1 </cfquery> <!--- Get suppliers for given orders. ---> <cfquery name="qSupplier" datasource="datasource2"> SELECT s.id, s.name, s.phone, s.location FROM suppliers s WHERE id IN ( #ValueList( qOrder.suplier_id )# ) </cfquery> <!--- Set default array. ---> <cfset ColumnValues = ArrayNew( 1 ) /> <!--- Add new columns to the open orders query. ---> <cfset temp = QueryAddColumn( qOrder, "supplier_name", ColumnValues ) /> <cfset temp = QueryAddColumn( qOrder, "supplier_phone", ColumnValues ) /> <cfset temp = QueryAddColumn( qOrder, "supplier_location", ColumnValues ) /> <cfset theCurrentRow = 1 /> <!--- Loop through the orders and add supplier. ---> <Cfloop query="qOrder"> <cfquery name="qCheckSupplier" dbtype="Query"> SELECT name, phone, location FROM qSupplier WHERE id = '#trim( qOrder.supplier_id )#' </cfquery> <cfif ( trim(qOrder.supplier_id) IS NOT "" AND ISNumeric(trim(qOrder.supplier_id)) )> <cfset qOrder[ "supplier_name" ][ theCurrentRow ] = "#qCheckSupplier.name#" /> <cfset qOrder[ "supplier_phone" ][ theCurrentRow ] = "#qCheckSupplier.phone#" /> <cfset qOrder[ "supplier_location" ][ theCurrentRow ]= "#qCheckSupplier.location#" /> <cfelse> <cfset qOrder[ "supplier_name" ][ theCurrentRow ] = "" /> <cfset qOrder[ "supplier_phone" ][ theCurrentRow ] = "" /> <cfset qOrder[ "supplier_location" ][ theCurrentRow ] = "" /> </cfif> <cfset theCurrentRow = incrementvalue( theCurrentRow ) /> </cfloop>
Ok, so there's the original code, modified to be more readable (and to hide original name fields and what not). Some things to note. There is not a 1:1 ratio for orders to suppliers. NOT ever order has a supplier. Therefore, we couldn't just do a join on the two data sources as ColdFusion query of queries work with INNER JOINS best. Also, I don't think that ever supplier ID is valid. Meaning, the oders query might have supplier ID's that are NOT in the returned suppliers query. These, however were assumptions I had to make (and they may be wrong).
When I read this code, several things popped out at me immediately:
No one should ever use this method. I honestly cannot thing of any time it would be appropriate, except for maybe needing to pass it around as a method pointer. All this does is add overhead to simple math.
He is using a counter to keep track of the current row. This is not needed. When looping over a query, the query itself keeps track of the current row using the QUERY.CurrentRow variable.
He is changing the query structure of one of the original queries. Remember, this is 12,000 rows. I am not sure how the size of the query relates the QueryAddColumn() method, but I can't image it is easy to just add a column.
Query of Queries
As he loops over the suppliers, he is querying to see if he can find a matching supplier. Query of queries is good, but it has some serious overhead. This can be done without query of queries.
After a quick run through, here is the optimized code:
<!--- Get all open orders. ---> <cfquery name="qOrder" datasource="datasource1"> SELECT o.id, o.date_ordered, o.supplier_id, <!--- Set default supplier values. ---> ( '' ) AS supplier_name, ( '' ) AS supplier_phone, ( '' ) AS supplier_location FROM orders o WHERE o.is_open = 1 </cfquery> <!--- Get suppliers for given orders. ---> <cfquery name="qSupplier" datasource="datasource2"> SELECT s.id, s.name, s.phone, s.location FROM suppliers s WHERE id IN ( #ValueList( qOrder.suplier_id )# ) </cfquery> <!--- Loop through the orders and add supplier. ---> <cfloop query="qOrder"> <!--- Only check for supplier if we have a supplier ID to act on. Otherwise, leave the default values (''). ---> <cfif qOrder.supplier_id> <!--- We have a supplier. Try to find it in the suppliers query: get the row index of the first matching id. ---> <cfset intIndex = qSupplier[ "id" ].IndexOf( JavaCast( "int", qOrder.supplier_id ) ) /> <!--- Add one to the index to make it CF-Friendly. ---> <cfset intIndex = (intIndex + 1) /> <!--- Check to see if we found a match. ---> <cfif intIndex> <!--- Update cell values. ---> <cfset qOrder[ "supplier_name" ][ qOrder.CurrentRow ] = qSupplier[ "name" ][ intIndex ] /> <cfset qOrder[ "supplier_phone" ][ qOrder.CurrentRow ] = qSupplier[ "phone" ][ intIndex ] /> <cfset qOrder[ "supplier_location" ][ qOrder.CurrentRow ] = qSupplier[ "location" ][ intIndex ] /> </cfif> </cfif> </cfloop>
NOTE: I cannot compile this code, so I am sure there ARE syntax errors. Sorry about that. Try and get the general idea.
The first thing I did here is modify the initial query to return ALL columns that were going to be needed. This way, there is no modifying the query afterwards. SQL does this stuff for a living, so I would rather let it handle the blank columns. This also sets default values. In the original code, default values were being set during the query loop. This cuts out that overhead.
Then, I got rid of the row counter. Again, no need for this. It only has extra over head.
Then, I check for a valid supplier_id BEFORE I do any logic on the query. If we don't have a supplier_id, then we are keeping default values - no need to update.
Once I have the supplier_id, I use the ColdFusion query-column's Java method, IndexOf(), to get the first index of the matching supplier_id. This is MUCH faster than the ColdFusion query of queries. Remember, query of queries can return multiple rows and therefore, it must ALWAYS process every single row. It's basically a worst-case scenario each time. We, only want one match, so we just grab the first index and break out of the search algorithm when we find it.
Then, if I did find a matching row for supplier I update the orders query.
So, how did this all work out? Once the programmer implemented my suggestions, the page was MUCH faster. Here are some stats on tests he ran:
- 10,641 ms
- 9,500 ms
- 9,500 ms
- 9,438 ms
- 3,156 ms
- 2,688 ms
- 2,688 ms
- 3,203 ms
As you can see, with my optimizations, the code runs in about 1/3 to close to a 1/4 of the time. In fact, by using the Java IndexOf() method on the ColdFusion query column object, I am told that the page runs as fast as the first qOrder query and the code after that is negligible.
So, there you have it. A case study on page optimization. Keep in mind that there are probably other ways of doing this. And also, I love query of queries... they just weren't right for this situation.
1. I occasionally use IncrementValue() because things like #A+1# didn't always work. (Do they now?) However, #IncrementValue(A)# does. More specifically, I use it in query loops like so:
<tr class="#EvenOdd[IncrementValue(CurrentRow MOD 2)]#">
<!-- ... -->
But that's just me.
2. I would suspect that you could get even more performance by building your own index of the second query:
The final loop is then reduced to:
<cfset QuerySetCell(qOrder, "suppliername", qSupplier.Name[RowID], CurrentRow)>
<cfset QuerySetCell(qOrder, "supplierphone", qSupplier.Phone[RowID], CurrentRow)>
<cfset QuerySetCell(qOrder, "supplier_location", qSupplier.Location[RowID], CurrentRow)>
For large queries, a Struct lookup is going to be heinously faster than an indexOf().
I think #A + 2# always works these days.
I like the idea of the structure. I guess since the example loops over so many rows, the cost of creating the Struct would be a drop in the pond, so to speak, for the speed of 12,000 lookups.
As always, excellent suggestions.
Thanks a lot Ben for this post. This approach to avoid QoQ inside a large loop really helped a lot.