Structs As Query Indexes, Speed, And Rick Osborne
For those of you who follow my blog, you will know that Rick Osborne is the guy who comes in after I explain things and makes killer suggestions about how they can be done better. I recently gave a case study of how ColdFusion code can be optimized. It involved using the IndexOf() method of the ColdFusion query column object. Rick came in and suggested that using a Struct to create your own query-index would perform faster. As I am a man who likes to learn by doing, I thought I would put this to the test.
To test this, I query from a web statistics program. I am getting information from two tables: web_stats_hit and web_stats_session. Each "hit" in the hits table will have a session id in the session table. The session table also have a UUID column "session_id".
As I am trying to test the merging of two data sources, I am going to hit each table individually and then try to update one with matching values from the other. First I am grabbing the two different data sets:
<!--- Query for web hits. ---> | |
<cfquery name="qHit" datasource="..."> | |
SELECT | |
h.id, | |
h.date_created, | |
h.web_stats_session_id, | |
( | |
'' | |
) AS session_id | |
FROM | |
web_stats_hit h | |
</cfquery> | |
<!--- Query for web sessions. ---> | |
<cfquery name="qSession" datasource="..."> | |
SELECT | |
s.id, | |
s.session_id | |
FROM | |
web_stats_session s | |
</cfquery> |
As I tried to explain earlier, for every qHit.web_stats_session_id, there is a matching session, such that for some combo, qHit.web_stats_session_id == qSession.id. And just to get an idea of the amount of data we are talking about:
qHit: 52,290 records
qSession: 34,753 records
That's a LOT of data to go through. Let's get our "test" on:
<cftimer label="IndexOf() Methodology" type="outline"> | |
<!--- Loop over the hit query. ---> | |
<cfloop query="qHit"> | |
<!--- | |
We want to find a matching session_id based | |
on the session. Get index of matching row. | |
---> | |
<cfset intIndex = qSession[ "id" ].IndexOf( | |
JavaCast( "int", qHit.web_stats_session_id ) | |
) /> | |
<!--- Add one to index (to be ColdFusion friendly). ---> | |
<cfset intIndex = (intIndex + 1) /> | |
<!--- Check to see if we have an index. ---> | |
<cfif intIndex> | |
<!--- We found the match, update the row. ---> | |
<cfset qHit[ "session_id" ][ qHit.CurrentRow ] = | |
qSession[ "session_id" ][ intIndex ] | |
/> | |
</cfif> | |
</cfloop> | |
</cftimer> | |
<cftimer label="Struct Index Methodology" type="outline"> | |
<!--- Create a session look up table. ---> | |
<cfset objSessionLookUp = StructNew() /> | |
<!--- | |
Loop over session and set index rows. We will be using | |
the id column of the session as the key and the | |
session_id as the value. This creates our very own, | |
in-memory index of the qSession query based on ID. | |
---> | |
<cfloop query="qSession"> | |
<!--- Index this value. ---> | |
<cfset objSessionLookUp[ qSession.id ] = qSession.CurrentRow /> | |
</cfloop> | |
<!--- Loop over the hit query. ---> | |
<cfloop query="qHit"> | |
<!--- | |
Check to see if the session key exists. If it | |
does, then we found a match. | |
---> | |
<cfif StructKeyExists( | |
objSessionLookUp, | |
qHit.web_stats_session_id | |
)> | |
<!--- Update the session based on the struct-index. ---> | |
<cfset qHit[ "session_id" ][ qHit.CurrentRow ] = | |
objSessionLookUp[ qHit.web_stats_session_id ] | |
/> | |
</cfif> | |
</cfloop> | |
</cftimer> |
It turns out Rick was absolute correct. The struct index performs MUCH faster. Here are the stats:
IndexOf() Methodology : 402,037 ms
Struct Index Methodology : 33,403 ms
Some quick math will show you that the struct index method performs in 8% of the time that the IndexOf() methodology does. EIGHT PERCENT! Nuts. I guess the only downside is that you can potentially create a HUGE in-memory structure; but it's only temporary.
Nicely done Rick!
Want to use code from this post? Check out the license.
Reader Comments
I figured you'd appreciate this:
http://rickosborne.org/images/screenshots/wtg-ben.png
That graph represents the last week's traffic to my site. This site has accounted for 3% of the traffic to my site, behind only Google and MXNA.
Rick,
That's awesome :) Glad to be sending people to you. You have good information to share.