When it comes to database queries, the only kind of JOIN that I like to perform with SQL is an INNER JOIN. They are wicked fast and always satisfying. Any time that I have to figure out which records don't have relationships, it means that I have to do a LEFT OUTER JOIN; this is always depressing and often slow and I try to avoid it as much as possible. That's why I ended up trying something a bit counterintuitive the other day. I was in a situation where I needed to populate one table (T) with the records from table (A) that did not have a related record in table (B). My first instinct was an insert with a LEFT OUTER JOIN between A and B. This was wicked slow (dealing with hundreds of thousands of records). Then, I decided to try a full insert followed by a subsequent INNER JOIN Delete. This, as it turns out, was wicked fast!
To see what I'm talking about, imagine that I have three tables:
The Profile table has a foreign key for the User table - userID - such that (User.id == Profile.userID). The Guest table is empty. What I need to do is populate the Guest table with all User records that do not have a related Profile record.
My first instinct was to do something like this:
<!--- We want to move some of the USER records into the GUEST table. We want to do this when the USER record does not have a matching PROFILE record. ---> <cfquery name="populateGuest" datasource="testing"> <!--- Clear out guest table. ---> TRUNCATE TABLE guest; <!--- Populate the guest table with users that do NOT have profiles. That is, where there is no matching profile record. ---> INSERT INTO guest ( id, name )( SELECT u.id, u.name FROM user u LEFT OUTER JOIN profile p ON u.id = p.userID <!--- Make sure this user doesn't have a matching profile record joined. ---> WHERE p.id IS NULL ); <!--- Return SOME value so we can CFDump the query. ---> SELECT ( 'Done' ) AS message ; </cfquery>
As you can see, I am using a LEFT OUTER JOIN to figure out which User records don't have a Profile. This derived recordset is then inserted directly into the Guest table. This approach works, but is kind of slow; even with my relatively small demo tables (10,000 records), the query runs in about 28ms.
Once I saw this poor performance, I started to think if there was any way that I could accomplish the same thing using an INNER JOIN rather than a LEFT OUTER JOIN. Then it occurred to me! What if I simply INSERT all the User records; then, go back and DELETE any of the Guest records that do have a profile. In this way, I could skip the LEFT OUTER JOIN between User and Profile and replace it with an INNER JOIN between Profile and Guest.
<!--- We want to move some of the USER records into the GUEST table. We want to do this when the USER record does not have a matching PROFILE record. ---> <cfquery name="populateGuest" datasource="testing"> <!--- Clear out guest table. ---> TRUNCATE TABLE guest; <!--- First, we're gonna insert ALL the users into the guest table. Then, we'll worry about pruning the guest table. ---> INSERT INTO guest ( id, name )( SELECT u.id, u.name FROM user u ); <!--- Now that we have completed populated the guest table, we want to delete any record that has a matching profile. ---> DELETE g FROM guest g INNER JOIN profile p ON g.id = p.userID ; <!--- Return SOME value so we can CFDump the query. ---> SELECT ( 'Done' ) AS message ; </cfquery>
As you can see here, I am moving all the User records into Guest. Then, I am pruning the Guest table to remove any rows that were moved over unnecessarily. This might sound like a lot of extra work and overhead; but, with the INNER JOIN instead of the LEFT OUTER JOIN, this query executes much faster! Even with my relatively small demo tables, this query runs in about 14ms - twice as fast as the first approach!
As the record sets get larger, I found the difference in performance to be even more pronounced. It might seem totally counterintuitive to move records unnecessarily; but, when such a move allows you to use an INNER JOIN as opposed to a LEFT OUTER JOIN, the speed of the subsequent DELETE is far better than the slothfulness of the original LEFT OUTER JOIN.
Want to use code from this post? Check out the license.