For years, I've been using the SQL EXISTS() operator to help limit the number of results in a data set. But, it wasn't until just yesterday that I actually started to think about what EXISTS() returned - I sort of just took it for granted that it worked in a WHERE clause. As it turns out, EXISTS() returns True or False, which are MySQL aliases for 1 and 0, respectively. This makes the EXISTS() operator kind of awesome for calculated columns.
In many of my report-oriented or user-interface (UI) "partial" queries, I'll generate calculated columns in my results. Typically, these calculated columns will involve COUNT(); but, a good portion of the time, that COUNT() value is really meant to indicate 0 or not-zero. Or, more specifically True or False.
Well, if the EXISTS() operator returns True or False, I can start to replace some of my calculated COUNT() columns with EXISTS(). To see what I mean, take a look at this code in which we find friends with various love conditions:
<cfquery name="friends" datasource="testing"> SELECT f.id, f.name, /* Check to see if this person is in love. EXISTS() will return True / False (MySQL aliases this to 1 / 0 respectively). */ EXISTS( SELECT -1 /* This value is irrelevant. */ FROM attraction a WHERE a.friendID = f.id AND a.endedAt IS NULL ) AS isInLove, /* Check to see if this person's love is unrequited (Same notes as above). */ EXISTS( SELECT -1 /* This value is irrelevant. */ FROM attraction a LEFT OUTER JOIN attraction back ON ( back.friendID = a.attractedToFriendID AND back.attractedToFriendID = a.friendID AND back.endedAt IS NULL ) WHERE a.friendID = f.id AND a.endedAt IS NULL AND back.id IS NULL ) AS isLoveUnrequited FROM friend f ORDER BY f.name ASC </cfquery> <!--- Output the results. ---> <cfoutput> <cfloop query="friends"> <cfif friends.isInLove> #friends.name# is in <cfif friends.isLoveUnrequited> unrequited </cfif> love.<br /> <cfelse> #friends.name# is not in love.<br /> </cfif> </cfloop> </cfoutput>
As you can see, I'm using the EXISTS() operator to determine if the given friend is in love; and, if that love is unrequited. And, when we run the above code, we get the following output:
Amanda is not in love.
Joanna is not in love.
Julie is in love.
Kim is not in love.
Sarah is in love.
Tricia is in unrequited love.
Obviously, this won't work for every kind of calculated column. And, some calculated column will require JOINs instead of sub-queries. But, for a decent number of queries, the EXISTS() operator is going to be great for derived columns. Not only does it express intent more clearly (existence vs. count), I believe (or hope) that the SQL execution plan will actually be more efficient since EXISTS() should be able to short-circuit the derived result set.
Over on twitter, Ryan Guill pointed out that I should make sure to use EXPLAIN to make sure that the sub-query is as performant:
I did this for a few small test-cases and the EXPLAIN results were both the same, usually looking only at rows indicated by the index. But, he does bring up a good point in that each query is a unique context constrained by different requirements.
In this post, I am not saying that JOINs should be replaced with EXISTS(). I rock out JOINs like boss, on the daily. So much so that I even have a very strong philosophy on how JOINs should be *structured*:
But, there are times, especially when the JOIN precipitates the use of a GROUP BY, when a JOIN does get crazy to reason about and / or hurts performance - try putting a TEXT field in a GROUP BY record-set and having it perform well :P ... welcome to the world of disk IO.
With each query, I think you have to balance performance, readability, and cadence. Each is a lever that you can pull.