SQL COUNT( NULLIF( .. ) ) Is Totally Awesome

Posted March 15, 2007 at 9:23 AM

Tags: SQL

Using the SQL GROUP BY clause is really awesome. It makes gathering data "about" the data very easy. Sometimes, though, when you group data, not only do you want to count the number of records in a given group, you want to count the number of records in a group with a given property. This is something that is not as easy. One of the first things you might learn with a SQL COUNT() aggregate is that it can use the DISTINCT directive:

 Launch code in new window » Download code as text file »

  • SELECT
  • (
  • COUNT( DISTINCT o.user_id )
  • ) AS unique_user_count
  • FROM
  • [order] o
  • GROUP BY
  • o.date_ordered

The above query would give you the number of distinct users that placed orders on any given day. So, if one user places two orders in a single day, that user would still only count as one item in the above query. This is very cool, but what happens when you need more information.

Take for example, this in-memory SQL data table that has a list of girls, some of which I have dated and some of which are just friends:

 Launch code in new window » Download code as text file »

  • <!--- Create in-memory SQL table. --->
  • DECLARE @girl TABLE (
  • name VARCHAR( 30 ),
  • hair VARCHAR( 10 ),
  • did_date TINYINT
  • );
  •  
  •  
  • <!--- Populate table. --->
  • INSERT INTO @girl
  • (name,hair,did_date) VALUES ('Azure','Brunette',0);
  •  
  • INSERT INTO @girl
  • (name,hair,did_date) VALUES ('Sarah','Brunette',1);
  •  
  • INSERT INTO @girl
  • (name,hair,did_date) VALUES ('Ashley','Brunette',1);
  •  
  • INSERT INTO @girl
  • (name,hair,did_date) VALUES ('Heather','Blonde',1);

Notice that the "did_date" column can have either a one or a zero in it indicating what type of relationship this female is to me. Now, let's say I want to gather information about my relationships and the hair color of the girls involved. I can easily get the count of the girls grouped by hair color:

 Launch code in new window » Download code as text file »

  • <cfquery name="qGirl" datasource="...">
  • <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
  •  
  • SELECT
  • g.hair,
  • (
  • COUNT( * )
  • ) AS girl_count
  • FROM
  • @girl g
  • GROUP BY
  • g.hair
  • </cfquery>

That gives me the following data:


 
 
 

 
SQL COUNT( NULLIF() )  
 
 
 

But what happens if, for each hair color, I want to figure how many I have dated and how many I have not dated. I want to try to do something like this:

 Launch code in new window » Download code as text file »

  • <cfquery name="qGirl" datasource="...">
  • <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
  •  
  • SELECT
  • g.hair,
  • (
  • COUNT( * )
  • ) AS girl_count,
  • (
  • COUNT( did_date = 1 )
  • ) AS did_date_count
  • FROM
  • @girl g
  • GROUP BY
  • g.hair
  • </cfquery>

The goal here is to use "did_date = 1" to get the count of girls I dated. Unfortunately, this throws the error:

Error Executing Database Query. [Macromedia][SQLServer JDBC Driver][SQLServer]Line 28: Incorrect syntax near '='.

You can't use logic like this within a COUNT aggregate. Now, someone might suggest that I just throw the did_date column into the GROUP BY as well:

 Launch code in new window » Download code as text file »

  • <cfquery name="qGirl" datasource="...">
  • <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
  •  
  • SELECT
  • g.hair,
  • g.did_date,
  • (
  • COUNT( * )
  • ) AS girl_count
  • FROM
  • @girl g
  • GROUP BY
  • g.hair,
  • g.did_date
  • </cfquery>

Technically, this does "work," but the results are less than optimal:


 
 
 

 
SQL COUNT( NULLIF() )  
 
 
 

Now, I am not sure how many different combinations are returned. For instance, in the above query, I get two rows for Brunette (dated and not dated) and only one row for Blonde. This is clumsy and awkward to work with.

The better solution is to harness the power of the SQL function, NULLIF(). NULLIF() is pretty bad ass. If you have not used it, it takes two arguments and works such that if the two arguments are equal in value, it returns NULL, otherwise, it just returns the first argument it was passed. So, for instance, NULLIF( 1, 1 ) would return NULL, but NULLIF( 1, 0 ) would return 1.

Now, how does NULLIF() help us? It helps us because the SQL COUNT() aggregate does NOT count NULL values. This is HUGE. Since we know the small set of values that we might have for the did_date column (1 or 0), we can really use NULLIF() to get exactly the data we want:

 Launch code in new window » Download code as text file »

  • <cfquery name="qGirl" datasource="...">
  • <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
  •  
  • SELECT
  • g.hair,
  • (
  • COUNT( * )
  • ) AS girl_count,
  • (
  • COUNT(
  • NULLIF( did_date, 0 )
  • )
  • ) AS did_date_count,
  • (
  • COUNT(
  • NULLIF( did_date, 1 )
  • )
  • ) AS did_not_date_count
  • FROM
  • @girl g
  • GROUP BY
  • g.hair
  • </cfquery>

This gives us the output:


 
 
 

 
SQL COUNT( NULLIF() )  
 
 
 

As you can see, for each hair color, we are now getting exactly the data that we wanted; the count, the dated count, and the did not date count. The trick is in the NULLIF(). In order to get the number of girls I dated, I "nullify" all did_date values that are zero. This force the COUNT aggregate to only count the number of records that have a one (which is the only non-null option in our problem domain). Then, to find the did_not_date_count value, I do just the opposite.

How freakin' cool is that?!? When I randomly tried this one day and it worked, it blew my mind. Now, I work in MS SQL Server. I don't know if the NULLIF() function is standard. If is not, then, that's probably why your database was free :) (I'm just messing with you...)

Download Code Snippet ZIP File

Comments (27)  |  Post Comment  |  Ask Ben  |  Permalink  |  Other Searches  |  Print Page




Keep your Web site content fresh and your overhead costs low with Savvy Content Manager

Reader Comments

That NullIf is really helpful. Someone else wrote about this recently about it being a good way to get true/false/bit fields out as always 1 or 0, by using NullIf to bring out nulls as 0.

For your purposes though it would seem to be more efficient to do a Sum(did_date) and then Count(*) - Sum(did_date). If you are doing a NullIf on every single record then it would seem to be a bit more taxing than a couple simple aggregates and math operation. If the t/f field is -1 then you could just slap an ABS around the Sum.

Posted by Daniel Roberts on Mar 15, 2007 at 9:49 AM


Ooooh, the SUM() idea is brilliant! Well played. Of course that only works for things where we are talking about 1 / 0, but then again, that is probably like 95% of the cases I am going to want to use something like this. Thanks.

Posted by Ben Nadel on Mar 15, 2007 at 9:57 AM


I'm not sure if this fits in with this problem but an issue I quickly ran into when using Count(Distinct Column) was needing to count on multiple distinct fields. For anyone that is not aware, this can be done by using a subquery with groupings as a table . So you could do:

Count(*) AS CountAll
FROM (Col1, Col2, etc
FROM Table
Group By Col1, Col2
WHERE etc) AS Temp

On a related note, I've also built up query strings to be used as the FROM to abstract out complexity. For example if I know that an account can only have access specific records for that account/user in a table, Some people might try to use query of query but of course as Ben Forta said so eloquently "ColdFusion is not a DBMS". I build up a query string instead and then on subsequent queries use it as a subquery in the FROM clause. The main query can then be more simplified and more secure since account checks are already being made. I believe there are ways to do this in stored procedures for efficiencies sake but this worked out well for me.

Posted by Daniel Roberts on Mar 15, 2007 at 10:03 AM


SELECT
g.hair,
COUNT(*) AS girl_count,
sum(case when did_date = 1 then 1 else 0 end) as did_date_count,
sum(case when did_date = 0 then 1 else 0 end) as did_not_date_count
FROM @girl g
GROUP BY g.hair

just showing you how you can use the SUM() if the data wasn't numeric. All you have to do is use CASE and set the values yourself.

Posted by Tony Petruzzi on Mar 15, 2007 at 10:26 AM


@Tony,

You magnificent bastard. I didn't realize you could execute CASE statements inside of aggregates.

You and Daniel have just revolutionized the way I can write SQL :D

Posted by Ben Nadel on Mar 15, 2007 at 10:42 AM


ps. I just tried a CASE statement inside of COUNT() and it worked perfectly. Totally bad ass.

Posted by Ben Nadel on Mar 15, 2007 at 10:43 AM


Good call, Tony. I was just warming up my index fingers to type out almost that exact same reply.

The nice thing about the CASE approach is that it works on pretty much any database system, while NULLIF doesn't.

-R

Posted by Rick O on Mar 15, 2007 at 11:40 AM


I've come a long way from the D+ on my first Databases exam in College.... but I have so much more to go :)

Posted by Ben Nadel on Mar 15, 2007 at 11:55 AM


Another solution that I have and currently use is the use of PARTITION for aggregating records in Oracle. I am not sure if SQL Server has something similar. ByuUsing PARTITION, calculating running totals etc becomes a breeze. Here is a link for your reference

http://www.akadia.com/services/ora_analytic_functions.html

Thanks

Posted by Qasim Rasheed on Mar 15, 2007 at 6:01 PM


That partition stuff looks pretty cool. I have not seen anything like that in SQL server, although, maybe I remember some similar stuff from a presentation I saw on SQL Server 2005. I don't remember though.

Posted by Ben Nadel on Mar 16, 2007 at 7:29 AM


Ok,

I think I follow this. What I need to do is take some info from a single table and count and group the data. When I run this I get the same values in each row except for my month and year values. I am trying to run this through cold fusion mx 7.02. BTW yes I am new to this.

Jim

Posted by James Schell on Mar 16, 2007 at 2:35 PM


I think we need to see your SQL to know what you are talking about.

Posted by Ben Nadel on Mar 16, 2007 at 3:04 PM


Here is what I have so far,

Thanks

Jim

<cfquery name ="incident" datasource="FireRMS">
select
year(alarmdate) as YOC
,month(alarmdate) as MOC
,(select count(incidenttype) from incident where left(incidenttype,1)= '1' and incidenttype not like 'u%') as NumFire
,(select count(incidenttype) from incident where left(incidenttype,1)= '2' and incidenttype not like 'u%') as NumExplosion
,(select count(incidenttype) from incident where left(incidenttype,1)= '3' and incidenttype not like 'u%') as NumEMS
,(select count(incidenttype) from incident where left(incidenttype,1)= '4' and incidenttype not like 'u%') as NumHazMat
,(select count(incidenttype) from incident where left(incidenttype,1)= '5' and incidenttype not like 'u%') as NumService
,(select count(incidenttype) from incident where left(incidenttype,1)= '6' and incidenttype not like 'u%') as NumGood
,(select count(incidenttype) from incident where left(incidenttype,1)= '7' and incidenttype not like 'u%') as NumFalse
,(select count(incidenttype) from incident where left(incidenttype,1)= '8' and incidenttype not like 'u%') as NumWeather
,(select count(incidenttype) from incident where left(incidenttype,1)= '9' and incidenttype not like 'u%') as NumOther

from incident

where Year(alarmdate) >2000

group by year(alarmdate), month(alarmdate)

order by year(alarmdate), month(alarmdate)
</cfquery>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Untitled Document</title>
</head>

<body>
<table border="0" cellpadding="15" cellspacing="0" bgcolor="#ffffff">

<tr>
<td><B>Year</B></td>
<td><B>Month</B></td>
<td><B>Fire</B></td>
<td><B>Explosion/Overpressure</B></td>
<td><B>EMS</B></td>
<td><B>Hazmat</B></td>
<td><B>Service</B></td>
<td><B>Good Intent</B></td>
<td><B>False</B></td>
<td><B>Weather</B></td>
<td><B>Other</B></td>
</tr>


<cfoutput query = "incident" >

<tr>
<td><div align="center">#yoc#</div></td>
<td><div align="center">
<cfif #moc# eq 1>
JAN
<cfelseif #moc# eq 2>
FEB
<cfelseif #moc# eq 3>
MAR
<cfelseif #moc# eq 4>
APR
<cfelseif #moc# eq 5>
MAY
<cfelseif #moc# eq 6>
JUN
<cfelseif #moc# eq 7>
JUL
<cfelseif #moc# eq 8>
AUG
<cfelseif #moc# eq 9>
SEP
<cfelseif #moc# eq 10>
OCT
<cfelseif #moc# eq 11>
NOV
<cfelseif #moc# eq 12>
DEC
</cfif>

</div></td>

<td><div align="center">#NumFire#</div></td>
<td><div align="center">#NumExplosion#</div></td>
<td><div align="center">#NumEMS#</div></td>
<td><div align="center">#NumHazMat#</div></td>
<td><div align="center">#NumService#</div></td>
<td><div align="center">#NumGood#</div></td>
<td><div align="center">#NumFalse#</div></td>
<td><div align="center">#NumWeather#</div></td>
<td><div align="center">#NumOther#</div></td>
</tr>

</cfoutput>

</body>

</html>

Posted by James Schell on Mar 16, 2007 at 3:11 PM


Jim-

Try this:

SELECT
YEAR(AlarmDate) AS YOC, MONTH(AlarmDate) AS MOC,
SUM(CASE WHEN LEFT(IncidentType,1) = '1' THEN 1 ELSE 0 END) AS NumFire,
SUM(CASE WHEN LEFT(IncidentType,1) = '2' THEN 1 ELSE 0 END) AS NumExplosion,
-- etc, etc, etc
FROM Incident
WHERE (AlarmDate >= {d '2001-01-01'})
GROUP BY YEAR(AlarmDate) AS YOC, MONTH(AlarmDate)
ORDER BY 1, 2

That should be heinously faster without the subqueries. I also eliminated the search for an IncidentType that starts with 'u', as if it starts with a number, it can't also start with a 'u'.

-R

Posted by Rick O on Mar 16, 2007 at 7:18 PM


Will give it a try, the search for the u was because of an error the field in the database is a text and not an integer, national standard don't ask me, so without that search for the 'u' it returned an error, I will give your suggestion a try,

Thanks for the help

feedback to follow.

Jim

Posted by James Schell on Mar 16, 2007 at 7:39 PM


I guess I missed something here, this is what I got as a return.

Not sure what I missed. Can someone please point me in the right direction.

Thanks

Jim

Error Executing Database Query.
[Macromedia][SQLServer JDBC Driver][SQLServer]Incorrect syntax near the keyword 'AS'.

The error occurred in C:\CFusionMX7\wwwroot\CFIDE\gettingstarted\FireRMS\Index.cfm: line 2

1 :
2 : <cfquery name ="incident" datasource="FireRMS">
3 :
4 :

--------------------------------------------------------------------------------

SQL SELECT YEAR(AlarmDate) AS YOC , MONTH(AlarmDate) AS MOC , SUM(CASE WHEN LEFT(IncidentType,1) = '1' THEN 1 ELSE 0 END) AS NumFire , SUM(CASE WHEN LEFT(IncidentType,1) = '2' THEN 1 ELSE 0 END) AS NumExplosion , SUM(CASE WHEN LEFT(IncidentType,1) = '3' THEN 1 ELSE 0 END) AS NumEMS , SUM(CASE WHEN LEFT(IncidentType,1) = '4' THEN 1 ELSE 0 END) AS NumHazMat , SUM(CASE WHEN LEFT(IncidentType,1) = '5' THEN 1 ELSE 0 END) AS NumService , SUM(CASE WHEN LEFT(IncidentType,1) = '6' THEN 1 ELSE 0 END) AS NumGood , SUM(CASE WHEN LEFT(IncidentType,1) = '7' THEN 1 ELSE 0 END) AS NumFalse , SUM(CASE WHEN LEFT(IncidentType,1) = '8' THEN 1 ELSE 0 END) AS NumWeather , SUM(CASE WHEN LEFT(IncidentType,1) = '8' THEN 1 ELSE 0 END) AS NumOther FROM Incident WHERE (AlarmDate >= {d '2001-01-01'}) GROUP BY YEAR(AlarmDate) AS YOC, MONTH(AlarmDate) ORDER BY 1, 2
DATASOURCE FireRMS
VENDORERRORCODE 156
SQLSTATE HY000

Posted by James Schell on Mar 16, 2007 at 7:50 PM


Select distinct

can be used instead of count distinct

ie.
SELECT DISTINCT
-- Columns
B.SOCREX, COUNT(b.socrex)
-- Tables
FROM "WMSPBSF00"/"CHCART00" A,
"WMSPBSF00"/"SOSTOR00" B
-- Row Selection
WHERE ((CHDCR > &date)
AND (CHMIS1 = '1')
AND (CHSTAT < '60'))
-- Join Conditions
AND (A.CHSTOR = B.SOSTOR)
-- Summary Columns
GROUP BY B.SOCREX
-- Sort Columns
ORDER BY B.SOCREX

Results:
Custom
Expansion
Field COUNT
------------------------------ --------------
AAFM 6
ABQ 16
ATL 32
BAL 57
BHM 19
BNA 12
BOS 50
BTR 29
CAL 52
CLE 28
CLOSE 57
CLT 36
CMH 7
CNJ 61
CON 14

Posted by jim stephens on Mar 23, 2007 at 5:42 PM


Jim - the DISTINCT in that statement is redundant. The GROUP BY is already going to eliminate any duplicates. Now, if you had the DISTINCT inside the COUNT or other aggregate function, that's something else entirely ...

Posted by Rick O on Mar 23, 2007 at 9:33 PM


Ben this totally saved me from pulling out all my hair, which is brown BTW.

Posted by Michael De Jonghe on Sep 7, 2007 at 5:27 PM


Sweeet. Yeah, I was pretty excited when I found out about this.

Posted by Ben Nadel on Sep 7, 2007 at 5:35 PM


sweet. haven't touched sql in many years, -- this is just the syntax i needed.

thanks...

Posted by me on Sep 19, 2007 at 2:01 PM


Tony. You just saved my butt as well. Exactly what I needed. Great post.

// Joff

Posted by Joff on Oct 13, 2007 at 1:04 AM


When I started reading this, I thought the 'did_date' was going to be the month and day you 'did' them.

Posted by Tom on Feb 21, 2008 at 8:44 AM


@Tom,

Ha ha ha. It's funny cause when I read your comment and then went back to look at the code to see what you were referring to, I read "did_date", and though, "What the hell did I mean there??" (my variable naming). Definitely could have been a better name :)

Posted by Ben Nadel on Feb 21, 2008 at 10:15 AM


This *is* huge; it just saved me massive time in writing a correlated subquery.

Why is it so hard to find out what COUNT() *does* with it's argument? It doesn't seem to be documented really well.

Posted by Baylink on Nov 18, 2008 at 5:42 PM


@Baylink,

Glad to help :)

Posted by Ben Nadel on Nov 18, 2008 at 5:44 PM


Oh, and the part I meant to include and forgot: "yes, this works in MySQL 5".

:-)

I'm sure it would work in PgSQL as well, which is what I use by preference.

And I'm so sorry you had to *pay* for *your* DBMS. :-)

Posted by Baylink on Nov 19, 2008 at 9:44 AM


Post Comment  |  Ask Ben


Home   |   Web Log   |   ColdFusion   |   Projects   |   Resume   |   Job Form   |   Search   |   Contact
Epicenter Consulting - Custom Software Solutions for Business Evolution HostMySite.com - The Leader In ColdFusion Hosting