Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.
Meanwhile on Twitter
Loading latest tweet...
Ben Nadel at the jQuery Conference 2011 (Cambridge, MA) with:

SQL COUNT( NULLIF( .. ) ) Is Totally Awesome

By Ben Nadel on
Tags: SQL

Using the SQL GROUP BY clause is really awesome. It makes gathering data "about" the data very easy. Sometimes, though, when you group data, not only do you want to count the number of records in a given group, you want to count the number of records in a group with a given property. This is something that is not as easy. One of the first things you might learn with a SQL COUNT() aggregate is that it can use the DISTINCT directive:

  • SELECT
  • (
  • COUNT( DISTINCT o.user_id )
  • ) AS unique_user_count
  • FROM
  • [order] o
  • GROUP BY
  • o.date_ordered

The above query would give you the number of distinct users that placed orders on any given day. So, if one user places two orders in a single day, that user would still only count as one item in the above query. This is very cool, but what happens when you need more information.

Take for example, this in-memory SQL data table that has a list of girls, some of which I have dated and some of which are just friends:

  • <!--- Create in-memory SQL table. --->
  • DECLARE @girl TABLE (
  • name VARCHAR( 30 ),
  • hair VARCHAR( 10 ),
  • did_date TINYINT
  • );
  •  
  •  
  • <!--- Populate table. --->
  • INSERT INTO @girl
  • (name,hair,did_date) VALUES ('Azure','Brunette',0);
  •  
  • INSERT INTO @girl
  • (name,hair,did_date) VALUES ('Sarah','Brunette',1);
  •  
  • INSERT INTO @girl
  • (name,hair,did_date) VALUES ('Ashley','Brunette',1);
  •  
  • INSERT INTO @girl
  • (name,hair,did_date) VALUES ('Heather','Blonde',1);

Notice that the "did_date" column can have either a one or a zero in it indicating what type of relationship this female is to me. Now, let's say I want to gather information about my relationships and the hair color of the girls involved. I can easily get the count of the girls grouped by hair color:

  • <cfquery name="qGirl" datasource="...">
  • <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
  •  
  • SELECT
  • g.hair,
  • (
  • COUNT( * )
  • ) AS girl_count
  • FROM
  • @girl g
  • GROUP BY
  • g.hair
  • </cfquery>

That gives me the following data:


 
 
 

 
SQL COUNT( NULLIF() )  
 
 
 

But what happens if, for each hair color, I want to figure how many I have dated and how many I have not dated. I want to try to do something like this:

  • <cfquery name="qGirl" datasource="...">
  • <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
  •  
  • SELECT
  • g.hair,
  • (
  • COUNT( * )
  • ) AS girl_count,
  • (
  • COUNT( did_date = 1 )
  • ) AS did_date_count
  • FROM
  • @girl g
  • GROUP BY
  • g.hair
  • </cfquery>

The goal here is to use "did_date = 1" to get the count of girls I dated. Unfortunately, this throws the error:

Error Executing Database Query. [Macromedia][SQLServer JDBC Driver][SQLServer]Line 28: Incorrect syntax near '='.

You can't use logic like this within a COUNT aggregate. Now, someone might suggest that I just throw the did_date column into the GROUP BY as well:

  • <cfquery name="qGirl" datasource="...">
  • <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
  •  
  • SELECT
  • g.hair,
  • g.did_date,
  • (
  • COUNT( * )
  • ) AS girl_count
  • FROM
  • @girl g
  • GROUP BY
  • g.hair,
  • g.did_date
  • </cfquery>

Technically, this does "work," but the results are less than optimal:


 
 
 

 
SQL COUNT( NULLIF() )  
 
 
 

Now, I am not sure how many different combinations are returned. For instance, in the above query, I get two rows for Brunette (dated and not dated) and only one row for Blonde. This is clumsy and awkward to work with.

The better solution is to harness the power of the SQL function, NULLIF(). NULLIF() is pretty bad ass. If you have not used it, it takes two arguments and works such that if the two arguments are equal in value, it returns NULL, otherwise, it just returns the first argument it was passed. So, for instance, NULLIF( 1, 1 ) would return NULL, but NULLIF( 1, 0 ) would return 1.

Now, how does NULLIF() help us? It helps us because the SQL COUNT() aggregate does NOT count NULL values. This is HUGE. Since we know the small set of values that we might have for the did_date column (1 or 0), we can really use NULLIF() to get exactly the data we want:

  • <cfquery name="qGirl" datasource="...">
  • <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
  •  
  • SELECT
  • g.hair,
  • (
  • COUNT( * )
  • ) AS girl_count,
  • (
  • COUNT(
  • NULLIF( did_date, 0 )
  • )
  • ) AS did_date_count,
  • (
  • COUNT(
  • NULLIF( did_date, 1 )
  • )
  • ) AS did_not_date_count
  • FROM
  • @girl g
  • GROUP BY
  • g.hair
  • </cfquery>

This gives us the output:


 
 
 

 
SQL COUNT( NULLIF() )  
 
 
 

As you can see, for each hair color, we are now getting exactly the data that we wanted; the count, the dated count, and the did not date count. The trick is in the NULLIF(). In order to get the number of girls I dated, I "nullify" all did_date values that are zero. This force the COUNT aggregate to only count the number of records that have a one (which is the only non-null option in our problem domain). Then, to find the did_not_date_count value, I do just the opposite.

How freakin' cool is that?!? When I randomly tried this one day and it worked, it blew my mind. Now, I work in MS SQL Server. I don't know if the NULLIF() function is standard. If is not, then, that's probably why your database was free :) (I'm just messing with you...)




Reader Comments

That NullIf is really helpful. Someone else wrote about this recently about it being a good way to get true/false/bit fields out as always 1 or 0, by using NullIf to bring out nulls as 0.

For your purposes though it would seem to be more efficient to do a Sum(did_date) and then Count(*) - Sum(did_date). If you are doing a NullIf on every single record then it would seem to be a bit more taxing than a couple simple aggregates and math operation. If the t/f field is -1 then you could just slap an ABS around the Sum.

Reply to this Comment

Ooooh, the SUM() idea is brilliant! Well played. Of course that only works for things where we are talking about 1 / 0, but then again, that is probably like 95% of the cases I am going to want to use something like this. Thanks.

Reply to this Comment

I'm not sure if this fits in with this problem but an issue I quickly ran into when using Count(Distinct Column) was needing to count on multiple distinct fields. For anyone that is not aware, this can be done by using a subquery with groupings as a table . So you could do:

Count(*) AS CountAll
FROM (Col1, Col2, etc
FROM Table
Group By Col1, Col2
WHERE etc) AS Temp

On a related note, I've also built up query strings to be used as the FROM to abstract out complexity. For example if I know that an account can only have access specific records for that account/user in a table, Some people might try to use query of query but of course as Ben Forta said so eloquently "ColdFusion is not a DBMS". I build up a query string instead and then on subsequent queries use it as a subquery in the FROM clause. The main query can then be more simplified and more secure since account checks are already being made. I believe there are ways to do this in stored procedures for efficiencies sake but this worked out well for me.

Reply to this Comment

SELECT
g.hair,
COUNT(*) AS girl_count,
sum(case when did_date = 1 then 1 else 0 end) as did_date_count,
sum(case when did_date = 0 then 1 else 0 end) as did_not_date_count
FROM @girl g
GROUP BY g.hair

just showing you how you can use the SUM() if the data wasn't numeric. All you have to do is use CASE and set the values yourself.

Reply to this Comment

@Tony,

You magnificent bastard. I didn't realize you could execute CASE statements inside of aggregates.

You and Daniel have just revolutionized the way I can write SQL :D

Reply to this Comment

Good call, Tony. I was just warming up my index fingers to type out almost that exact same reply.

The nice thing about the CASE approach is that it works on pretty much any database system, while NULLIF doesn't.

-R

Reply to this Comment

I've come a long way from the D+ on my first Databases exam in College.... but I have so much more to go :)

Reply to this Comment

That partition stuff looks pretty cool. I have not seen anything like that in SQL server, although, maybe I remember some similar stuff from a presentation I saw on SQL Server 2005. I don't remember though.

Reply to this Comment

Ok,

I think I follow this. What I need to do is take some info from a single table and count and group the data. When I run this I get the same values in each row except for my month and year values. I am trying to run this through cold fusion mx 7.02. BTW yes I am new to this.

Jim

Reply to this Comment

Here is what I have so far,

Thanks

Jim

<cfquery name ="incident" datasource="FireRMS">
select
year(alarmdate) as YOC
,month(alarmdate) as MOC
,(select count(incidenttype) from incident where left(incidenttype,1)= '1' and incidenttype not like 'u%') as NumFire
,(select count(incidenttype) from incident where left(incidenttype,1)= '2' and incidenttype not like 'u%') as NumExplosion
,(select count(incidenttype) from incident where left(incidenttype,1)= '3' and incidenttype not like 'u%') as NumEMS
,(select count(incidenttype) from incident where left(incidenttype,1)= '4' and incidenttype not like 'u%') as NumHazMat
,(select count(incidenttype) from incident where left(incidenttype,1)= '5' and incidenttype not like 'u%') as NumService
,(select count(incidenttype) from incident where left(incidenttype,1)= '6' and incidenttype not like 'u%') as NumGood
,(select count(incidenttype) from incident where left(incidenttype,1)= '7' and incidenttype not like 'u%') as NumFalse
,(select count(incidenttype) from incident where left(incidenttype,1)= '8' and incidenttype not like 'u%') as NumWeather
,(select count(incidenttype) from incident where left(incidenttype,1)= '9' and incidenttype not like 'u%') as NumOther

from incident

where Year(alarmdate) >2000

group by year(alarmdate), month(alarmdate)

order by year(alarmdate), month(alarmdate)
</cfquery>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Untitled Document</title>
</head>

<body>
<table border="0" cellpadding="15" cellspacing="0" bgcolor="#ffffff">

<tr>
<td><B>Year</B></td>
<td><B>Month</B></td>
<td><B>Fire</B></td>
<td><B>Explosion/Overpressure</B></td>
<td><B>EMS</B></td>
<td><B>Hazmat</B></td>
<td><B>Service</B></td>
<td><B>Good Intent</B></td>
<td><B>False</B></td>
<td><B>Weather</B></td>
<td><B>Other</B></td>
</tr>


<cfoutput query = "incident" >

<tr>
<td><div align="center">#yoc#</div></td>
<td><div align="center">
<cfif #moc# eq 1>
JAN
<cfelseif #moc# eq 2>
FEB
<cfelseif #moc# eq 3>
MAR
<cfelseif #moc# eq 4>
APR
<cfelseif #moc# eq 5>
MAY
<cfelseif #moc# eq 6>
JUN
<cfelseif #moc# eq 7>
JUL
<cfelseif #moc# eq 8>
AUG
<cfelseif #moc# eq 9>
SEP
<cfelseif #moc# eq 10>
OCT
<cfelseif #moc# eq 11>
NOV
<cfelseif #moc# eq 12>
DEC
</cfif>

</div></td>

<td><div align="center">#NumFire#</div></td>
<td><div align="center">#NumExplosion#</div></td>
<td><div align="center">#NumEMS#</div></td>
<td><div align="center">#NumHazMat#</div></td>
<td><div align="center">#NumService#</div></td>
<td><div align="center">#NumGood#</div></td>
<td><div align="center">#NumFalse#</div></td>
<td><div align="center">#NumWeather#</div></td>
<td><div align="center">#NumOther#</div></td>
</tr>

</cfoutput>

</body>

</html>

Reply to this Comment

Jim-

Try this:

SELECT
YEAR(AlarmDate) AS YOC, MONTH(AlarmDate) AS MOC,
SUM(CASE WHEN LEFT(IncidentType,1) = '1' THEN 1 ELSE 0 END) AS NumFire,
SUM(CASE WHEN LEFT(IncidentType,1) = '2' THEN 1 ELSE 0 END) AS NumExplosion,
-- etc, etc, etc
FROM Incident
WHERE (AlarmDate >= {d '2001-01-01'})
GROUP BY YEAR(AlarmDate) AS YOC, MONTH(AlarmDate)
ORDER BY 1, 2

That should be heinously faster without the subqueries. I also eliminated the search for an IncidentType that starts with 'u', as if it starts with a number, it can't also start with a 'u'.

-R

Reply to this Comment

Will give it a try, the search for the u was because of an error the field in the database is a text and not an integer, national standard don't ask me, so without that search for the 'u' it returned an error, I will give your suggestion a try,

Thanks for the help

feedback to follow.

Jim

Reply to this Comment

I guess I missed something here, this is what I got as a return.

Not sure what I missed. Can someone please point me in the right direction.

Thanks

Jim

Error Executing Database Query.
[Macromedia][SQLServer JDBC Driver][SQLServer]Incorrect syntax near the keyword 'AS'.

The error occurred in C:\CFusionMX7\wwwroot\CFIDE\gettingstarted\FireRMS\Index.cfm: line 2

1 :
2 : <cfquery name ="incident" datasource="FireRMS">
3 :
4 :

--------------------------------------------------------------------------------

SQL SELECT YEAR(AlarmDate) AS YOC , MONTH(AlarmDate) AS MOC , SUM(CASE WHEN LEFT(IncidentType,1) = '1' THEN 1 ELSE 0 END) AS NumFire , SUM(CASE WHEN LEFT(IncidentType,1) = '2' THEN 1 ELSE 0 END) AS NumExplosion , SUM(CASE WHEN LEFT(IncidentType,1) = '3' THEN 1 ELSE 0 END) AS NumEMS , SUM(CASE WHEN LEFT(IncidentType,1) = '4' THEN 1 ELSE 0 END) AS NumHazMat , SUM(CASE WHEN LEFT(IncidentType,1) = '5' THEN 1 ELSE 0 END) AS NumService , SUM(CASE WHEN LEFT(IncidentType,1) = '6' THEN 1 ELSE 0 END) AS NumGood , SUM(CASE WHEN LEFT(IncidentType,1) = '7' THEN 1 ELSE 0 END) AS NumFalse , SUM(CASE WHEN LEFT(IncidentType,1) = '8' THEN 1 ELSE 0 END) AS NumWeather , SUM(CASE WHEN LEFT(IncidentType,1) = '8' THEN 1 ELSE 0 END) AS NumOther FROM Incident WHERE (AlarmDate >= {d '2001-01-01'}) GROUP BY YEAR(AlarmDate) AS YOC, MONTH(AlarmDate) ORDER BY 1, 2
DATASOURCE FireRMS
VENDORERRORCODE 156
SQLSTATE HY000

Reply to this Comment

Select distinct

can be used instead of count distinct

ie.
SELECT DISTINCT
-- Columns
B.SOCREX, COUNT(b.socrex)
-- Tables
FROM "WMSPBSF00"/"CHCART00" A,
"WMSPBSF00"/"SOSTOR00" B
-- Row Selection
WHERE ((CHDCR > &date)
AND (CHMIS1 = '1')
AND (CHSTAT < '60'))
-- Join Conditions
AND (A.CHSTOR = B.SOSTOR)
-- Summary Columns
GROUP BY B.SOCREX
-- Sort Columns
ORDER BY B.SOCREX

Results:
Custom
Expansion
Field COUNT
------------------------------ --------------
AAFM 6
ABQ 16
ATL 32
BAL 57
BHM 19
BNA 12
BOS 50
BTR 29
CAL 52
CLE 28
CLOSE 57
CLT 36
CMH 7
CNJ 61
CON 14

Reply to this Comment

Jim - the DISTINCT in that statement is redundant. The GROUP BY is already going to eliminate any duplicates. Now, if you had the DISTINCT inside the COUNT or other aggregate function, that's something else entirely ...

Reply to this Comment

sweet. haven't touched sql in many years, -- this is just the syntax i needed.

thanks...

Reply to this Comment

When I started reading this, I thought the 'did_date' was going to be the month and day you 'did' them.

Reply to this Comment

@Tom,

Ha ha ha. It's funny cause when I read your comment and then went back to look at the code to see what you were referring to, I read "did_date", and though, "What the hell did I mean there??" (my variable naming). Definitely could have been a better name :)

Reply to this Comment

This *is* huge; it just saved me massive time in writing a correlated subquery.

Why is it so hard to find out what COUNT() *does* with it's argument? It doesn't seem to be documented really well.

Reply to this Comment

Oh, and the part I meant to include and forgot: "yes, this works in MySQL 5".

:-)

I'm sure it would work in PgSQL as well, which is what I use by preference.

And I'm so sorry you had to *pay* for *your* DBMS. :-)

Reply to this Comment

So, is there anyway to get NULLIF to return the null if they *aren't* equal?

It's awfully difficult to invert a null, and worse inside a select...

Reply to this Comment

@Baylink,

I think you can just use a CASE statement inside of the aggregate to return NULL if things are not the same value.

Reply to this Comment

Most excellent post!

might want to add Tony's edits into the body of the post though, searching the comments isn't the most optimal - but definitely is a better approach for non-Boolean values

Reply to this Comment

Just amazing, I did something like this:

SELECT
sum(case when paytype = 1 then amount else 0 end) as bpay,
sum(case when paytype = 2 then amount else 0 end) as paypal,
sum(case when paytype = 3 then amount else 0 end) as ccard
FROM payments
WHERE fkclientid = 20503

Reply to this Comment

Thanks, was looking for this with aggregates. Another nice use is for the div by zero issue:

/isnull(nullif( denominator ,0),1)

which returns 1 instead of 0 to divide by in case you want to avoid a filter within the WHERE section.

Reply to this Comment

Ben,

Periodically I look for answers to CF and SQL questions and, come across your site. Today, the post about NULLIF rocked! I really needed this; although it's a 3 year old post. I can add this to my arsenal.

Thanks,

djkhalif

Reply to this Comment

This works too. I learned this trick a long time ago and it's really powerful for flags.

SELECT
g.hair
, COUNT(*) AS girl_count
, sum(did_date) AS did_date_count
, sum(abs(did_date - 1)) AS did_not_date_count
FROM
@girl g
GROUP BY
g.hair

Reply to this Comment

Thank you Ben and Tony!

Either of these work for the summary report I am working on and the info is much appreciated!

I think I like Tony's a little better because I won't have to educate everyone in my department about NullIf if I use that one, but it's still good to have in the back pocket.

Reply to this Comment

Wouldn't it work simply to do sum(property = 2), since preopert = 2 should evaluate to 1 if true, zero otherwise?

Reply to this Comment

Hum, getting to late for me to type properly.
Anyway, one should even be able to do things like sum(r.status_ID = 1 or r.status_ID = 4 or r.status_ID = 5).

Reply to this Comment

Hum, works for me, I guess it must be a non standard behaviour of mysql then. Well, that is fine by me, as long as it works. :)

Reply to this Comment

It is I think, I just figured it working on one server might mean it was standard. I guess not.

Reply to this Comment

@Leon, @Robert,

Very interesting - I've never seen truthy logic used like that in a SQL environment. You say this works in MySQL? I'll have to give that a try.

Reply to this Comment

>sum(case when paytype = 1 then amount else 0 end) as bpay,
>sum(case when paytype = 2 then amount else 0 end) as paypal,
>sum(case when paytype = 3 then amount else 0 end) as ccard

For the record, a better way to do conditionals inside sums and counts might be:

sum(if(paytype=1, amount, 0)) as bpay,

It is indeed an excellent trick to use conditionals in queries. I like to think of it as a very convenient way to take rows and turn them into columns.

When you finally outgrow your sum(if()) clauses, and just cant figure out how to summarize your data, I highly recommend the combination of TEMPORARY TABLE with primary key and ON DUPLICATE KEY UPDATE clauses - if, that is, your database supports them. Summarizing data in multiple passes might not be quite as fast or quite as simple, but it's close; and additionally, taking temporary table 'snapshots' off a live database and then summarizing off those is likely to interfere with operations as much as a complex count that is likely to render the table locked.

Reply to this Comment

Thanks for this page. This is awesome, it just made my day! I didn't realize we can do so much in SQL... counting by distinct and summing up two unrelated counts is too good.

Reply to this Comment

You can try too in PHP with:

For example:

---------------
$<NAMEVAR> = query result

if( !isset( $<NAMEVAR> ) )
$<NAMEVAR> = 0;
---------------

Use this with each var.

Just another option...

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Comments
Live in the Now
Oops!
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.