SQL COUNT( NULLIF( .. ) ) Is Totally Awesome
Posted March 15, 2007 at 9:23 AM by Ben Nadel
Using the SQL GROUP BY clause is really awesome. It makes gathering data "about" the data very easy. Sometimes, though, when you group data, not only do you want to count the number of records in a given group, you want to count the number of records in a group with a given property. This is something that is not as easy. One of the first things you might learn with a SQL COUNT() aggregate is that it can use the DISTINCT directive:
- SELECT
- (
- COUNT( DISTINCT o.user_id )
- ) AS unique_user_count
- FROM
- [order] o
- GROUP BY
- o.date_ordered
The above query would give you the number of distinct users that placed orders on any given day. So, if one user places two orders in a single day, that user would still only count as one item in the above query. This is very cool, but what happens when you need more information.
Take for example, this in-memory SQL data table that has a list of girls, some of which I have dated and some of which are just friends:
- <!--- Create in-memory SQL table. --->
- DECLARE @girl TABLE (
- name VARCHAR( 30 ),
- hair VARCHAR( 10 ),
- did_date TINYINT
- );
-
-
- <!--- Populate table. --->
- INSERT INTO @girl
- (name,hair,did_date) VALUES ('Azure','Brunette',0);
-
- INSERT INTO @girl
- (name,hair,did_date) VALUES ('Sarah','Brunette',1);
-
- INSERT INTO @girl
- (name,hair,did_date) VALUES ('Ashley','Brunette',1);
-
- INSERT INTO @girl
- (name,hair,did_date) VALUES ('Heather','Blonde',1);
Notice that the "did_date" column can have either a one or a zero in it indicating what type of relationship this female is to me. Now, let's say I want to gather information about my relationships and the hair color of the girls involved. I can easily get the count of the girls grouped by hair color:
- <cfquery name="qGirl" datasource="...">
- <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
-
- SELECT
- g.hair,
- (
- COUNT( * )
- ) AS girl_count
- FROM
- @girl g
- GROUP BY
- g.hair
- </cfquery>
That gives me the following data:
| | | | ||
| | ![]() | | ||
| | | |
But what happens if, for each hair color, I want to figure how many I have dated and how many I have not dated. I want to try to do something like this:
- <cfquery name="qGirl" datasource="...">
- <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
-
- SELECT
- g.hair,
- (
- COUNT( * )
- ) AS girl_count,
- (
- COUNT( did_date = 1 )
- ) AS did_date_count
- FROM
- @girl g
- GROUP BY
- g.hair
- </cfquery>
The goal here is to use "did_date = 1" to get the count of girls I dated. Unfortunately, this throws the error:
Error Executing Database Query. [Macromedia][SQLServer JDBC Driver][SQLServer]Line 28: Incorrect syntax near '='.
You can't use logic like this within a COUNT aggregate. Now, someone might suggest that I just throw the did_date column into the GROUP BY as well:
- <cfquery name="qGirl" datasource="...">
- <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
-
- SELECT
- g.hair,
- g.did_date,
- (
- COUNT( * )
- ) AS girl_count
- FROM
- @girl g
- GROUP BY
- g.hair,
- g.did_date
- </cfquery>
Technically, this does "work," but the results are less than optimal:
| | | | ||
| | ![]() | | ||
| | | |
Now, I am not sure how many different combinations are returned. For instance, in the above query, I get two rows for Brunette (dated and not dated) and only one row for Blonde. This is clumsy and awkward to work with.
The better solution is to harness the power of the SQL function, NULLIF(). NULLIF() is pretty bad ass. If you have not used it, it takes two arguments and works such that if the two arguments are equal in value, it returns NULL, otherwise, it just returns the first argument it was passed. So, for instance, NULLIF( 1, 1 ) would return NULL, but NULLIF( 1, 0 ) would return 1.
Now, how does NULLIF() help us? It helps us because the SQL COUNT() aggregate does NOT count NULL values. This is HUGE. Since we know the small set of values that we might have for the did_date column (1 or 0), we can really use NULLIF() to get exactly the data we want:
- <cfquery name="qGirl" datasource="...">
- <!--- ... CODE TO BUILD TABLE (above) HERE.... --->
-
- SELECT
- g.hair,
- (
- COUNT( * )
- ) AS girl_count,
- (
- COUNT(
- NULLIF( did_date, 0 )
- )
- ) AS did_date_count,
- (
- COUNT(
- NULLIF( did_date, 1 )
- )
- ) AS did_not_date_count
- FROM
- @girl g
- GROUP BY
- g.hair
- </cfquery>
This gives us the output:
| | | | ||
| | ![]() | | ||
| | | |
As you can see, for each hair color, we are now getting exactly the data that we wanted; the count, the dated count, and the did not date count. The trick is in the NULLIF(). In order to get the number of girls I dated, I "nullify" all did_date values that are zero. This force the COUNT aggregate to only count the number of records that have a one (which is the only non-null option in our problem domain). Then, to find the did_not_date_count value, I do just the opposite.
How freakin' cool is that?!? When I randomly tried this one day and it worked, it blew my mind. Now, I work in MS SQL Server. I don't know if the NULLIF() function is standard. If is not, then, that's probably why your database was free :) (I'm just messing with you...)
Reader Comments
That NullIf is really helpful. Someone else wrote about this recently about it being a good way to get true/false/bit fields out as always 1 or 0, by using NullIf to bring out nulls as 0.
For your purposes though it would seem to be more efficient to do a Sum(did_date) and then Count(*) - Sum(did_date). If you are doing a NullIf on every single record then it would seem to be a bit more taxing than a couple simple aggregates and math operation. If the t/f field is -1 then you could just slap an ABS around the Sum.
Ooooh, the SUM() idea is brilliant! Well played. Of course that only works for things where we are talking about 1 / 0, but then again, that is probably like 95% of the cases I am going to want to use something like this. Thanks.
I'm not sure if this fits in with this problem but an issue I quickly ran into when using Count(Distinct Column) was needing to count on multiple distinct fields. For anyone that is not aware, this can be done by using a subquery with groupings as a table . So you could do:
Count(*) AS CountAll
FROM (Col1, Col2, etc
FROM Table
Group By Col1, Col2
WHERE etc) AS Temp
On a related note, I've also built up query strings to be used as the FROM to abstract out complexity. For example if I know that an account can only have access specific records for that account/user in a table, Some people might try to use query of query but of course as Ben Forta said so eloquently "ColdFusion is not a DBMS". I build up a query string instead and then on subsequent queries use it as a subquery in the FROM clause. The main query can then be more simplified and more secure since account checks are already being made. I believe there are ways to do this in stored procedures for efficiencies sake but this worked out well for me.
SELECT
g.hair,
COUNT(*) AS girl_count,
sum(case when did_date = 1 then 1 else 0 end) as did_date_count,
sum(case when did_date = 0 then 1 else 0 end) as did_not_date_count
FROM @girl g
GROUP BY g.hair
just showing you how you can use the SUM() if the data wasn't numeric. All you have to do is use CASE and set the values yourself.
@Tony,
You magnificent bastard. I didn't realize you could execute CASE statements inside of aggregates.
You and Daniel have just revolutionized the way I can write SQL :D
ps. I just tried a CASE statement inside of COUNT() and it worked perfectly. Totally bad ass.
Good call, Tony. I was just warming up my index fingers to type out almost that exact same reply.
The nice thing about the CASE approach is that it works on pretty much any database system, while NULLIF doesn't.
-R
I've come a long way from the D+ on my first Databases exam in College.... but I have so much more to go :)
Another solution that I have and currently use is the use of PARTITION for aggregating records in Oracle. I am not sure if SQL Server has something similar. ByuUsing PARTITION, calculating running totals etc becomes a breeze. Here is a link for your reference
http://www.akadia.com/services/ora_analytic_functions.html
Thanks
That partition stuff looks pretty cool. I have not seen anything like that in SQL server, although, maybe I remember some similar stuff from a presentation I saw on SQL Server 2005. I don't remember though.
Ok,
I think I follow this. What I need to do is take some info from a single table and count and group the data. When I run this I get the same values in each row except for my month and year values. I am trying to run this through cold fusion mx 7.02. BTW yes I am new to this.
Jim
I think we need to see your SQL to know what you are talking about.
Here is what I have so far,
Thanks
Jim
<cfquery name ="incident" datasource="FireRMS">
select
year(alarmdate) as YOC
,month(alarmdate) as MOC
,(select count(incidenttype) from incident where left(incidenttype,1)= '1' and incidenttype not like 'u%') as NumFire
,(select count(incidenttype) from incident where left(incidenttype,1)= '2' and incidenttype not like 'u%') as NumExplosion
,(select count(incidenttype) from incident where left(incidenttype,1)= '3' and incidenttype not like 'u%') as NumEMS
,(select count(incidenttype) from incident where left(incidenttype,1)= '4' and incidenttype not like 'u%') as NumHazMat
,(select count(incidenttype) from incident where left(incidenttype,1)= '5' and incidenttype not like 'u%') as NumService
,(select count(incidenttype) from incident where left(incidenttype,1)= '6' and incidenttype not like 'u%') as NumGood
,(select count(incidenttype) from incident where left(incidenttype,1)= '7' and incidenttype not like 'u%') as NumFalse
,(select count(incidenttype) from incident where left(incidenttype,1)= '8' and incidenttype not like 'u%') as NumWeather
,(select count(incidenttype) from incident where left(incidenttype,1)= '9' and incidenttype not like 'u%') as NumOther
from incident
where Year(alarmdate) >2000
group by year(alarmdate), month(alarmdate)
order by year(alarmdate), month(alarmdate)
</cfquery>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Untitled Document</title>
</head>
<body>
<table border="0" cellpadding="15" cellspacing="0" bgcolor="#ffffff">
<tr>
<td><B>Year</B></td>
<td><B>Month</B></td>
<td><B>Fire</B></td>
<td><B>Explosion/Overpressure</B></td>
<td><B>EMS</B></td>
<td><B>Hazmat</B></td>
<td><B>Service</B></td>
<td><B>Good Intent</B></td>
<td><B>False</B></td>
<td><B>Weather</B></td>
<td><B>Other</B></td>
</tr>
<cfoutput query = "incident" >
<tr>
<td><div align="center">#yoc#</div></td>
<td><div align="center">
<cfif #moc# eq 1>
JAN
<cfelseif #moc# eq 2>
FEB
<cfelseif #moc# eq 3>
MAR
<cfelseif #moc# eq 4>
APR
<cfelseif #moc# eq 5>
MAY
<cfelseif #moc# eq 6>
JUN
<cfelseif #moc# eq 7>
JUL
<cfelseif #moc# eq 8>
AUG
<cfelseif #moc# eq 9>
SEP
<cfelseif #moc# eq 10>
OCT
<cfelseif #moc# eq 11>
NOV
<cfelseif #moc# eq 12>
DEC
</cfif>
</div></td>
<td><div align="center">#NumFire#</div></td>
<td><div align="center">#NumExplosion#</div></td>
<td><div align="center">#NumEMS#</div></td>
<td><div align="center">#NumHazMat#</div></td>
<td><div align="center">#NumService#</div></td>
<td><div align="center">#NumGood#</div></td>
<td><div align="center">#NumFalse#</div></td>
<td><div align="center">#NumWeather#</div></td>
<td><div align="center">#NumOther#</div></td>
</tr>
</cfoutput>
</body>
</html>
Jim-
Try this:
SELECT
YEAR(AlarmDate) AS YOC, MONTH(AlarmDate) AS MOC,
SUM(CASE WHEN LEFT(IncidentType,1) = '1' THEN 1 ELSE 0 END) AS NumFire,
SUM(CASE WHEN LEFT(IncidentType,1) = '2' THEN 1 ELSE 0 END) AS NumExplosion,
-- etc, etc, etc
FROM Incident
WHERE (AlarmDate >= {d '2001-01-01'})
GROUP BY YEAR(AlarmDate) AS YOC, MONTH(AlarmDate)
ORDER BY 1, 2
That should be heinously faster without the subqueries. I also eliminated the search for an IncidentType that starts with 'u', as if it starts with a number, it can't also start with a 'u'.
-R
Will give it a try, the search for the u was because of an error the field in the database is a text and not an integer, national standard don't ask me, so without that search for the 'u' it returned an error, I will give your suggestion a try,
Thanks for the help
feedback to follow.
Jim
I guess I missed something here, this is what I got as a return.
Not sure what I missed. Can someone please point me in the right direction.
Thanks
Jim
Error Executing Database Query.
[Macromedia][SQLServer JDBC Driver][SQLServer]Incorrect syntax near the keyword 'AS'.
The error occurred in C:\CFusionMX7\wwwroot\CFIDE\gettingstarted\FireRMS\Index.cfm: line 2
1 :
2 : <cfquery name ="incident" datasource="FireRMS">
3 :
4 :
--------------------------------------------------------------------------------
SQL SELECT YEAR(AlarmDate) AS YOC , MONTH(AlarmDate) AS MOC , SUM(CASE WHEN LEFT(IncidentType,1) = '1' THEN 1 ELSE 0 END) AS NumFire , SUM(CASE WHEN LEFT(IncidentType,1) = '2' THEN 1 ELSE 0 END) AS NumExplosion , SUM(CASE WHEN LEFT(IncidentType,1) = '3' THEN 1 ELSE 0 END) AS NumEMS , SUM(CASE WHEN LEFT(IncidentType,1) = '4' THEN 1 ELSE 0 END) AS NumHazMat , SUM(CASE WHEN LEFT(IncidentType,1) = '5' THEN 1 ELSE 0 END) AS NumService , SUM(CASE WHEN LEFT(IncidentType,1) = '6' THEN 1 ELSE 0 END) AS NumGood , SUM(CASE WHEN LEFT(IncidentType,1) = '7' THEN 1 ELSE 0 END) AS NumFalse , SUM(CASE WHEN LEFT(IncidentType,1) = '8' THEN 1 ELSE 0 END) AS NumWeather , SUM(CASE WHEN LEFT(IncidentType,1) = '8' THEN 1 ELSE 0 END) AS NumOther FROM Incident WHERE (AlarmDate >= {d '2001-01-01'}) GROUP BY YEAR(AlarmDate) AS YOC, MONTH(AlarmDate) ORDER BY 1, 2
DATASOURCE FireRMS
VENDORERRORCODE 156
SQLSTATE HY000
Select distinct
can be used instead of count distinct
ie.
SELECT DISTINCT
-- Columns
B.SOCREX, COUNT(b.socrex)
-- Tables
FROM "WMSPBSF00"/"CHCART00" A,
"WMSPBSF00"/"SOSTOR00" B
-- Row Selection
WHERE ((CHDCR > &date)
AND (CHMIS1 = '1')
AND (CHSTAT < '60'))
-- Join Conditions
AND (A.CHSTOR = B.SOSTOR)
-- Summary Columns
GROUP BY B.SOCREX
-- Sort Columns
ORDER BY B.SOCREX
Results:
Custom
Expansion
Field COUNT
------------------------------ --------------
AAFM 6
ABQ 16
ATL 32
BAL 57
BHM 19
BNA 12
BOS 50
BTR 29
CAL 52
CLE 28
CLOSE 57
CLT 36
CMH 7
CNJ 61
CON 14
Jim - the DISTINCT in that statement is redundant. The GROUP BY is already going to eliminate any duplicates. Now, if you had the DISTINCT inside the COUNT or other aggregate function, that's something else entirely ...
Ben this totally saved me from pulling out all my hair, which is brown BTW.
Sweeet. Yeah, I was pretty excited when I found out about this.
sweet. haven't touched sql in many years, -- this is just the syntax i needed.
thanks...
Tony. You just saved my butt as well. Exactly what I needed. Great post.
// Joff
When I started reading this, I thought the 'did_date' was going to be the month and day you 'did' them.
@Tom,
Ha ha ha. It's funny cause when I read your comment and then went back to look at the code to see what you were referring to, I read "did_date", and though, "What the hell did I mean there??" (my variable naming). Definitely could have been a better name :)
This *is* huge; it just saved me massive time in writing a correlated subquery.
Why is it so hard to find out what COUNT() *does* with it's argument? It doesn't seem to be documented really well.
@Baylink,
Glad to help :)
Oh, and the part I meant to include and forgot: "yes, this works in MySQL 5".
:-)
I'm sure it would work in PgSQL as well, which is what I use by preference.
And I'm so sorry you had to *pay* for *your* DBMS. :-)
So, is there anyway to get NULLIF to return the null if they *aren't* equal?
It's awfully difficult to invert a null, and worse inside a select...
@Baylink,
I think you can just use a CASE statement inside of the aggregate to return NULL if things are not the same value.
Thanks for the tutorial. This was precisely what i needed.
Most excellent post!
might want to add Tony's edits into the body of the post though, searching the comments isn't the most optimal - but definitely is a better approach for non-Boolean values
Just amazing, I did something like this:
SELECT
sum(case when paytype = 1 then amount else 0 end) as bpay,
sum(case when paytype = 2 then amount else 0 end) as paypal,
sum(case when paytype = 3 then amount else 0 end) as ccard
FROM payments
WHERE fkclientid = 20503
Thanks, was looking for this with aggregates. Another nice use is for the div by zero issue:
/isnull(nullif( denominator ,0),1)
which returns 1 instead of 0 to divide by in case you want to avoid a filter within the WHERE section.
@Corey,
Cool tip.
Ben,
Periodically I look for answers to CF and SQL questions and, come across your site. Today, the post about NULLIF rocked! I really needed this; although it's a 3 year old post. I can add this to my arsenal.
Thanks,
djkhalif
@Djkhalif,
Awesome :) Glad that this could be pulled out the past and still be helpful.
This works too. I learned this trick a long time ago and it's really powerful for flags.
SELECT
g.hair
, COUNT(*) AS girl_count
, sum(did_date) AS did_date_count
, sum(abs(did_date - 1)) AS did_not_date_count
FROM
@girl g
GROUP BY
g.hair
@Robert,
That's pretty clever to subtract 1 from the flag value. Snazzy.
Thank you Ben and Tony!
Either of these work for the summary report I am working on and the info is much appreciated!
I think I like Tony's a little better because I won't have to educate everyone in my department about NullIf if I use that one, but it's still good to have in the back pocket.
@AmberCF,
Glad we could help. @Tony - way to come through with the great insight.
Wouldn't it work simply to do sum(property = 2), since preopert = 2 should evaluate to 1 if true, zero otherwise?
Hum, getting to late for me to type properly.
Anyway, one should even be able to do things like sum(r.status_ID = 1 or r.status_ID = 4 or r.status_ID = 5).
No, Leon. That will not work. Expressions like that do not evaluate to boolean or bit values.
Hum, works for me, I guess it must be a non standard behaviour of mysql then. Well, that is fine by me, as long as it works. :)
sorry, I was sure this thread was referring to SQL Server.
It is I think, I just figured it working on one server might mean it was standard. I guess not.
@Leon, @Robert,
Very interesting - I've never seen truthy logic used like that in a SQL environment. You say this works in MySQL? I'll have to give that a try.
>sum(case when paytype = 1 then amount else 0 end) as bpay,
>sum(case when paytype = 2 then amount else 0 end) as paypal,
>sum(case when paytype = 3 then amount else 0 end) as ccard
For the record, a better way to do conditionals inside sums and counts might be:
sum(if(paytype=1, amount, 0)) as bpay,
It is indeed an excellent trick to use conditionals in queries. I like to think of it as a very convenient way to take rows and turn them into columns.
When you finally outgrow your sum(if()) clauses, and just cant figure out how to summarize your data, I highly recommend the combination of TEMPORARY TABLE with primary key and ON DUPLICATE KEY UPDATE clauses - if, that is, your database supports them. Summarizing data in multiple passes might not be quite as fast or quite as simple, but it's close; and additionally, taking temporary table 'snapshots' off a live database and then summarizing off those is likely to interfere with operations as much as a complex count that is likely to render the table locked.
It's so simple that it's brilliant.
Thanks!
Great article - your example was spot on, works like a charm - thanks!
Thanks for this page. This is awesome, it just made my day! I didn't realize we can do so much in SQL... counting by distinct and summing up two unrelated counts is too good.






