Ask Ben: Querying Groups With Conditional Elements

Posted January 14, 2008 at 7:00 AM by Ben Nadel

Tags: ColdFusion, SQL, Ask Ben

I hav a situation in my academic project, and was struggling to get a proper approach towards the solution, then i decided to ask 'kind-hearted' ppl for help ... while browsing net for forums wid posts with similar situation, i 'stumbled' upon you site. it's a nice and great job u r doin here ben. thanks a ton... do consider to direct me towards the solution for the following situation.

The following table reflects the One-to-Many relationship between TransID and TagID

--------------------
|TransID | TagID |
--------------------
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 3 |
| 3 | 4 |
--------------------

i would need to know how exactly i should be aproaching the problem to find out solution for following situations:

STRICT MODE:
------------

* TransID with TagID (1 AND 4 AND Not 2)

Result : i should get only TransID:3 as the result. (ALL Conditions shud return true, so in the interface the user would have selected Having 1,4 and not having 2).

Relaxed Mode:
-------------

* TransID with TagID (1 OR 4 OR Not 2)

Result : i should get TransID:1,2,3 as the result, since each of the TransID satisfies one or other condition. (ANY Condition returning true the TransID qualifies to be returned, and in the interface the user would hav selected Having 1,4 and not Having 2)

Before we get into this, let's start out by building a SQL temp table into which I can insert the test data:

  • <!---
  • Create the SQL build scripts so that we can reuse it
  • within the different CFQuery tags.
  • --->
  • <cfsavecontent variable="strBuildSQL">
  • DECLARE @data TABLE (
  • trans_id INT,
  • tag_id INT
  • );
  •  
  •  
  • <!--- Populate data. --->
  • INSERT INTO @data
  • (
  • trans_id,
  • tag_id
  • )(
  • SELECT 1, 1 UNION ALL
  • SELECT 1, 2 UNION ALL
  • SELECT 2, 1 UNION ALL
  • SELECT 2, 2 UNION ALL
  • SELECT 2, 3 UNION ALL
  • SELECT 3, 3 UNION ALL
  • SELECT 3, 4
  • );
  • </cfsavecontent>

Now that we have that in place, our @data table should mimic what the user has described (although I have changed the column names to line up more appropriately with my own database naming conventions).

Ok, so let's take a look at the problem. For the Strict mode part, the first thing that jumps to mind is grouping. We want to check out the group of TransIDs and make sure that each group contains either TagID 1 or 4, but definitely not TagID 2. To examine groups in SQL Server, we can use the GROUP BY clause. Then, to examine the contents of each group, we can leverage the fact that the COUNT() aggregate can handle CASE statements. If you remember my exploration of the COUNT() / CASE scenario, you will remember that the COUNT() aggregate does not include NULL values in its tally. Therefore, if we can conditionally include values, we can use the COUNT() of each group to narrow down the selection.

Taking this understanding and applying it to the problem at hand, we want to make sure that each TransID group has at least one count of the TagID 1 or 4 and a zero count of the TagID 2:

  • <!--- Query for TransID values. --->
  • <cfquery name="qStrict" datasource="#REQUEST.DSN.Source#">
  • <!--- Build SQL table and populate. --->
  • #PreserveSingleQuotes( strBuildSQL )#
  •  
  •  
  • <!--- Get the appropriate trans IDs. --->
  • SELECT
  • d.trans_id
  • FROM
  • @data d
  • GROUP BY
  • d.trans_id
  • HAVING
  •  
  • <!---
  • Make sure that the group has either
  • tag ID 1 or 4.
  • --->
  • COUNT(
  • CASE
  • WHEN
  • d.tag_id IN ( 1,4 )
  • THEN
  • d.tag_id
  • ELSE
  • NULL
  • END
  • ) > 0
  • AND
  • <!---
  • Make sure that the group does not also contain
  • the tag id, 2.
  • --->
  • COUNT(
  • CASE
  • WHEN
  • d.tag_id = 2
  • THEN
  • d.tag_id
  • ELSE
  • NULL
  • END
  • ) = 0
  • ;
  • </cfquery>
  •  
  •  
  • <!--- Output out results. --->
  • TransID: #ValueList( qStrict.trans_id, ', ' )#

Running this SQL, we get the following output:

TransID: 3

This works, but it looks really complicated. We can start to simplify the query if we look at the problem in a slightly different way. We can think of the 1 and 4 requirements as filter and then the NOT 2 as a secondary layer of filter. Let's run a query that includes just the first part of that filtering:

  • <!--- Query for TransID values. --->
  • <cfquery name="qStrict" datasource="#REQUEST.DSN.Source#">
  • <!--- Build SQL table and populate. --->
  • #PreserveSingleQuotes( strBuildSQL )#
  •  
  •  
  • <!--- Get the appropriate trans IDs. --->
  • SELECT
  • d.trans_id
  • FROM
  • @data d
  • WHERE
  • d.tag_id IN ( 1,4 )
  • GROUP BY
  • d.trans_id
  • ;
  • </cfquery>
  •  
  •  
  • <!--- Output out results. --->
  • TransID: #ValueList( qStrict.trans_id, ', ' )#

This gives us the following output:

TransID: 1, 2, 3

Good, the first part works. Now, we want to apply the second filter, not having TagID 2. We can't just add another simple WHERE clause for != 2 since any TagID that is 1 or 4 is ALSO not equal to 2. What we need to do is apply the not 2 to each group of TransIDs. To do this, we are going to add a NOT EXISTS clause:

  • <!--- Query for TransID values. --->
  • <cfquery name="qStrict" datasource="#REQUEST.DSN.Source#">
  • <!--- Build SQL table and populate. --->
  • #PreserveSingleQuotes( strBuildSQL )#
  •  
  •  
  • <!--- Get the appropriate trans IDs. --->
  • SELECT
  • d.trans_id
  • FROM
  • @data d
  • WHERE
  • d.tag_id IN ( 1,4 )
  • AND
  • <!---
  • Make sure that this TransID value does not
  • exists in a group that ALSO contains a TagId, 2.
  • --->
  • NOT EXISTS
  • (
  • SELECT
  • 1
  • FROM
  • @data d2
  • WHERE
  • d2.trans_id = d.trans_id
  • AND
  • d2.tag_id = 2
  • )
  • GROUP BY
  • d.trans_id
  • ;
  • </cfquery>
  •  
  •  
  • <!--- Output out results. --->
  • TransID: #ValueList( qStrict.trans_id, ', ' )#

Running the above code, we get the following output:

TransID: 3

This version works as well. To me, this is somewhat easier to read than the leveraged COUNT() aggregate; however, I am not sure exactly which one is going to be faster. I am sure that it will depend somewhat on the size of the table and on the indexes that are in place.

Now that we have the Strict Matching out of the way, let's take a look at the Relaxed Matching mode. Relaxed matching is much easier as it doesn't really apply to the group level, but rather to the individual records being examined. As such, we can simply use a straightforward WHERE clause to limit each returned TransID and then group by the TransID to make sure we don't get duplicate values:

  • <!--- Query for TransID values. --->
  • <cfquery name="qRelaxed" datasource="#REQUEST.DSN.Source#">
  • <!--- Build SQL table and populate. --->
  • #PreserveSingleQuotes( strBuildSQL )#
  •  
  •  
  • <!--- Get the appropriate trans IDs. --->
  • SELECT
  • d.trans_id
  • FROM
  • @data d
  • WHERE
  • d.tag_id = 1
  • OR
  • d.tag_id = 4
  • OR
  • d.tag_id != 2
  • GROUP BY
  • d.trans_id
  • ;
  • </cfquery>
  •  
  •  
  • <!--- Output out results. --->
  • TransID: #ValueList( qRelaxed.trans_id, ', ' )#

Running the above code, we get the following output:

TransID: 1, 2, 3

Works fine and as you can see, relaxed mode matching is much easier and faster to perform.

I hope that helps. Please feel free to follow up with comments below.



Reader Comments

Jan 15, 2008 at 10:11 AM // reply »
1 Comments

Very thanks for the solution. please consider the situation,

if the 'Strict mode' is made more strict,

i.e when TransID with TagID (1 AND 4 AND Not 2) means :
TransIDs are expected to have both TagID 1 AND 4 (in the solution you coded, using TagID (1,4), will result in '1 OR 4')

so only when user is searching for TransID with TagID (1 AND 2 AND NOT 3) only the TransID: 1 satisfies the condition and should be returned ...

i am using a query like

SELECT TransID
FROM Trans_Tags
WHERE TransID IN (SELECT TransID in Trans_Tags WHERE TagID = 1) AND
WHERE TransID IN (SELECT TransID in Trans_Tags WHERE TagID = 2)
AND
WHERE TransID NOT IN (SELECT TransID in Trans_Tags WHERE TagID = 3)

will this query run efficiently in a huge tables with say a million records ?! or could you suggest me any better approach ?!

thanks for your efforts.


Jan 25, 2008 at 9:47 AM // reply »
11,243 Comments

@Vasanth,

I have been told that it is more efficient to use the EXISTS construct rather that the IN construct. Also, if the tables are indexed on the proper columns, then the look up should be fairly fast.


Post A Comment

Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.

Please review the following issues:

Author Name:


Author Email:

Author Website:

Comment:

Supported HTML tags for formatting: <strong>bold</strong>   <em>italic</em>   <code>code</code>







  • Help Wanted - Find Your Next ColdFusion Job
Ben Nadel's Company - Epicenter Consulting Recent Blog Comments
May 23, 2013 at 11:06 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@Ben, Are you talking about As Number: YES As String: YES As Java: YES? If so, that's with 3 different ways of referencing the constant 1, not users.id[1]. Query object references(*) are what seem ... read »
May 23, 2013 at 9:55 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@Dan, According to the CF Admin, I'm running Java "1.6.0_45". As far as the DB column, in the database it's an INT. I'll see if I can dig into what CF sees it as. @WebManWalking, But h ... read »
May 23, 2013 at 9:49 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@Ben, I think the problem is that we're used to loose typing in ColdFusion, like JavaScript. If a value is a number but it's needed in an expression to be a string, noooo problem. I've encountered ... read »
May 23, 2013 at 9:47 AM
ColdFusion QueryAppend( qOne, qTwo )
You rock! Thank you, thank you, thank you!!! ... read »
May 23, 2013 at 5:19 AM
Ask Ben: Print Part Of A Web Page With jQuery
How to print also the background color of table cells and table lines ... read »
May 23, 2013 at 3:55 AM
Javascript Array Methods: Unshift(), Shift(), Push(), And Pop()
very interesting and helpful too. ... read »
May 22, 2013 at 5:35 PM
Script Tags, jQuery, And Html(), Text() And Contents()
This is still an issue 2 years later. jQuery is supposed to remediate these cross browser issues, no? I have been unable to find any statement from the jQuery team calling this behavior "by de ... read »
May 22, 2013 at 12:44 PM
Ask Ben: Query Loop Inside CFScript Tags
In cf10, if you call a function that has: local.result = {}; local.result.msg = ""; local.svc = new query(); local.svc.setSQL("SELECT * FROM..."); local.obj = local.svc.exe ... read »
InVision App - Prototyping Made Beautiful With Prototyping Tools