Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.
Meanwhile on Twitter
Loading latest tweet...
Ben Nadel at the New York ColdFusion User Group (May. 2008) with: Michael Smith

Ask Ben: Pulling SQL Records Based On A List Of IDs

By Ben Nadel on

Looping over a list - hey and thank you for all your help your the best!!!! I have a list I created 6,7,4,29,32,12 etc ... #theList# ... Each number is a record ID from the database. Now I want to loop over the list and pull a field from the record based on the ID. I cant get it man.

When I first got this email, I thought I would just whip up a quick example and send it on its way. But, then I got to thinking about it; there are several ways to accomplish this task, each of which has its own pros and cons and consists of a different level of complexity. And so, I thought I would cover a few different methods so people can see different ways to attack the same problem.

I think the solution that comes most naturally to people, esepcially those who are newer to programming with a database, is the one in which we iterate over the ID list and then for each list iteration, we make a call to the database for the corresponding record:

  • <!---
  • Create a list of ID's who's fields we want to
  • pull out of the database.
  • --->
  • <cfset lstID = "6,7,4,29,32,12" />
  •  
  •  
  • <!---
  • Loop over the list so that we can get the record
  • for each ID.
  • --->
  • <cfloop
  • index="intID"
  • list="#lstID#"
  • delimiters=",">
  •  
  • <!--- Query for record. --->
  • <cfquery name="qBlog" datasource="#REQUEST.DSN.Source#">
  • SELECT
  • b.id,
  • b.name,
  • b.date_posted
  • FROM
  • blog_entry b
  • WHERE
  • b.id = <cfqueryparam value="#intID#" cfsqltype="cf_sql_integer" />
  • </cfquery>
  •  
  • <!--- Output the row ID and name. --->
  • <p>
  • [ #qBlog.id# ]
  • #qBlog.name#
  • </p>
  •  
  • </cfloop>

Running the above code, we get the following output:

[ 6 ] FireFox and XStandard CSS File Linkage

[ 7 ] Over-Complicating Getting the Rest of List

[ 4 ] Trouble with XStandard and FireFox

[ 29 ] Getting Back Into The Gym

[ 32 ] Getting Group By Count In Left Outer Join

[ 12 ] Thanks to the Texas Web Developers

As you can see, all 6 IDs resulted in a record pulled from the database. Additionally, because we were looping over the list, each record is pulled in the same order in which the IDs appeared in the list.

This solution is good because it is simple and very easy to understand. It's also a tiny amount of code. This is all very appealing at first; but, as you start to evaluate code performance, you will notice that database calls are expensive and here, we are making a separate database calls for every single ID in our list. For a list of 6 IDs, this might not be so bad, but as our ID list grows, this is going to become more noticeable.

To overcome the multiple database calls problem, the next solution combines all the database calls into one query that returns a record for each ID in the list. The easiest way to accomplish this is to use the SQL IN clause. IN takes a comma delimited list and is used for existence checking.

  • <!---
  • Create a list of ID's who's fields we want to
  • pull out of the database.
  • --->
  • <cfset lstID = "6,7,4,29,32,12" />
  •  
  • <!--- Query for record. --->
  • <cfquery name="qBlog" datasource="#REQUEST.DSN.Source#">
  • SELECT
  • b.id,
  • b.name,
  • b.date_posted
  • FROM
  • blog_entry b
  • WHERE
  • b.id IN
  • (
  • <!---
  • Pass in the ID list as a listed query param.
  • The LIST attribute will cause this to act as
  • a comma delimited list in which each value
  • gets bound to an item in the list.
  • --->
  • <cfqueryparam value="#lstID#" cfsqltype="cf_sql_integer" list="true" />
  • )
  • ORDER BY
  • (
  • CASE
  • b.id
  •  
  • <!---
  • Loop over the ID list and ensure that the
  • records are ID-ordered in the same order
  • they are in the list.
  • --->
  • <cfloop
  • index="intIndex"
  • from="1"
  • to="#ListLen( lstID )#">
  •  
  • WHEN
  • #ListGetAt( lstID, intIndex )#
  • THEN
  • #intIndex#
  •  
  • </cfloop>
  •  
  • ELSE
  • -1
  • END
  • ) ASC
  • </cfquery>
  •  
  • <!--- Loop over returned records. --->
  • <cfloop query="qBlog">
  •  
  • <!--- Output the row ID and name. --->
  • <p>
  • [ #qBlog.id# ]
  • #qBlog.name#
  • </p>
  •  
  • </cfloop>

The problem with the IN clause is that it only helps to filter the result set and has no impact on the order in which the IDs are returned (by default, the records are returned in the same order in which the SQL server reads them from disk). And, since I am assuming ID order is important (since the reader's ID list was NOT in an ascending order), we need to create a fairly complicated ORDER BY clause to get the records returned in the proper order.

If the ORDER BY statement is confusing, basically what it is doing is checking the record ID against the list of IDs and is telling the SQL server to order the record based on the index of the given ID in the ID list.

Running the above code, you will see that we get the same output:

[ 6 ] FireFox and XStandard CSS File Linkage

[ 7 ] Over-Complicating Getting the Rest of List

[ 4 ] Trouble with XStandard and FireFox

[ 29 ] Getting Back Into The Gym

[ 32 ] Getting Group By Count In Left Outer Join

[ 12 ] Thanks to the Texas Web Developers

This solution is good because we are getting all of the same information as we were in the first one, but instead of 6 separate database calls, we now only have one. This will be a nice performance boost. The caveat here is that when we output the data, we are looping over a query not our ID list. The con here is that the meat of the query is way more complex that it was in the first solution. This can be distracting to the overall meaning of the query and can make debugging a pain in the butt. Plus, as the ID list gets longer and longer (not that that is going to happen), the SQL IN clause becomes less and less efficient. And of course, there is much more code here to deal with in general.

The next solution attempts to overcome the cons of the previous solution by moving the complex logic out of the main query using an intermediary, in-memory table. Using just a single query, we are going to create a temporary table that has both the ID and the position (sort index) of all the IDs in the list. Our main query is then going to just join to that temp table.

  • <!---
  • Create a list of ID's who's fields we want to
  • pull out of the database.
  • --->
  • <cfset lstID = "6,7,4,29,32,12" />
  •  
  •  
  • <!---
  • Query for all the records that correspond to
  • an ID in our ID list.
  • --->
  • <cfquery name="qBlog" datasource="#REQUEST.DSN.Source#">
  • <!---
  • Because we are not just dealing with a list of IDs
  • but also with their order, I think the easist thing
  • to do would be to create an in-memory table that has
  • both the ID value and the sort position. This table
  • will be called "valid" as in "valid IDs".
  • --->
  • DECLARE @valid TABLE (
  • id INT,
  • sort INT
  • );
  •  
  •  
  • <!---
  • Now that we have our table declared, let's loop over
  • the IDs and populate it.
  • --->
  • INSERT INTO @valid (
  • id,
  • sort
  • )(
  • <cfloop
  • index="intIndex"
  • from="1"
  • to="#ListLen( lstID )#"
  • step="1">
  • SELECT
  • #ListGetAt( lstID, intIndex )#,
  • #intIndex#
  • UNION ALL
  • </cfloop>
  •  
  • <!--- This it to handle the last UNION ALL. --->
  • SELECT 0,0
  • );
  •  
  •  
  • <!---
  • ASSERT: The temp table @valid now has all the IDs from
  • the original list with a SORT column value that
  • reflects the position of the ID in the original list.
  • --->
  •  
  •  
  • <!---
  • Now, select all records where we have a given ID.
  • Since we are assuming the list has no duplicates,
  • we can accomplish this one-to-one relationship via
  • an INNER JOIN to our valid ID table.
  • --->
  • SELECT
  • b.id,
  • b.name,
  • b.date_posted
  • FROM
  • blog_entry b
  • INNER JOIN
  • @valid v
  • ON
  • b.id = v.id
  • ORDER BY
  • v.sort ASC
  • </cfquery>
  •  
  •  
  • <!--- Loop over the returned records. --->
  • <cfloop query="qBlog">
  •  
  • <!--- Output the row ID and name. --->
  • <p>
  • [ #qBlog.id# ]
  • #qBlog.name#
  • </p>
  •  
  • </cfloop>

Notice here that the bulk of the logic now goes into populating the temporary table. The main query, the one that returns the ID-based records, is now much more simple. The ORDER BY clause merely uses the sort column in the temp table. The only complex thing about it is that we are filtering the records based on the INNER JOIN to the temp table rather than using any WHERE clause.

Running the above code, you will see that we get the same output:

[ 6 ] FireFox and XStandard CSS File Linkage

[ 7 ] Over-Complicating Getting the Rest of List

[ 4 ] Trouble with XStandard and FireFox

[ 29 ] Getting Back Into The Gym

[ 32 ] Getting Group By Count In Left Outer Join

[ 12 ] Thanks to the Texas Web Developers

The SQL used here is much more advanced in general, but if you assume that the temp table is being populated properly, the chances of making an error in the main query are very low. Also, since we are using a JOIN rather than in IN clause, I believe (although I am not certain) that this solution will outperform the previous one as the list gets bigger.

I think the main cons with this solution are simply that there is a good deal of code to deal with and the SQL being used is someone advanced. And, for a very small list, this might just be overkill.

Each of these solutions, I think, has it's place depending on the context in which it will be used. I am not gonna say that one of these clearly beats the other. People with more experience may be able to speak more definitively about the performance difference between solution 2 and solution 3.

Hope this helps in some way.

EDIT: Suggested by Dustin - ColdFusion Query of Query Solution

Seeing as I love ColdFusion query of queries, I can't believe I totally forgot about this as a possible solution. Here, we are getting all the possible values in an up-front query as we did in solution 2. But this time we don't care about ordering, we just want the pool of records. Then, we can loop over the list and re-query our record set for the appropriate values (notice that when outputting the data, we are referring to the subquery, not the first query).

  • <!---
  • Create a list of ID's who's fields we want to
  • pull out of the database.
  • --->
  • <cfset lstID = "6,7,4,29,32,12" />
  •  
  • <!--- Query for record. --->
  • <cfquery name="qBlog" datasource="#REQUEST.DSN.Source#">
  • SELECT
  • b.id,
  • b.name,
  • b.date_posted
  • FROM
  • blog_entry b
  • WHERE
  • b.id IN
  • (
  • <!---
  • Pass in the ID list as a listed query param.
  • The LIST attribute will cause this to act as
  • a comma delimited list in which each value
  • gets bound to an item in the list.
  • --->
  • <cfqueryparam value="#lstID#" cfsqltype="cf_sql_integer" list="true" />
  • )
  • </cfquery>
  •  
  •  
  • <!--- Loop over the list. --->
  • <cfloop
  • index="intID"
  • list="#lstID#"
  • delimiters=",">
  •  
  • <!---
  • Query for local record using a ColdFusion Query of
  • Queries. While this is another SQL call, it does
  • not communicate with the database.
  • --->
  • <cfquery name="qBlogSub" dbtype="query">
  • SELECT
  • *
  • FROM
  • qBlog
  • WHERE
  • id = <cfqueryparam value="#intID#" cfsqltype="cf_sql_integer" />
  • </cfquery>
  •  
  • <!--- Output the row ID and name. --->
  • <p>
  • [ #qBlogSub.id# ]
  • #qBlogSub.name#
  • </p>
  •  
  • </cfloop>

Some people get mixed results with ColdFusion query of queries in terms of performance. What you save in the cost of communicating with the database server you pay for in the fact that ColdFusion query of queries performs SQL calls slower than the Database. It becomes a bit of a tweaking and balancing act to see what performs better. I am sure that this is heavily influenced by the load on the server and the number of concurrent connections allowed to the database. In general though, I would certainly suggest erring on the side of FEWER database calls.

EDIT: Suggested by Rick Osborne - Inline Temp Table

Rick Osborne suggested doing a modification of the temp table. In his example (reproduced below), he makes his initial FROM statement an inline temp table comprised of those nested SELECT statements. This is basically what I was doing in my third example; the difference here is that Rick's does not require any additional variables - it is just part of the main query.

  • <!---
  • Build an array of IDs rather than a list of IDs (We
  • can iterate over an array faster than we can a list
  • and it is more index-friendly).
  • --->
  • <cfset arrID = ListToArray( "6,7,4,29,32,12" ) />
  •  
  • <!--- Query for all records. --->
  • <cfquery name="qBlog" datasource="#REQUEST.DSN.Source#">
  • SELECT
  • ids.id,
  • b.name,
  • b.date_posted
  • FROM
  • <!---
  • Our intial table will be a dynamic table created
  • from our sub-select. This is like creating an
  • in-memory table except we don't need to declare
  • any variables.
  • --->
  • (
  • <cfloop
  • index="intIndex"
  • from="1"
  • to="#ArrayLen( arrID )#"
  • step="1">
  •  
  • <cfif (intIndex NEQ 1)>
  • UNION ALL
  • </cfif>
  •  
  • SELECT
  • <cfqueryparam value="#arrID[ intIndex ]#" cfsqltype="cf_sql_integer" /> AS id,
  • <cfqueryparam value="#intIndex#" cfsqltype="cf_sql_integer" /> AS sort
  • </cfloop>
  • ) AS ids
  • LEFT OUTER JOIN
  • blog_entry AS b
  • ON
  • b.id = ids.id
  • ORDER BY
  • ids.sort ASC
  • </cfquery>
  •  
  •  
  • <!--- Loop over the returned records. --->
  • <cfloop query="qBlog">
  •  
  • <!--- Output the row ID and name. --->
  • <p>
  • [ #qBlog.id# ]
  • #qBlog.name#
  • </p>
  •  
  • </cfloop>

Also notice that Rick went with an array of IDs rather than a list. This is a wise move for our purposes. It is faster to iterate over an array than it is a list. And, when using arrays, I think indexes feel more natural and perform quicker.




Reader Comments

Another alternative to this would be to do your second example, but instead of looping the query, loop the list and do a query of query pulling the record you need for that iteration. That should keep easy to read, give fairly good performance, and you avoid the complex query.

Reply to this Comment

@Dustin,

Excellent suggestion. I can't believe I totally spaced on the Query of Queries as a possible solution (I love QoQ!). I will try to update this at lunch time.

Reply to this Comment

When I could trust my input data, I have used dynamic SQL queries with an IN statement:

WHERE b.id IN (<cfoutput>#lstID#</cfoutput>)

The IN statement essentially just becomes a series of OR statements in the query.

Most RDBMSes enforce a maximum list size (probably around 256) and you might need to quote the elements in some lists (but it works as-is for numerics).

Reply to this Comment

@Adam,

While I'm not entirely sure about the maximum number of items you can put in the IN statement. I've never had an error tossed from this and the docs for T-SQL states " IN (,...n)" which leads me to believe there isn't a limit (at least for MS-SQL).

That aside you should really use the <cfqueryparam list="yes"...>, this way you get the benefit of using bind variables. This will help prevent any injected SQL from being executed in your list and it will also speed the transaction considerably. As an added bonus you don't have to worry about quoting the values in your list. :D

Reply to this Comment

@Dusin,

I am not sure that I have ever gotten a an error with IN and just #lstID#. However, I have gotten an error when I use CFQueryParam list=yes AND have a huge list. I think there is an upper limit to how many variables can be bound. But, this is not a limitation of the IN () clause, but rather the SQL server in general? Not sure...

Reply to this Comment

Well that *did* sound like something interesting to explore. So I decided to do some testing and I did find out something interesting. If you try to do a QoQ with cfqueryparam and a 4000 item list it crashes CF.

Ooops....

Reply to this Comment

Ha ha ha, nice :) I think if you try the same thing except directly in a SQL server query, it runs out of stack memory or comes back with an "unknown exception" or something (I can't quite remember what the deal is).

Reply to this Comment

For completeness, I should point out:

<cfset arrID=ListToArray(lstID)>
<cfquery name="qBlog" datasource="#REQUEST.DSN.Source#">
SELECT ids.id, b.name, b.date_posted
FROM (
<cfloop from="1" to="#ArrayLen(lstID)#" index="i">
<cfif i GT 1>UNION ALL</cfif>
SELECT <cfqueryparam cfsqltype="CF_SQL_INTEGER" value="#arrID[i]#"> AS id, #i# AS orderID
</cfloop>
) AS ids
LEFT OUTER JOIN blog_entry AS b ON (b.id = ids.id)
ORDER BY ids.orderID
</cfquery>

This has the upshot of not only being a single query, but also returning empty rows if you pass in a bogus ID. (In case you want to output some kind of error.) If you don't want to do that, you can always make it an INNER JOIN instead of a LEFT OUTER JOIN.

It is essentially the same as your second example with the temporary table, but it doesn't create a temporary table. So it would be more portable to smaller DBMSes.

-R

P.S. - If you are on DB2, which doesn't like SELECT without FROM, you can add "FROM sysibm.sysdummy1".

Reply to this Comment

Thanks Ben this was what I finally did to make it work, my solution wasnt to complex.

<cfoutput>
<table cellpadding="0" cellspacing="0" width="800" align="center">

<cfloop list="#theList#" index="i">
<cfquery datasource="cvccvc" name="theOne">
SELECT *
FROM Thes
Where WheelID = #i#
</cfquery>

<tr>
<td>#theOne.Bio#</td>
</tr>
</cfloop>
</table>
</cfoutput>

Reply to this Comment

@Dustin, @Rick,

I have added your two solutions to the post. Thanks for filling in the gaps.

@Wrighter,

As long as your stuff is performing well, I think that's a fine way to go. If you find that it causes an issue later, we can always optimize.

Reply to this Comment

There's another way of doing this in ColdFusion (and probably other languages). You can retrieve a specific cell from your query by using the format QueryName['ColumnName'][RowNumber].

So after your query you might do something like:

<cfloop list="#LstID#" index="i">
<cfset DesiredQueryRow = ListFind(ValueList(qBlog.ID), i) />
<cfif DesiredQueryRow>
<!--- Matching record returned... --->
<p>
[ #qBlog['id'][DesiredQueryRow]# ]
#qBlog['name'][DesiredQueryRow]#
</p>
</cfif>
</cfloop>

Reply to this Comment

When I go looking for ColdFusion answers, I always end up here.

Ben, thanks again for another great explanation and example.

Reply to this Comment

Post A Comment

You — Get Out Of My Dreams, Get Into My Comments
Live in the Now
Oops!
Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.