Ask Ben: Removing Duplicate List Items While Maintaining Original List Order

Posted January 15, 2009 at 9:50 AM by Ben Nadel

Tags: ColdFusion, Ask Ben

Do you know a way to remove duplicates from a list while keeping it sorted a-z? I have a list with the values I want and sorted correctly, but I have yet to find a way to sort it correctly. Any help would be great! Thanks!!

A while back, I described a very simple, elegant solution for using ColdFusion structs to remove duplicate values from a list. While nice, the limitations of that algorithm were that it did not ensure case sensitivity of the list items (depending on how you coded it) and it did not maintain any ordering of the original list. For the most part, I work with lists of IDs, so that was not a problem.

If we want to maintain the case sensitivity and ordering of the original list, we just need to add a little more logic to this idea. We can still use the ColdFusion struct as a look-up for our unique values; but, we will no longer use the struct to render our final list of unique values:

  • <!--- Create a list with duplicate values. --->
  • <cfset strList = "1,1,2,3,4,1,3,5,5,1,9,0,1" />
  •  
  • <!---
  • Create an look up index. This is a struct whose only
  • purpose is to provide a fast key-lookup for existing values.
  • --->
  • <cfset objExistingKeys = {} />
  •  
  • <!---
  • We are going to create a new array to hold our "new" list
  • values. This will be our ordered copy of the list with only
  • unique values.
  • --->
  • <cfset arrNewList = [] />
  •  
  • <!---
  • Now, let's loop over the existing list and build our new
  • list IN ORDER with only unique values.
  • --->
  • <cfloop
  • index="strItem"
  • list="#strList#"
  • delimiters=",">
  •  
  • <!--- Check to see if this item has been used already. --->
  • <cfif NOT StructKeyExists( objExistingKeys, strItem )>
  •  
  • <!---
  • This list item is not a key in our look up index
  • which means that it has NOT been used in our new
  • list yet. Let's add it to our new list array.
  • --->
  • <cfset ArrayAppend( arrNewList, strItem ) />
  •  
  • <!---
  • Add the item to our look up index so that we don't
  • use it again on further loop iterations.
  • --->
  • <cfset objExistingKeys[ strItem ] = true />
  •  
  • </cfif>
  •  
  • </cfloop>
  •  
  • <!---
  • Now that we have copied over all the unique values in
  • order to our new list, let's collaps the array down into
  • a string list.
  • --->
  • <cfset strNewList = ArrayToList( arrNewList ) />
  •  
  • <!--- Output the new list. --->
  • #strNewList#

As you can see, as we loop over our existing list, we are building a parallel list (using an array) with the unique values. Since both lists and arrays have inherent ordering, our use of ArrayAppend() maintains the order of the original list. Every time we hit a new unique value, we add it to our struct-look-up table to make sure that we never copy a duplicate value over to the new list.

When we run the above code, we get the following output:

1,2,3,4,5,9,0

The list is now composed of unique values in the original order. I am using an array and ArrayAppend() to build the new list rather than ListAppend() only because I have found arrays to be faster than string concatenation. Of course, for small lists, there would be zero difference in performance. I hope that helps.




Reader Comments

Jan 15, 2009 at 9:56 AM // reply »
1 Comments

Thanks Ben, exactly what I needed!!


Jan 15, 2009 at 10:04 AM // reply »
28 Comments

I use your structure dedupe idea and then just do a listsort to get the sorting correctly.


Jan 15, 2009 at 10:05 AM // reply »
11,238 Comments

@Adam,

Awesome man, glad to help.

@Aaron,

That's a good idea as well. I guess when I read the question, I ignored the "a-z" part and just thought about the order of the original list. Nice catch.


Jan 15, 2009 at 10:59 AM // reply »
12 Comments

Very very nice. I was a bit confused though when I read "The list is not composed of unique values in the original order" (emphasis added), and it took me a while to realize that you meant "... now composed ...." I guess I'm not quite awake yet.


Jan 15, 2009 at 11:02 AM // reply »
11,238 Comments

@Matt,

Ooops. Good catch - typo fixed.


Jan 15, 2009 at 11:06 AM // reply »
8 Comments

how about this:

// list with duplicate values.
strList = "1,1,2,3,4,1,3,5,5,1,9,0,1";

// new list to the copy that wont have dups
strNewList = "";

while (listLen(strList)) {
if (!listFind(strNewList,listFirst(strList))) strNewList = listAppend(strNewList,listFirst(strList));
strList = listRest(strList);
}

<cfoutput>#strNewList#</cfoutput>


Jan 15, 2009 at 11:09 AM // reply »
8 Comments

You know, I just posted a method that uses listAppend instead of arrayAppend, and now I realize that you explained why you used the array instead of the list. Doh! My bad. You can delete my post if you like.


Jan 15, 2009 at 11:23 AM // reply »
153 Comments

For your "How NOT to do it" file:

<cfset strList="1,1,2,3,4,1,3,5,5,1,9,0,1">
<cfset q=queryNew("item,position")>
<cfloop list="#strList#" index="j">
<cfset queryAddRow(q)>
<cfset querySetCell(q, "item", j, q.recordCount)>
<cfset querySetCell(q, "position", q.recordCount, q.recordCount)>
</cfloop>
<cfquery name="r" dbtype="query">
SELECT item, MIN(position) as firstPosition
FROM q
GROUP BY item
</cfquery>
<cfquery name="s" dbtype="query">
SELECT q.item, q.position
FROM q, r
WHERE (q.item = r.item) AND (q.position = r.firstPosition)
ORDER BY q.position
</cfquery>
<cfset strNewList=valueList(s.item)>


Jan 15, 2009 at 11:35 AM // reply »
11,238 Comments

@Chris,

No worries my man. For small lists, the speed differences are not going to be noticeable.

@Rick,

If you want, I can set that algorithm up as a web service on my server and then you can modify your example to use CFHTTP to pass in the list and I'll return the unique version ;)


Jan 15, 2009 at 12:48 PM // reply »
5 Comments

I could not resist -
Here is a version without cfloop based on java
It will allow you to retain the original list sort or to sort any way you want.

It is based on posts I have read from:
Adrian J. Moreno and Ben Nadel
Links to those posts are at the bottom

<cfset strList = "1,1,2,3,4,1,3,5,5,1,9,0,1" />
<cfset newListArray = listToArray( strList ) />
<cfset lhs = createObject("java", "java.util.LinkedHashSet") />
<cfset lhs.init( newListArray ) />

<!--- Clear out the original elements --->
<cfset newListArray.clear() />

<!--- Add all of the unique elements back from lhs --->
<cfset newListArray.addAll( lhs ) />

<!--- If you want to sort the array --->
<!--- either with Java --->
<cfset CreateObject( "java", "java.util.Collections" ).Sort( newListArray ) />
<!--- or with CF --->
<!--- <cfset temp = arraySort( newListArray,"numeric","asc") > --->

<!--- show the clean list --->
<cfoutput>#arrayToList( newListArray )#</cfoutput>

source posts:
http://www.iknowkungfoo.com/blog/index.cfm/2008/10/22/Remove-duplicate-list-or-array-elements-using-ColdFusion-and-Java
http://www.bennadel.com/blog/280-Randomly-Sort-A-ColdFusion-Array-Updated-Thanks-Mark-Mandel.htm


Jan 15, 2009 at 1:18 PM // reply »
12 Comments

Or ...
http://wiki.openbluedragon.org/wiki/index.php/ListRemoveDuplicates()


Jan 15, 2009 at 1:20 PM // reply »
12 Comments

Didn't like the () at the end of the last URL I posted, so let's try this:
http://tinyurl.com/a4oy3u


Jan 15, 2009 at 1:24 PM // reply »
153 Comments

Ben said: "you can modify your example to use CFHTTP to pass in the list"

Well played, sir. Well played.


db
Jan 15, 2009 at 1:44 PM // reply »
4 Comments

i did this for a couple of lists, just a little differently, but the results i think are the same. the lists were css and js files to be written to the document head, so the original order had to be maintained and the first occurrence of a duplicate had to remain. i did it by looping the list from end to start and setting the struct value to the index. then a numeric sort on the struct kept the original order. if you see any place this might fail, i'd be grateful to hear it.

<cfset objNoDupes = structNew()>

<cfloop from="#listlen( strList )#" to="1" step="-1" index="i">
<cfset objNoDupes[ listgetat( strList, i ) ] = i>
</cfloop>

<cfset strNewList = ArrayToList( structSort(objNoDupes, 'numeric') )>


Jan 15, 2009 at 4:11 PM // reply »
11,238 Comments

@Jeff,

Good links. I always like learning about more Java-based methods (referring to the link that is not mine - Im not that big headed :)).

@Matt,

Ah, that makes it too easy.

@DB,

I don't think that one can maintain (or guarantee) the list order.


Jan 15, 2009 at 7:52 PM // reply »
19 Comments

this is very similar to chris' example, but here it is anyway, and keeps order as well.

sz_List = '1,2,3,2,2,3,2,2,345,4324,23';
sz_ListNew = '';
for (i=1; i<= listLen(sz_List ); i++){
if(listFindNoCase(sz_listNew, listGetAt(sz_List , i)) == 0 )
sz_listNew = listAppend(sz_listNew, listGetAt(sz_List,i));
}


Jan 15, 2009 at 8:31 PM // reply »
2 Comments

CFLib has a couple of similar functions. You would need to reorder by a separate function thogh.

http://cflib.org/udf/ListDeleteDuplicates
http://cflib.org/udf/ListDeleteDuplicatesNoCase


Jan 16, 2009 at 5:18 PM // reply »
19 Comments

This method wont work if you need case sensitivity, since the struct key bins are case insensitive.


Jan 19, 2009 at 6:24 PM // reply »
11,238 Comments

@Jon,

This is true. If you need case-sensitivity, I suppose you could go with ListFindNoCase() on the new list rather than using a struct. Not sure if the Java method that @Jeff mentions above will be case-sensitive.


Jan 20, 2009 at 7:23 AM // reply »
5 Comments

@Ben, @Jon

Yes in the current format the Java solution is case sensitive.
The CF arraySort() will also sort the array "text" or "textnocase" so that you can really control the results should you want to sort them.


Feb 2, 2009 at 4:48 AM // reply »
14 Comments

Nice.
once again , you saved me a bunch of time with this. thank yoouuu sir!


Dec 10, 2009 at 11:59 AM // reply »
2 Comments

Awesome! Nice time saver and keeps the UI happy :) I had to drop this into a CF7 site and the only changes were StructNew() and ArrayNew(1) when creating, and it works great on CF7 (and I would assume before) too!


Dec 13, 2009 at 5:11 PM // reply »
11,238 Comments

@Andrew,

Awesome, glad to help.


Aug 12, 2010 at 1:52 PM // reply »
8 Comments

Thanks once again, Ben. This came in handy today, saved me a little stress.


Aug 12, 2010 at 1:59 PM // reply »
11,238 Comments

@SuperAlly,

Boo ya!


Nov 21, 2010 at 3:22 PM // reply »
1 Comments

Is it possible this solution doesn't works for Coldfusion MX7?

I get an error:

Invalid token '{' found on line 229 at column 82.

On the line: <cfset objExistingKeys = {} />


Nov 22, 2011 at 4:29 PM // reply »
1 Comments

Might consider trim()ing up those structure keys at creation, as "Adobe Digital Editions"," Adobe Digital Editions" be not equal (of course). Chasing my tail, for bit I was. :)

Bob


Dec 1, 2011 at 8:12 AM // reply »
19 Comments

Can be done in one readable line using Java (discounting the list setup below):

someList = "blah,blah,fubar,blah,fubar,hello,world";

uniqueArray = createObject("java", "java.util.LinkedHashSet").init(ListToArray(someList )).toArray();

The uniqueArray variable is now an array with the order maintained and duplicates removed.


Apr 23, 2012 at 3:07 PM // reply »
2 Comments

Thanks Ben, this was exactly what I was looking for!



Post A Comment

Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.

Please review the following issues:

Author Name:


Author Email:

Author Website:

Comment:

Supported HTML tags for formatting: <strong>bold</strong>   <em>italic</em>   <code>code</code>







  • Help Wanted - Find Your Next ColdFusion Job
Ben Nadel's Company - Epicenter Consulting Recent Blog Comments
May 19, 2013 at 2:31 PM
My Experience With AngularJS - The Super-heroic JavaScript MVW Framework
It's funny really just how well that image describes the way I would imagine most people that go with angular for some project is. I have had a similar roller-coaster ride with it as well, but not qu ... read »
May 17, 2013 at 7:42 PM
HashKeyCopier - An AngularJS Utility Class For Merging Cached And Live Data
Ben - thanks so much for posting these Angular articles and findings, they've been a huge help towards learning one of the more 'complex' JavaScript frameworks out there (IMO). I have been using Angu ... read »
May 16, 2013 at 5:01 PM
UPDATE: Parsing CSV Data Files In ColdFusion With csvToArray()
Your code was the closest thing I've found to obtaining some direction for converting ISO fields to values that CF can translate properly. Thank you for posting! ... read »
May 15, 2013 at 10:37 PM
Very Simple Pusher And ColdFusion Powered Chat
hi id making plz easy ... read »
May 15, 2013 at 6:07 PM
Making SOAP Web Service Requests With ColdFusion And CFHTTP
Ben, you once again saved my bacon at work. Thank you, thank you, thank you! ... read »
May 15, 2013 at 4:15 PM
What If All User Interface (UI) Data Came In Reports?
@Josh, Thanks! @Ben, I definitely recommend the David West book "Object Thinking" I've been quoting from. It goes deeply into the philosophy and history of OO programming. His breadth ... read »
May 15, 2013 at 11:36 AM
Ask Ben: Print Part Of A Web Page With jQuery
I found this helpfull when you need to keep (refresh) the original parent page after closing the iframe child print dialog (Hoping you're not using a form at this time so it won't submit again): On ... read »
May 14, 2013 at 7:13 PM
What If All User Interface (UI) Data Came In Reports?
@Jonah, If there's any books you'd recommend on the subject of domain modelling, I'd love to hear it. I just downloaded the free PDF of "Domain Driven Design Quickly". Figured I'd give it ... read »
InVision App - Prototyping Made Beautiful With Prototyping Tools