Using ColdFusion Structures To Remove Duplicate List Values

Posted December 13, 2006 at 7:37 AM

Tags: ColdFusion

The other week, I posted a solution for Ray Camden's Friday Puzzler for Santa's list. It involved using the ColdFusion structure (struct) as a utility to clean lists. I thought it worked quite nicely and thought that I would talk about it here quickly. The beauty of the ColdFusion struct is that it is an object that maps keys to values. This mapping feature forces the struct to only have unique keys as there would be no way to map two duplicate keys to two different values. This mapping can be leveraged to clean the list of duplicate values.

To demonstrate this, let's build a list that has duplicate values:

 Launch code in new window » Download code as text file »

  • <!--- Create a group values that have duplicate values. --->
  • <cfsavecontent variable="strListData">
  • Ghost Busters
  • What About Bob
  • Broken Flowers
  • What About Bob
  • Lost In Translation
  • Stripes
  • Scrooge
  • What About Bob
  • Lost In Translation
  • </cfsavecontent>
  •  
  •  
  • <!--- Create a list based on the list data. --->
  • <cfset lstMovies = strListData.Trim().ReplaceAll(
  • "[\r\n\t]+",
  • ","
  • ) />

This is a bit of a round-about way to create a list, but I am doing this for readability (so you can really see the list). If we output the list above:

 Launch code in new window » Download code as text file »

  • <!--- Break list on to different lines. --->
  • #lstMovies.ReplaceAll( ",", "<br />" )#

... we get:

Ghost Busters
What About Bob
Broken Flowers
What About Bob
Lost In Translation
Stripes
Scrooge
What About Bob
Lost In Translation

Now, we can loop over the list and add them to the ColdFusion struct. Remember, due to the mapping nature of the struct, duplicate keys will override each other, thereby removing duplicates:

 Launch code in new window » Download code as text file »

  • <!--- Create a structure to hold the movie titles. --->
  • <cfset objMovies = StructNew() />
  •  
  • <!--- Loop over the list to add mappings to the struct. --->
  • <cfloop index="strMovie" list="#lstMovies#" delimiters=",">
  •  
  • <!---
  • Store key/value pair. In this case, we don't
  • really care about the value (hence storing ""
  • as value). We only care about the key.
  • --->
  • <cfset objMovies[ strMovie ] = "" />
  •  
  • </cfloop>
  •  
  • <!--- Convert the now unique key values into a list. --->
  • <cfset lstMovies = StructKeyList( objMovies ) />

Now, outputting that list:

 Launch code in new window » Download code as text file »

  • <!--- Break list on to different lines. --->
  • #lstMovies.ReplaceAll( ",", "<br />" )#

... we get:

Lost In Translation
What About Bob
Stripes
Scrooge
Ghost Busters
Broken Flowers

Notice that this time, the duplicates have been removed. Notice also that the order of the list has been changed. That usually won't matter as I don't think you should require proper ordering in a list that is required to remove duplicate values.

I am not sure how this compares (performance-wise) to the loop/check strategy that constantly checks for duplicates. I almost want to say that this would be faster as there really is no logic involved. There's no CFIF statements, no conditionals; all we do is set values and then join them together. It's that easy.

One final note: this is not case sensitive. The structure, like most all ColdFusion, treats mixed cases as the same value. If you want to do this type of algorithm with case sensitivity, you can create a Java 2 Hashtable and use it in similar way. I won't go into that though, as that is not something any of us really need to do (or otherwise, we probably shouldn't be using lists perhaps?).

Download Code Snippet ZIP File

Post Comment  |  Ask Ben  |  Permalink  |  Other Searches  |  Print Page




Learning ColdFusion 9 - ColdFusion 9 tutorials, samples, examples, demos

Reader Comments

Dan Wilson
Dec 13, 2006 at 9:09 AM // reply »
35 Comments

Very Clean!

There are other ways to remove duplicates in a list, but none as elegant as this.

Good Show!

Dan Wilson


Dec 13, 2006 at 9:24 AM // reply »
6,371 Comments

Dan,

Glad you like. Always happy to share.


Daniel Roberts
Dec 13, 2006 at 10:05 AM // reply »
27 Comments

Thanks! I've been looking for a way to speed up the creation of a unique 5 character id list I create for emails tracking in one of our products. The best way I could find to keep the list unique was just using string contains on a list but your struct alternative is MUCH faster. I'm still checking my code but here are some figures:

2,000 IDs
- List/String = 1797ms
- Struct = 171ms

10,000 IDs
- List/String = 41750ms
- Struct = 937ms


Dec 13, 2006 at 10:10 AM // reply »
6,371 Comments

Daniel,

That's awesome. I had a gut feeling tell me it would be faster, but it's nice to see some numbers on it.


Daniel Roberts
Dec 13, 2006 at 10:22 AM // reply »
27 Comments

100,000 IDs
- List/String (not going to do it now, tried in the past and its like 20 minutes)
- Struct = 50266ms (WOW!)

I have it flushing counts every 100 ID attempts and it pauses around 36800 and 74900 every time so there is some type of conversion going on around those sizes. I noticed similar pauses occurring when using strings.

The algorithm is close to linear instead of growing in time based on the size of the list. Don't remember all that Big O notation stuff to explain that further.


dc
Dec 13, 2006 at 2:34 PM // reply »
1 Comments

I'm no regexp ninja but would you not be better just using regular expressions?

nothing at hand at the moment to show for code, but thought i'd chuck that in the ring


Dec 13, 2006 at 2:38 PM // reply »
6,371 Comments

DC,

Regular expressions are totally awesome, but they require a lot of parsing and processing. And, the larger the string, the longer it takes in what seems a non-linear fashion... that's not from official testing, that's just what I have noticed.

That, and I wouldn't even know how to start writing that expression.


Dec 13, 2006 at 5:13 PM // reply »
111 Comments

Hey Ben,

VERY cool! Very cool indeed!


Dec 13, 2006 at 5:35 PM // reply »
6,371 Comments

Pete,

Glad you like. One thing to note as well is that you can do StructKeyArray() as well if you need to clean an array rather than a list. This will just return the struct-keys into a one-dimensional array.


Tim
Jul 27, 2007 at 12:25 AM // reply »
10 Comments

Now how about an easy way to tell *which* elements of the list are duplicated?


Dan Roberts
Jul 27, 2007 at 12:33 PM // reply »
27 Comments

The regular expression idea interests me. I'm curious about the efficiency of checking for duplicates across a list in a single regular expression search. Not as efficient as structs for deduping but if you have to find dupes it may be faster than looping for every item in list.

First you would need to sort the list, then run a regular expression such as the following "(,|$)([^,]+),\2(,|^)".

Just remembered though that REFind only returns the first match so that would require some looping to find the next match. hmmm


Jul 27, 2007 at 2:04 PM // reply »
6,371 Comments

@Dan,

Regular expression are totally awesome... but you want to avoid having to parse a string too many times. I would guess that any kind of looping in conjunction with regex is gonna be slower than using arrays and structs.


Dan Roberts
Jul 27, 2007 at 2:43 PM // reply »
27 Comments

I'm not actually advocating a loop through each piece of data, rather an REFind on an entire list string using the delimiter in regular expression to detect dupes. The benefit with a regular expression in this instance over an array loop is that you are only looping as many times as there are matches and you are able to start where you left off.

For example with the regular expression, if there are no duplicates you would do a single REFind which would find no duplicates and you would be done. If it found one you would start another REFind from where you found the first dupe to look for the next.

A loop over each item in the list or array on the other hand would require an action on ever item in the list to compare to every other item in the list. That quickly looses efficiency because processing increases exponentially, which was the need for the struct work around of course.

Anyways, something to think about and I might play with when I have time.


Jul 28, 2007 at 5:25 PM // reply »
6,371 Comments

@Dan,

I this is interesting. I would certainly love to see what you come up with if you have time play around with the idea. I am always looking to learn new techniques.


Matt
Feb 27, 2008 at 8:30 PM // reply »
1 Comments

Though the RE method may help us find duplicates be it tell us they exist or tell us which are the duplicates the original purpose of this post was to REMOVE duplicates not to find them.

For it's purpose this works well enough in comparison to the alternatives out there which serve ONLY this purpose and no extra.

As a thought should you find your data in a Query object (say you pulled a list from you database and you only want the unique entries for a particular column where the db does not have this constraint imposed) a QofQ may work and scale far better than this method. Where the method here is linear (O(n)) for those that know the notation it is possible the QoQ method may be < O(n) for larger datasets.


Jas Kaur
Apr 23, 2009 at 6:35 AM // reply »
1 Comments

Absolutley love this, have used the technique a few times!


rich
May 13, 2009 at 2:30 PM // reply »
3 Comments

How about a way to count the occurrences of each value in the list, before or during the conversion to a struct key list?


May 14, 2009 at 3:18 PM // reply »
11 Comments

This is a great approach! Thanks for laying out the logic so easily and cleanly. I remember relying on a UDF from cflib.org that also removed duplicates. This one would loop through each list item and add it to a second list only if it hadn't been seen before. As the second list gets larger then searching gets slower. The method shown here would definitely be faster as others have already pointed out. Shoot, Adobe might as well take the code here and make a built-in function out of it. I can dream, can't I?


rich
May 15, 2009 at 11:23 AM // reply »
3 Comments

I wrote this to count the occurrences of each item when looping a populated list with redundancies into a struct. Worked pretty well. Thanks for the original idea!! :-)

<cfloop index="tag" list="#tagList#" delimiters=",">
<cfset tag1 = trim(#tag#)>
<cfif tag1 neq "">
<cfif structKeyExists(tagStruct,"#tag1#")>
<cfset tagStruct[tag1] = #tagStruct[tag1]# + 1>
<cfelse>
<cfset tagStruct[tag1] = "1" />
</cfif>
</cfif>
</cfloop>


May 19, 2009 at 9:30 AM // reply »
6,371 Comments

@Rich,

Looks good - always glad to provide any inspiration. As a quick note, you don't need to use the # sign so much. It is only needed when a variable is being evaluated *inside* a string.


Aug 25, 2009 at 11:49 PM // reply »
3 Comments

Ben,

Great post! Used it on a couple different projects in the past week! Thanks!!


Post Comment  |  Ask Ben

Recent Blog Comments
Nov 7, 2009 at 5:53 PM
Ask Ben: Javascript String Replace Method
You can find here an advanced function that prepared with javascript replace function. This can make the first letters of words, sentences, lines and whatever you define automatically: http://www.m ... read »
Andrew Neely
Nov 7, 2009 at 4:56 PM
A Moment That Touched Me - The Fountainhead
Ben, Glad you enjoyed the podcast. Yeah, the Tank Riot guys can get really chatty during the episodes, but that's part of the charm of it for me. They've covered everything from Nichola Tesla to Cha ... read »
Nov 7, 2009 at 4:43 PM
Building A Fixed-Position Bottom Menu Bar (ala FaceBook)
Is it possible to make some more MenĂ¼`s ? ... read »
Jill
Nov 7, 2009 at 11:40 AM
How To Unformat Your Code (Like A Pro)
Derek, I think you might be right - sweet! Thanks for the link :) ... read »
Nov 7, 2009 at 11:25 AM
How To Unformat Your Code (Like A Pro)
I think it would be way easier to just use this http://www.logichammer.com/html-formatter/ He just released v3 and it rocks. ... read »
Jill
Nov 7, 2009 at 7:58 AM
How To Unformat Your Code (Like A Pro)
LMAO - this was pretty funny! I have to admit - I also love to reformat code so I can read it. My boss used to tell me to leave my OCD at home. Now I don't feel so bad after reading everyone else' ... read »
Nov 6, 2009 at 10:10 PM
How To Unformat Your Code (Like A Pro)
The timing of this post is just uncanny. I spent the last 15-20 minutes manually un-formatting my "Ben Nadel" style code within a CFC of mine. I was really digging the readability a few weeks ago, bu ... read »
Roe
Nov 6, 2009 at 5:11 PM
Passing Arrays By Reference In ColdFusion - SWEEET!
ArraySort also reorders the results of these java obj's ... read »