In the ColdFusion Weekly Podcast, version 2.13, the ColdFusion quiz involved picking N (or rather less than N) elements from an array in a random order. Here is the CFQuiz description:
Assume you have an array with n number of elements, and you want to retrieve a random number of elements from this array. The random number you want to retrieve is less than n, but greater than 1, and you cannot retrieve the same element twice. So for example, let's say your array has 10 elements in it and you want to retrieve 5 random elements, but none of these 5 can duplicate one another. How would you do that?
I thought this was an excellent quiz choice and I want to discuss my answer(s) here as they are not as easily viewed on the ColdFusion weekly web site (does that give me a disadvantage?).
When it comes to choosing random elements from an array, I can think of three different algorithms:
My gut feeling tells me that each of these methods (explored in more detail below) have use cases in which they are more or less efficient. The thing I wanted to test was the efficiency of each algorithm as the number of desired elements gets closer to N (the total number of elements in the array).
To help test these different algorithms, I wrapped each in a ColdFusion component and gave it a common interface:
This method accepts the target array. Remember that arrays are passed by VALUE in ColdFusion so each ColdFusion component instance will get its very own copy of the array (important for destructive algorithms).
This method returns one or more random elements from the array. If only one element is requested (optional, default argument value), then only a single value is returned. If more than one element is requested, then those elements are put in an array and that array of values is returned. If more elements are requested than there are available in the target array, an error is thrown.
This method returns the number of elements in the target array that have not yet been selected.
Before we get into the testing of efficiency, let's review the three different random select algorithms.
This algorithm works by choosing a random element from the array. That index is then deleted from the array and the removed value is returned. By doing this, we can ensure that no two index-values are chosen as they are removed from the array after first use:
Launch code in new window » Download code as text file »
This method does not do any work up-front, but it does require calculations for every single value selection.
This algorithm works by shuffling the array the second the ColdFusion component is initialized. Once the array is shuffled, the Get() method merely iterates over the shuffled array and returns values in order (in shuffled order that is):
Launch code in new window » Download code as text file »
This methods does a lot of work up front to shuffle the array, but then, it requires no calculations for subsequent value selections.
This algorithm works by keeping an index of selected index values. For every selection that gets made, it chooses a random index and checks to see if it has already been selected (and if so, it will continue to select random indexes until an unused one is found):
Launch code in new window » Download code as text file »
This algorithm does no work up front, but it does increasingly more work (potentially) for every subsequent value selection. The reason we are doing this is to see if the ArrayDelete() in the first algorithm comes at a heavier cost than the repeated random selection and index look ups.
Ok, so now that we have our algorithms down, it's time to test them. Before we do anything at all, we have to test to make sure that our algorithms work without regard to efficientcy. Therefore, we must run some very small use cases:
Launch code in new window » Download code as text file »
In the previous test, we were just making sure that there were no blatant errors. We were also checking the Get() method to see if it could handle single and multiple value selects (not required in the quiz, but a cool feature nonetheless). These worked fine (and I will spare you the output as this is a rather long post).
Now that we know these algorithms work in so much as they do what they were defined to do, we can now test to see how they perform. First, we will test "worst case scenarios" in which we have to select N random values from an array of length N. And, since ColdFusion is sooo awesome, we have to use a fairly large array to even see time differences:
Launch code in new window » Download code as text file »
Running the above tests, we started to see a clear trend in seconds to execute:
10.8, 10.7, 10.8, 11.5, 10.8
Average: 10.9
11.7, 10.8, 10.8, 10.8, 10.8
Average: 11.0
13.0, 12.8, 13.5, 12.9, 13.5
Average: 13.1
I have to say that I was blown away by the performance of all three of these algorithms. The first two trended to be slightly better, but all three were lighting fast. I was also blown away that the up front Shuffle() method was just as fast as first method that merely deleted selected values. I had no idea that java.util.Collections Shuffle() method was so freakin' efficient.
In all reality, though, we rarely ever have to deal with worst case scenarios. So, for the final test, I wanted to test a "better case scenario" in which we were selecting multiple values from a massive array, but no where near the number of possible values:
Launch code in new window » Download code as text file »
Running the above test, we started to see a clear trend in milliseconds to execute:
15, 16, 31, 15, 00
Average: 15.4
00, 31, 16, 16, 15
Average: 15.6
16, 15, 00, 00, 00
Average: 6.2
Again, all three are lighting fast, but it seems for a better/best case scenario, the Select and Track method was the best. That makes sense as the third algorithm does no up front work and does very little work early in the value selection process. I was again blown away at how fast an array of 10,000 elements can be shuffled (algorithm 2).
So in conclusion, it makes little to no difference which algorithm you use. I like the Shuffle() method, but I think that is more of a cool factor than anything else.
Download Code Snippet ZIP File
Comments (6) | Post Comment | Ask Ben | Permalink | Other Searches | Print Page
Maintaining Sessions Across Multiple ColdFusion CFHttp Requests
SQL Server Text Matching Is Case INSENSITIVE
Really enjoyed this post. got me thinking!
Posted by Joe on May 23, 2007 at 2:47 PM
Thanks Joe, I am glad to hear it.
Posted by Ben Nadel on May 23, 2007 at 2:51 PM
Hi Ben--sorry about providing yours as a zip, but it seemed like the expeditious thing to do at 2 a.m. :-) I hope that doesn't give you a disadvantage.
Posted by Matt Woodward on May 23, 2007 at 4:58 PM
@Matt,
No worries. I was joking about the disadvantage - I actually put that in during the proof read, not in the original go at the blog entry :) It's just a fun contest. Coming up with a solution is more fun that winning. Plus, look at this blog entry.... how could have possible put that up on your site :)
Posted by Ben Nadel on May 23, 2007 at 5:11 PM
Ben opinions coincide for us :)
Posted by Back pack on May 24, 2007 at 11:17 AM
who did have such question interestingly? to decide him by such method! :)
Posted by Acacia on May 31, 2007 at 8:02 PM