CFQuiz 2.13: Selecting N Random Elements From An Array In ColdFusion
In the ColdFusion Weekly Podcast, version 2.13, the ColdFusion quiz involved picking N (or rather less than N) elements from an array in a random order. Here is the CFQuiz description:
Assume you have an array with n number of elements, and you want to retrieve a random number of elements from this array. The random number you want to retrieve is less than n, but greater than 1, and you cannot retrieve the same element twice. So for example, let's say your array has 10 elements in it and you want to retrieve 5 random elements, but none of these 5 can duplicate one another. How would you do that?
I thought this was an excellent quiz choice and I want to discuss my answer(s) here as they are not as easily viewed on the ColdFusion weekly web site (does that give me a disadvantage?).
When it comes to choosing random elements from an array, I can think of three different algorithms:
Select and Delete
Shuffle and Select
Select and Track
My gut feeling tells me that each of these methods (explored in more detail below) have use cases in which they are more or less efficient. The thing I wanted to test was the efficiency of each algorithm as the number of desired elements gets closer to N (the total number of elements in the array).
To help test these different algorithms, I wrapped each in a ColdFusion component and gave it a common interface:
Init( Array )
This method accepts the target array. Remember that arrays are passed by VALUE in ColdFusion so each ColdFusion component instance will get its very own copy of the array (important for destructive algorithms).
Get( Count )
This method returns one or more random elements from the array. If only one element is requested (optional, default argument value), then only a single value is returned. If more than one element is requested, then those elements are put in an array and that array of values is returned. If more elements are requested than there are available in the target array, an error is thrown.
Size()
This method returns the number of elements in the target array that have not yet been selected.
Before we get into the testing of efficiency, let's review the three different random select algorithms.
Select And Delete
This algorithm works by choosing a random element from the array. That index is then deleted from the array and the removed value is returned. By doing this, we can ensure that no two index-values are chosen as they are removed from the array after first use:
<cfcomponent | |
hint="Returns random elements from an array."> | |
<!--- | |
Run the pseudo constructor to set up default | |
values and data structures. | |
---> | |
<!--- | |
Set up an instance strucure to hold our | |
instance-related data. | |
---> | |
<cfset VARIABLES.Instance = StructNew() /> | |
<!--- | |
Create a default data array. This will be | |
overridden by the constructor. | |
---> | |
<cfset VARIABLES.Instance.Data = ArrayNew( 1 ) /> | |
<cffunction | |
name="Init" | |
access="public" | |
returntype="any" | |
output="false" | |
hint="Returns an initialized component with the given data."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="Data" | |
type="array" | |
required="true" | |
hint="The array from which we will be getting random data." | |
/> | |
<!--- Store the passed in array. ---> | |
<cfset VARIABLES.Instance.Data = ARGUMENTS.Data /> | |
<!--- Return This reference. ---> | |
<cfreturn THIS /> | |
</cffunction> | |
<cffunction | |
name="Get" | |
access="public" | |
returntype="any" | |
output="false" | |
hint="Returns either a single value or an array of values (if you pass the optional argument for number of values to be returned)."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="Count" | |
type="numeric" | |
required="false" | |
default="1" | |
hint="This is the number of values to be returned. If only one value is requested, a value will be returned. If more than one value is requested, an array of values will be returned." | |
/> | |
<!--- Set up local scope. ---> | |
<cfset var LOCAL = StructNew() /> | |
<!--- | |
Check to see if we have enough values to | |
return a full value array. | |
---> | |
<cfif (ArrayLen( VARIABLES.Instance.Data ) LT ARGUMENTS.Count)> | |
<!--- Throw an error. ---> | |
<cfthrow | |
message="You have requested more values than are available in the original array." | |
type="OutOfBounds.Exception" | |
/> | |
</cfif> | |
<!--- Set up the array to return. ---> | |
<cfset LOCAL.Values = ArrayNew( 1 ) /> | |
<!--- | |
Loop over the value count to populate the return | |
value array. Even if the user has requested only one | |
value, we are going to build an array for multiple | |
values. This will reduce our overall logic. | |
---> | |
<cfloop | |
index="LOCAL.ValueIndex" | |
from="1" | |
to="#ARGUMENTS.Count#" | |
step="1"> | |
<!--- Pick a random index out of the array. ---> | |
<cfset LOCAL.Index = RandRange( | |
1, | |
ArrayLen( VARIABLES.Instance.Data ) | |
) /> | |
<!--- Add this value to the return value array. ---> | |
<cfset ArrayAppend( | |
LOCAL.Values, | |
VARIABLES.Instance.Data[ LOCAL.Index ] | |
) /> | |
<!--- | |
Delete this index from the original array so | |
that we can never pick this value again. | |
---> | |
<cfset ArrayDeleteAt( | |
VARIABLES.Instance.Data, | |
LOCAL.Index | |
) /> | |
</cfloop> | |
<!--- | |
When returning the values, we do not want to return | |
the array if only a single element was requested. | |
Check to see what the original count request was. | |
---> | |
<cfif (ARGUMENTS.Count EQ 1)> | |
<!--- | |
Only one value was requested, so just return | |
the first value of our values array. | |
---> | |
<cfreturn LOCAL.Values[ 1 ] /> | |
<cfelse> | |
<!--- Return the entire value array. ---> | |
<cfreturn LOCAL.Values /> | |
</cfif> | |
</cffunction> | |
<cffunction | |
name="Size" | |
access="public" | |
returntype="numeric" | |
output="false" | |
hint="Returns the number of random value remaining."> | |
<!--- | |
Since we are deleting array entires as we use | |
them, the number of values left is equivalent | |
to the size of our internal data array. | |
---> | |
<cfreturn ArrayLen( VARIABLES.Instance.Data ) /> | |
</cffunction> | |
</cfcomponent> |
This method does not do any work up-front, but it does require calculations for every single value selection.
Shuffle And Select
This algorithm works by shuffling the array the second the ColdFusion component is initialized. Once the array is shuffled, the Get() method merely iterates over the shuffled array and returns values in order (in shuffled order that is):
<cfcomponent | |
hint="Returns random elements from an array."> | |
<!--- | |
Run the pseudo constructor to set up default | |
values and data structures. | |
---> | |
<!--- | |
Set up an instance strucure to hold our | |
instance-related data. | |
---> | |
<cfset VARIABLES.Instance = StructNew() /> | |
<!--- | |
Create a default data array. This will be | |
overridden by the constructor. | |
---> | |
<cfset VARIABLES.Instance.Data = ArrayNew( 1 ) /> | |
<!--- | |
This will keep track of which index we last | |
retrieved from the data array. | |
---> | |
<cfset VARIABLES.Instance.Index = 0 /> | |
<cffunction | |
name="Init" | |
access="public" | |
returntype="any" | |
output="false" | |
hint="Returns an initialized component with the given data."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="Data" | |
type="array" | |
required="true" | |
hint="The array from which we will be getting random data." | |
/> | |
<!--- Store the passed in array. ---> | |
<cfset VARIABLES.Instance.Data = ARGUMENTS.Data /> | |
<!--- | |
Shuffle the data so that we start off with | |
a completely randomized order. | |
---> | |
<cfset CreateObject( "java", "java.util.Collections" ).Shuffle( | |
VARIABLES.Instance.Data | |
) /> | |
<!--- Return This reference. ---> | |
<cfreturn THIS /> | |
</cffunction> | |
<cffunction | |
name="Get" | |
access="public" | |
returntype="any" | |
output="false" | |
hint="Returns either a single value or an array of values (if you pass the optional argument for number of values to be returned)."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="Count" | |
type="numeric" | |
required="false" | |
default="1" | |
hint="This is the number of values to be returned. If only one value is requested, a value will be returned. If more than one value is requested, an array of values will be returned." | |
/> | |
<!--- Set up local scope. ---> | |
<cfset var LOCAL = StructNew() /> | |
<!--- | |
Check to see if we have enough values to | |
return a full value array. | |
---> | |
<cfif (ArrayLen( VARIABLES.Instance.Data ) LT (VARIABLES.Instance.Index + ARGUMENTS.Count))> | |
<!--- Throw an error. ---> | |
<cfthrow | |
message="You have requested more values than are available in the original array." | |
type="OutOfBounds.Exception" | |
/> | |
</cfif> | |
<!--- Set up the array to return. ---> | |
<cfset LOCAL.Values = ArrayNew( 1 ) /> | |
<!--- | |
Loop over the value count to populate the return | |
value array. Even if the user has requested only one | |
value, we are going to build an array for multiple | |
values. This will reduce our overall logic. | |
---> | |
<cfloop | |
index="LOCAL.ValueIndex" | |
from="1" | |
to="#ARGUMENTS.Count#" | |
step="1"> | |
<!--- Increment the global index. ---> | |
<cfset VARIABLES.Instance.Index = (VARIABLES.Instance.Index + 1) /> | |
<!--- Add this value to the return value array. ---> | |
<cfset ArrayAppend( | |
LOCAL.Values, | |
VARIABLES.Instance.Data[ VARIABLES.Instance.Index ] | |
) /> | |
</cfloop> | |
<!--- | |
When returning the values, we do not want to return | |
the array if only a single element was requested. | |
Check to see what the original count request was. | |
---> | |
<cfif (ARGUMENTS.Count EQ 1)> | |
<!--- | |
Only one value was requested, so just return | |
the first value of our values array. | |
---> | |
<cfreturn LOCAL.Values[ 1 ] /> | |
<cfelse> | |
<!--- Return the entire value array. ---> | |
<cfreturn LOCAL.Values /> | |
</cfif> | |
</cffunction> | |
<cffunction | |
name="Size" | |
access="public" | |
returntype="numeric" | |
output="false" | |
hint="Returns the number of random value remaining."> | |
<!--- | |
Since our global index is keeping track of the | |
last value index that was returned, we just need | |
to find the difference between this index and | |
the size of the data array. | |
---> | |
<cfreturn (ArrayLen( VARIABLES.Instance.Data ) - VARIABLES.Instance.Index)/> | |
</cffunction> | |
</cfcomponent> |
This methods does a lot of work up front to shuffle the array, but then, it requires no calculations for subsequent value selections.
Select And Track
This algorithm works by keeping an index of selected index values. For every selection that gets made, it chooses a random index and checks to see if it has already been selected (and if so, it will continue to select random indexes until an unused one is found):
<cfcomponent | |
hint="Returns random elements from an array."> | |
<!--- | |
Run the pseudo constructor to set up default | |
values and data structures. | |
---> | |
<!--- | |
Set up an instance strucure to hold our | |
instance-related data. | |
---> | |
<cfset VARIABLES.Instance = StructNew() /> | |
<!--- | |
Create a default data array. This will be | |
overridden by the constructor. | |
---> | |
<cfset VARIABLES.Instance.Data = ArrayNew( 1 ) /> | |
<!--- | |
This will keep track of which indexes we have | |
already used. This Struct will act as a look | |
up when getting new index values. | |
---> | |
<cfset VARIABLES.Instance.IndexLookUp = StructNew() /> | |
<cffunction | |
name="Init" | |
access="public" | |
returntype="any" | |
output="false" | |
hint="Returns an initialized component with the given data."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="Data" | |
type="array" | |
required="true" | |
hint="The array from which we will be getting random data." | |
/> | |
<!--- Store the passed in array. ---> | |
<cfset VARIABLES.Instance.Data = ARGUMENTS.Data /> | |
<!--- Return This reference. ---> | |
<cfreturn THIS /> | |
</cffunction> | |
<cffunction | |
name="Get" | |
access="public" | |
returntype="any" | |
output="false" | |
hint="Returns either a single value or an array of values (if you pass the optional argument for number of values to be returned)."> | |
<!--- Define arguments. ---> | |
<cfargument | |
name="Count" | |
type="numeric" | |
required="false" | |
default="1" | |
hint="This is the number of values to be returned. If only one value is requested, a value will be returned. If more than one value is requested, an array of values will be returned." | |
/> | |
<!--- Set up local scope. ---> | |
<cfset var LOCAL = StructNew() /> | |
<!--- | |
Check to see if we have enough values to | |
return a full value array. | |
---> | |
<cfif (ArrayLen( VARIABLES.Instance.Data ) LT (ARGUMENTS.Count + StructCount( VARIABLES.Instance.IndexLookUp )))> | |
<!--- Throw an error. ---> | |
<cfthrow | |
message="You have requested more values than are available in the original array." | |
type="OutOfBounds.Exception" | |
/> | |
</cfif> | |
<!--- Set up the array to return. ---> | |
<cfset LOCAL.Values = ArrayNew( 1 ) /> | |
<!--- | |
Loop over the value count to populate the return | |
value array. Even if the user has requested only one | |
value, we are going to build an array for multiple | |
values. This will reduce our overall logic. | |
---> | |
<cfloop | |
index="LOCAL.ValueIndex" | |
from="1" | |
to="#ARGUMENTS.Count#" | |
step="1"> | |
<!--- | |
Keep looping while that index has not yet been | |
chosen. We will know if this index has been used | |
if it is in the index loop up. The loop will be | |
broken manually once an unused index is found. | |
---> | |
<cfloop condition="true"> | |
<!--- Pick a random index out of the array. ---> | |
<cfset LOCAL.Index = RandRange( | |
1, | |
ArrayLen( VARIABLES.Instance.Data ) | |
) /> | |
<!--- | |
Check to see if that index has already been | |
used by a previous selection. | |
---> | |
<cfif NOT StructKeyExists( | |
VARIABLES.Instance.IndexLookUp, | |
LOCAL.Index | |
)> | |
<!--- | |
This is new. Add it to the index look | |
up structure. | |
---> | |
<cfset VARIABLES.Instance.IndexLookUp[ LOCAL.Index ] = true /> | |
<!--- Break out of this loop. ---> | |
<cfbreak /> | |
</cfif> | |
</cfloop> | |
<!--- Add this value to the return value array. ---> | |
<cfset ArrayAppend( | |
LOCAL.Values, | |
VARIABLES.Instance.Data[ LOCAL.Index ] | |
) /> | |
</cfloop> | |
<!--- | |
When returning the values, we do not want to return | |
the array if only a single element was requested. | |
Check to see what the original count request was. | |
---> | |
<cfif (ARGUMENTS.Count EQ 1)> | |
<!--- | |
Only one value was requested, so just return | |
the first value of our values array. | |
---> | |
<cfreturn LOCAL.Values[ 1 ] /> | |
<cfelse> | |
<!--- Return the entire value array. ---> | |
<cfreturn LOCAL.Values /> | |
</cfif> | |
</cffunction> | |
<cffunction | |
name="Size" | |
access="public" | |
returntype="numeric" | |
output="false" | |
hint="Returns the number of random value remaining."> | |
<!--- | |
Since our index look up struct is keeping | |
track of all the indexes already uses, then | |
we can figure out how many we have left by | |
subtracting the size of the look up from the | |
original data array. | |
---> | |
<cfreturn ( | |
ArrayLen( VARIABLES.Instance.Data ) - | |
StructCount( VARIABLES.Instance.IndexLookUp ) | |
) /> | |
</cffunction> | |
</cfcomponent> |
This algorithm does no work up front, but it does increasingly more work (potentially) for every subsequent value selection. The reason we are doing this is to see if the ArrayDelete() in the first algorithm comes at a heavier cost than the repeated random selection and index look ups.
Ok, so now that we have our algorithms down, it's time to test them. Before we do anything at all, we have to test to make sure that our algorithms work without regard to efficientcy. Therefore, we must run some very small use cases:
<!--- | |
First we are going to start off with a very small | |
array. This is going to be our proof of concept | |
array to make sure that all of our algorithms work | |
(Before we test for efficiency). | |
---> | |
<cfset arrData = ListToArray( | |
"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20" | |
) /> | |
<!--- | |
Loop over the three types of array randomizers to | |
see if they all produce random values. | |
---> | |
<cfloop | |
index="intType" | |
from="1" | |
to="3" | |
step="1"> | |
<!--- Create the array randomizer of this type. ---> | |
<cfset objRandArray = CreateObject( | |
"component", | |
"ArrayRandomizer#intType#" | |
).Init( | |
arrData | |
) /> | |
<h2> | |
Array Randomizer #intType# | |
</h2> | |
<!--- | |
Keep looping until we have gotten every unique | |
but random index/value out of the array. | |
---> | |
<cfloop | |
condition="objRandArray.Size() GT 5"> | |
[ #objRandArray.Get()# ] | |
</cfloop> | |
<!--- | |
Demonstrate that each array randomizer can | |
also return an array or random values. | |
---> | |
<cfdump var="#objRandArray.Get( 5 )#" /> | |
</cfloop> |
In the previous test, we were just making sure that there were no blatant errors. We were also checking the Get() method to see if it could handle single and multiple value selects (not required in the quiz, but a cool feature nonetheless). These worked fine (and I will spare you the output as this is a rather long post).
Now that we know these algorithms work in so much as they do what they were defined to do, we can now test to see how they perform. First, we will test "worst case scenarios" in which we have to select N random values from an array of length N. And, since ColdFusion is sooo awesome, we have to use a fairly large array to even see time differences:
<!--- | |
At this point we have proved that our array randomizers | |
do indeed work for N values (where N was an arbitrary | |
20). Now that we know that the algorithms work, we can | |
start to test for efficiency. There are two things we | |
can test for: | |
1. Worst case scenario - what happens if we need to get | |
ALL N random index values. | |
2. Better case scendario - what happens if we need to | |
get only a small subset of all the random index values. | |
---> | |
<!--- | |
To help test the worst case scenario, let's create a | |
very huge array. This array will have values that | |
repeat. Since we know that our algorithms work properly, | |
we don't need to check the values... we just need to | |
see what performs faster. | |
This will create an array of length: 5,000. | |
---> | |
<cfset arrData = ListToArray( | |
RepeatString( | |
"0,1,2,3,4,5,6,7,8,9,", | |
500 | |
) | |
) /> | |
<!--- | |
Loop over the three types of array randomizers see | |
how fast each of them can get all the random values. | |
---> | |
<cfloop | |
index="intType" | |
from="1" | |
to="3" | |
step="1"> | |
<!--- Create the array randomizer of this type. ---> | |
<cfset objRandArray = CreateObject( | |
"component", | |
"ArrayRandomizer#intType#" | |
).Init( | |
arrData | |
) /> | |
<h2> | |
Array Randomizer #intType# | |
</h2> | |
<cftimer | |
label="Array Randomize #intType#" type="outline"> | |
<!--- | |
Keep looping until we have gotten every unique | |
but random index/value out of the array. We are | |
going to wrap this in CFSilent tags since we | |
don't want to waste our white space. | |
---> | |
<cfsilent> | |
<cfloop | |
condition="objRandArray.Size()"> | |
<!--- | |
Just get the value, but don't do | |
anything with it. | |
---> | |
<cfset strValue = objRandArray.Get() /> | |
</cfloop> | |
</cfsilent> | |
DONE. | |
</cftimer> | |
</cfloop> |
Running the above tests, we started to see a clear trend in seconds to execute:
ArrayRandomizer1 (Select and Delete)
10.8, 10.7, 10.8, 11.5, 10.8
Average: 10.9
ArrayRandomizer2 (Shuffle and Select)
11.7, 10.8, 10.8, 10.8, 10.8
Average: 11.0
ArrayRandomizer3 (Select and Track)
13.0, 12.8, 13.5, 12.9, 13.5
Average: 13.1
I have to say that I was blown away by the performance of all three of these algorithms. The first two trended to be slightly better, but all three were lighting fast. I was also blown away that the up front Shuffle() method was just as fast as first method that merely deleted selected values. I had no idea that java.util.Collections Shuffle() method was so freakin' efficient.
In all reality, though, we rarely ever have to deal with worst case scenarios. So, for the final test, I wanted to test a "better case scenario" in which we were selecting multiple values from a massive array, but no where near the number of possible values:
<!--- | |
Now, we will create an even bigger array but only get | |
a smaller subset of the index values. We are going | |
this because ArrayRandomizer2 does more work up-front | |
where as the other do more work as you go. | |
This will create an array of length: 10,000. | |
---> | |
<cfset arrData = ListToArray( | |
RepeatString( | |
"0,1,2,3,4,5,6,7,8,9,", | |
1000 | |
) | |
) /> | |
<!--- | |
Loop over the three types of array randomizers see | |
how fast each of them can get all the random values. | |
---> | |
<cfloop | |
index="intType" | |
from="1" | |
to="3" | |
step="1"> | |
<!--- Create the array randomizer of this type. ---> | |
<cfset objRandArray = CreateObject( | |
"component", | |
"ArrayRandomizer#intType#" | |
).Init( | |
arrData | |
) /> | |
<h2> | |
Array Randomizer #intType# | |
</h2> | |
<cftimer | |
label="Array Randomize #intType#" type="outline"> | |
<!--- | |
Only get the first 10 values out of the array. | |
We want to test the better case speed. | |
---> | |
<cfloop | |
index="intCounter" | |
from="1" | |
to="10" | |
step="1"> | |
[ #objRandArray.Get()# ] | |
</cfloop> | |
DONE. | |
</cftimer> | |
</cfloop> |
Running the above test, we started to see a clear trend in milliseconds to execute:
ArrayRandomizer1 (Select and Delete)
15, 16, 31, 15, 00
Average: 15.4
ArrayRandomizer2 (Shuffle and Select)
00, 31, 16, 16, 15
Average: 15.6
ArrayRandomizer3 (Select and Track)
16, 15, 00, 00, 00
Average: 6.2
Again, all three are lighting fast, but it seems for a better/best case scenario, the Select and Track method was the best. That makes sense as the third algorithm does no up front work and does very little work early in the value selection process. I was again blown away at how fast an array of 10,000 elements can be shuffled (algorithm 2).
So in conclusion, it makes little to no difference which algorithm you use. I like the Shuffle() method, but I think that is more of a cool factor than anything else.
Want to use code from this post? Check out the license.
Reader Comments
Really enjoyed this post. got me thinking!
Thanks Joe, I am glad to hear it.
Hi Ben--sorry about providing yours as a zip, but it seemed like the expeditious thing to do at 2 a.m. :-) I hope that doesn't give you a disadvantage.
@Matt,
No worries. I was joking about the disadvantage - I actually put that in during the proof read, not in the original go at the blog entry :) It's just a fun contest. Coming up with a solution is more fun that winning. Plus, look at this blog entry.... how could have possible put that up on your site :)
Ben opinions coincide for us :)
who did have such question interestingly? to decide him by such method! :)