# CFQuiz 2.13: Selecting N Random Elements From An Array In ColdFusion

By on
Tags:

In the ColdFusion Weekly Podcast, version 2.13, the ColdFusion quiz involved picking N (or rather less than N) elements from an array in a random order. Here is the CFQuiz description:

Assume you have an array with n number of elements, and you want to retrieve a random number of elements from this array. The random number you want to retrieve is less than n, but greater than 1, and you cannot retrieve the same element twice. So for example, let's say your array has 10 elements in it and you want to retrieve 5 random elements, but none of these 5 can duplicate one another. How would you do that?

I thought this was an excellent quiz choice and I want to discuss my answer(s) here as they are not as easily viewed on the ColdFusion weekly web site (does that give me a disadvantage?).

When it comes to choosing random elements from an array, I can think of three different algorithms:

1. Select and Delete

2. Shuffle and Select

3. Select and Track

My gut feeling tells me that each of these methods (explored in more detail below) have use cases in which they are more or less efficient. The thing I wanted to test was the efficiency of each algorithm as the number of desired elements gets closer to N (the total number of elements in the array).

To help test these different algorithms, I wrapped each in a ColdFusion component and gave it a common interface:

#### Init( Array )

This method accepts the target array. Remember that arrays are passed by VALUE in ColdFusion so each ColdFusion component instance will get its very own copy of the array (important for destructive algorithms).

#### Get( Count )

This method returns one or more random elements from the array. If only one element is requested (optional, default argument value), then only a single value is returned. If more than one element is requested, then those elements are put in an array and that array of values is returned. If more elements are requested than there are available in the target array, an error is thrown.

#### Size()

This method returns the number of elements in the target array that have not yet been selected.

Before we get into the testing of efficiency, let's review the three different random select algorithms.

### Select And Delete

This algorithm works by choosing a random element from the array. That index is then deleted from the array and the removed value is returned. By doing this, we can ensure that no two index-values are chosen as they are removed from the array after first use:

``````<cfcomponent
hint="Returns random elements from an array.">

<!---
Run the pseudo constructor to set up default
values and data structures.
--->

<!---
Set up an instance strucure to hold our
instance-related data.
--->
<cfset VARIABLES.Instance = StructNew() />

<!---
Create a default data array. This will be
overridden by the constructor.
--->
<cfset VARIABLES.Instance.Data = ArrayNew( 1 ) />

<cffunction
name="Init"
access="public"
returntype="any"
output="false"
hint="Returns an initialized component with the given data.">

<!--- Define arguments. --->
<cfargument
name="Data"
type="array"
required="true"
hint="The array from which we will be getting random data."
/>

<!--- Store the passed in array. --->
<cfset VARIABLES.Instance.Data = ARGUMENTS.Data />

<!--- Return This reference. --->
<cfreturn THIS />
</cffunction>

<cffunction
name="Get"
access="public"
returntype="any"
output="false"
hint="Returns either a single value or an array of values (if you pass the optional argument for number of values to be returned).">

<!--- Define arguments. --->
<cfargument
name="Count"
type="numeric"
required="false"
default="1"
hint="This is the number of values to be returned. If only one value is requested, a value will be returned. If more than one value is requested, an array of values will be returned."
/>

<!--- Set up local scope. --->
<cfset var LOCAL = StructNew() />

<!---
Check to see if we have enough values to
return a full value array.
--->
<cfif (ArrayLen( VARIABLES.Instance.Data ) LT ARGUMENTS.Count)>

<!--- Throw an error. --->
<cfthrow
message="You have requested more values than are available in the original array."
type="OutOfBounds.Exception"
/>

</cfif>

<!--- Set up the array to return. --->
<cfset LOCAL.Values = ArrayNew( 1 ) />

<!---
Loop over the value count to populate the return
value array. Even if the user has requested only one
value, we are going to build an array for multiple
values. This will reduce our overall logic.
--->
<cfloop
index="LOCAL.ValueIndex"
from="1"
to="#ARGUMENTS.Count#"
step="1">

<!--- Pick a random index out of the array. --->
<cfset LOCAL.Index = RandRange(
1,
ArrayLen( VARIABLES.Instance.Data )
) />

<!--- Add this value to the return value array. --->
<cfset ArrayAppend(
LOCAL.Values,
VARIABLES.Instance.Data[ LOCAL.Index ]
) />

<!---
Delete this index from the original array so
that we can never pick this value again.
--->
<cfset ArrayDeleteAt(
VARIABLES.Instance.Data,
LOCAL.Index
) />

</cfloop>

<!---
When returning the values, we do not want to return
the array if only a single element was requested.
Check to see what the original count request was.
--->
<cfif (ARGUMENTS.Count EQ 1)>

<!---
Only one value was requested, so just return
the first value of our values array.
--->
<cfreturn LOCAL.Values[ 1 ] />

<cfelse>

<!--- Return the entire value array. --->
<cfreturn LOCAL.Values />

</cfif>

</cffunction>

<cffunction
name="Size"
access="public"
returntype="numeric"
output="false"
hint="Returns the number of random value remaining.">

<!---
Since we are deleting array entires as we use
them, the number of values left is equivalent
to the size of our internal data array.
--->
<cfreturn ArrayLen( VARIABLES.Instance.Data ) />
</cffunction>

</cfcomponent>
``````

This method does not do any work up-front, but it does require calculations for every single value selection.

### Shuffle And Select

This algorithm works by shuffling the array the second the ColdFusion component is initialized. Once the array is shuffled, the Get() method merely iterates over the shuffled array and returns values in order (in shuffled order that is):

``````<cfcomponent
hint="Returns random elements from an array.">

<!---
Run the pseudo constructor to set up default
values and data structures.
--->

<!---
Set up an instance strucure to hold our
instance-related data.
--->
<cfset VARIABLES.Instance = StructNew() />

<!---
Create a default data array. This will be
overridden by the constructor.
--->
<cfset VARIABLES.Instance.Data = ArrayNew( 1 ) />

<!---
This will keep track of which index we last
--->
<cfset VARIABLES.Instance.Index = 0 />

<cffunction
name="Init"
access="public"
returntype="any"
output="false"
hint="Returns an initialized component with the given data.">

<!--- Define arguments. --->
<cfargument
name="Data"
type="array"
required="true"
hint="The array from which we will be getting random data."
/>

<!--- Store the passed in array. --->
<cfset VARIABLES.Instance.Data = ARGUMENTS.Data />

<!---
Shuffle the data so that we start off with
a completely randomized order.
--->
<cfset CreateObject( "java", "java.util.Collections" ).Shuffle(
VARIABLES.Instance.Data
) />

<!--- Return This reference. --->
<cfreturn THIS />
</cffunction>

<cffunction
name="Get"
access="public"
returntype="any"
output="false"
hint="Returns either a single value or an array of values (if you pass the optional argument for number of values to be returned).">

<!--- Define arguments. --->
<cfargument
name="Count"
type="numeric"
required="false"
default="1"
hint="This is the number of values to be returned. If only one value is requested, a value will be returned. If more than one value is requested, an array of values will be returned."
/>

<!--- Set up local scope. --->
<cfset var LOCAL = StructNew() />

<!---
Check to see if we have enough values to
return a full value array.
--->
<cfif (ArrayLen( VARIABLES.Instance.Data ) LT (VARIABLES.Instance.Index + ARGUMENTS.Count))>

<!--- Throw an error. --->
<cfthrow
message="You have requested more values than are available in the original array."
type="OutOfBounds.Exception"
/>

</cfif>

<!--- Set up the array to return. --->
<cfset LOCAL.Values = ArrayNew( 1 ) />

<!---
Loop over the value count to populate the return
value array. Even if the user has requested only one
value, we are going to build an array for multiple
values. This will reduce our overall logic.
--->
<cfloop
index="LOCAL.ValueIndex"
from="1"
to="#ARGUMENTS.Count#"
step="1">

<!--- Increment the global index. --->
<cfset VARIABLES.Instance.Index = (VARIABLES.Instance.Index + 1) />

<!--- Add this value to the return value array. --->
<cfset ArrayAppend(
LOCAL.Values,
VARIABLES.Instance.Data[ VARIABLES.Instance.Index ]
) />

</cfloop>

<!---
When returning the values, we do not want to return
the array if only a single element was requested.
Check to see what the original count request was.
--->
<cfif (ARGUMENTS.Count EQ 1)>

<!---
Only one value was requested, so just return
the first value of our values array.
--->
<cfreturn LOCAL.Values[ 1 ] />

<cfelse>

<!--- Return the entire value array. --->
<cfreturn LOCAL.Values />

</cfif>

</cffunction>

<cffunction
name="Size"
access="public"
returntype="numeric"
output="false"
hint="Returns the number of random value remaining.">

<!---
Since our global index is keeping track of the
last value index that was returned, we just need
to find the difference between this index and
the size of the data array.
--->
<cfreturn (ArrayLen( VARIABLES.Instance.Data ) - VARIABLES.Instance.Index)/>
</cffunction>

</cfcomponent>
``````

This methods does a lot of work up front to shuffle the array, but then, it requires no calculations for subsequent value selections.

### Select And Track

This algorithm works by keeping an index of selected index values. For every selection that gets made, it chooses a random index and checks to see if it has already been selected (and if so, it will continue to select random indexes until an unused one is found):

``````<cfcomponent
hint="Returns random elements from an array.">

<!---
Run the pseudo constructor to set up default
values and data structures.
--->

<!---
Set up an instance strucure to hold our
instance-related data.
--->
<cfset VARIABLES.Instance = StructNew() />

<!---
Create a default data array. This will be
overridden by the constructor.
--->
<cfset VARIABLES.Instance.Data = ArrayNew( 1 ) />

<!---
This will keep track of which indexes we have
already used. This Struct will act as a look
up when getting new index values.
--->
<cfset VARIABLES.Instance.IndexLookUp = StructNew() />

<cffunction
name="Init"
access="public"
returntype="any"
output="false"
hint="Returns an initialized component with the given data.">

<!--- Define arguments. --->
<cfargument
name="Data"
type="array"
required="true"
hint="The array from which we will be getting random data."
/>

<!--- Store the passed in array. --->
<cfset VARIABLES.Instance.Data = ARGUMENTS.Data />

<!--- Return This reference. --->
<cfreturn THIS />
</cffunction>

<cffunction
name="Get"
access="public"
returntype="any"
output="false"
hint="Returns either a single value or an array of values (if you pass the optional argument for number of values to be returned).">

<!--- Define arguments. --->
<cfargument
name="Count"
type="numeric"
required="false"
default="1"
hint="This is the number of values to be returned. If only one value is requested, a value will be returned. If more than one value is requested, an array of values will be returned."
/>

<!--- Set up local scope. --->
<cfset var LOCAL = StructNew() />

<!---
Check to see if we have enough values to
return a full value array.
--->
<cfif (ArrayLen( VARIABLES.Instance.Data ) LT (ARGUMENTS.Count + StructCount( VARIABLES.Instance.IndexLookUp )))>

<!--- Throw an error. --->
<cfthrow
message="You have requested more values than are available in the original array."
type="OutOfBounds.Exception"
/>

</cfif>

<!--- Set up the array to return. --->
<cfset LOCAL.Values = ArrayNew( 1 ) />

<!---
Loop over the value count to populate the return
value array. Even if the user has requested only one
value, we are going to build an array for multiple
values. This will reduce our overall logic.
--->
<cfloop
index="LOCAL.ValueIndex"
from="1"
to="#ARGUMENTS.Count#"
step="1">

<!---
Keep looping while that index has not yet been
chosen. We will know if this index has been used
if it is in the index loop up. The loop will be
broken manually once an unused index is found.
--->
<cfloop condition="true">

<!--- Pick a random index out of the array. --->
<cfset LOCAL.Index = RandRange(
1,
ArrayLen( VARIABLES.Instance.Data )
) />

<!---
Check to see if that index has already been
used by a previous selection.
--->
<cfif NOT StructKeyExists(
VARIABLES.Instance.IndexLookUp,
LOCAL.Index
)>

<!---
This is new. Add it to the index look
up structure.
--->
<cfset VARIABLES.Instance.IndexLookUp[ LOCAL.Index ] = true />

<!--- Break out of this loop. --->
<cfbreak />

</cfif>

</cfloop>

<!--- Add this value to the return value array. --->
<cfset ArrayAppend(
LOCAL.Values,
VARIABLES.Instance.Data[ LOCAL.Index ]
) />

</cfloop>

<!---
When returning the values, we do not want to return
the array if only a single element was requested.
Check to see what the original count request was.
--->
<cfif (ARGUMENTS.Count EQ 1)>

<!---
Only one value was requested, so just return
the first value of our values array.
--->
<cfreturn LOCAL.Values[ 1 ] />

<cfelse>

<!--- Return the entire value array. --->
<cfreturn LOCAL.Values />

</cfif>

</cffunction>

<cffunction
name="Size"
access="public"
returntype="numeric"
output="false"
hint="Returns the number of random value remaining.">

<!---
Since our index look up struct is keeping
track of all the indexes already uses, then
we can figure out how many we have left by
subtracting the size of the look up from the
original data array.
--->
<cfreturn (
ArrayLen( VARIABLES.Instance.Data ) -
StructCount( VARIABLES.Instance.IndexLookUp )
) />
</cffunction>

</cfcomponent>
``````

This algorithm does no work up front, but it does increasingly more work (potentially) for every subsequent value selection. The reason we are doing this is to see if the ArrayDelete() in the first algorithm comes at a heavier cost than the repeated random selection and index look ups.

Ok, so now that we have our algorithms down, it's time to test them. Before we do anything at all, we have to test to make sure that our algorithms work without regard to efficientcy. Therefore, we must run some very small use cases:

``````<!---
First we are going to start off with a very small
array. This is going to be our proof of concept
array to make sure that all of our algorithms work
(Before we test for efficiency).
--->
<cfset arrData = ListToArray(
"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
) />

<!---
Loop over the three types of array randomizers to
see if they all produce random values.
--->
<cfloop
index="intType"
from="1"
to="3"
step="1">

<!--- Create the array randomizer of this type. --->
<cfset objRandArray = CreateObject(
"component",
"ArrayRandomizer#intType#"
).Init(
arrData
) />

<h2>
Array Randomizer #intType#
</h2>

<!---
Keep looping until we have gotten every unique
but random index/value out of the array.
--->
<cfloop
condition="objRandArray.Size() GT 5">

[ #objRandArray.Get()# ]

</cfloop>

<!---
Demonstrate that each array randomizer can
also return an array or random values.
--->
<cfdump var="#objRandArray.Get( 5 )#" />

</cfloop>
``````

In the previous test, we were just making sure that there were no blatant errors. We were also checking the Get() method to see if it could handle single and multiple value selects (not required in the quiz, but a cool feature nonetheless). These worked fine (and I will spare you the output as this is a rather long post).

Now that we know these algorithms work in so much as they do what they were defined to do, we can now test to see how they perform. First, we will test "worst case scenarios" in which we have to select N random values from an array of length N. And, since ColdFusion is sooo awesome, we have to use a fairly large array to even see time differences:

``````<!---
At this point we have proved that our array randomizers
do indeed work for N values (where N was an arbitrary
20). Now that we know that the algorithms work, we can
start to test for efficiency. There are two things we
can test for:

1. Worst case scenario - what happens if we need to get
ALL N random index values.

2. Better case scendario - what happens if we need to
get only a small subset of all the random index values.
--->

<!---
To help test the worst case scenario, let's create a
very huge array. This array will have values that
repeat. Since we know that our algorithms work properly,
we don't need to check the values... we just need to
see what performs faster.

This will create an array of length: 5,000.
--->
<cfset arrData = ListToArray(
RepeatString(
"0,1,2,3,4,5,6,7,8,9,",
500
)
) />

<!---
Loop over the three types of array randomizers see
how fast each of them can get all the random values.
--->
<cfloop
index="intType"
from="1"
to="3"
step="1">

<!--- Create the array randomizer of this type. --->
<cfset objRandArray = CreateObject(
"component",
"ArrayRandomizer#intType#"
).Init(
arrData
) />

<h2>
Array Randomizer #intType#
</h2>

<cftimer
label="Array Randomize #intType#" type="outline">

<!---
Keep looping until we have gotten every unique
but random index/value out of the array. We are
going to wrap this in CFSilent tags since we
don't want to waste our white space.
--->
<cfsilent>

<cfloop
condition="objRandArray.Size()">

<!---
Just get the value, but don't do
anything with it.
--->
<cfset strValue = objRandArray.Get() />

</cfloop>

</cfsilent>

DONE.

</cftimer>

</cfloop>
``````

Running the above tests, we started to see a clear trend in seconds to execute:

#### ArrayRandomizer1 (Select and Delete)

10.8, 10.7, 10.8, 11.5, 10.8
Average: 10.9

#### ArrayRandomizer2 (Shuffle and Select)

11.7, 10.8, 10.8, 10.8, 10.8
Average: 11.0

#### ArrayRandomizer3 (Select and Track)

13.0, 12.8, 13.5, 12.9, 13.5
Average: 13.1

I have to say that I was blown away by the performance of all three of these algorithms. The first two trended to be slightly better, but all three were lighting fast. I was also blown away that the up front Shuffle() method was just as fast as first method that merely deleted selected values. I had no idea that java.util.Collections Shuffle() method was so freakin' efficient.

In all reality, though, we rarely ever have to deal with worst case scenarios. So, for the final test, I wanted to test a "better case scenario" in which we were selecting multiple values from a massive array, but no where near the number of possible values:

``````<!---
Now, we will create an even bigger array but only get
a smaller subset of the index values. We are going
this because ArrayRandomizer2 does more work up-front
where as the other do more work as you go.

This will create an array of length: 10,000.
--->
<cfset arrData = ListToArray(
RepeatString(
"0,1,2,3,4,5,6,7,8,9,",
1000
)
) />

<!---
Loop over the three types of array randomizers see
how fast each of them can get all the random values.
--->
<cfloop
index="intType"
from="1"
to="3"
step="1">

<!--- Create the array randomizer of this type. --->
<cfset objRandArray = CreateObject(
"component",
"ArrayRandomizer#intType#"
).Init(
arrData
) />

<h2>
Array Randomizer #intType#
</h2>

<cftimer
label="Array Randomize #intType#" type="outline">

<!---
Only get the first 10 values out of the array.
We want to test the better case speed.
--->
<cfloop
index="intCounter"
from="1"
to="10"
step="1">

[ #objRandArray.Get()# ]

</cfloop>

DONE.

</cftimer>

</cfloop>
``````

Running the above test, we started to see a clear trend in milliseconds to execute:

#### ArrayRandomizer1 (Select and Delete)

15, 16, 31, 15, 00
Average: 15.4

#### ArrayRandomizer2 (Shuffle and Select)

00, 31, 16, 16, 15
Average: 15.6

#### ArrayRandomizer3 (Select and Track)

16, 15, 00, 00, 00
Average: 6.2

Again, all three are lighting fast, but it seems for a better/best case scenario, the Select and Track method was the best. That makes sense as the third algorithm does no up front work and does very little work early in the value selection process. I was again blown away at how fast an array of 10,000 elements can be shuffled (algorithm 2).

So in conclusion, it makes little to no difference which algorithm you use. I like the Shuffle() method, but I think that is more of a cool factor than anything else.

Want to use code from this post? Check out the license.