CFQuiz 2.13: Selecting N Random Elements From An Array In ColdFusion

By Ben Nadel

Published 2007-05-23 in ColdFusion — Comments (6)

In the ColdFusion Weekly Podcast, version 2.13, the ColdFusion quiz involved picking N (or rather less than N) elements from an array in a random order. Here is the CFQuiz description:

Assume you have an array with n number of elements, and you want to retrieve a random number of elements from this array. The random number you want to retrieve is less than n, but greater than 1, and you cannot retrieve the same element twice. So for example, let's say your array has 10 elements in it and you want to retrieve 5 random elements, but none of these 5 can duplicate one another. How would you do that?

I thought this was an excellent quiz choice and I want to discuss my answer(s) here as they are not as easily viewed on the ColdFusion weekly web site (does that give me a disadvantage?).

When it comes to choosing random elements from an array, I can think of three different algorithms:

Select and Delete
Shuffle and Select
Select and Track

My gut feeling tells me that each of these methods (explored in more detail below) have use cases in which they are more or less efficient. The thing I wanted to test was the efficiency of each algorithm as the number of desired elements gets closer to N (the total number of elements in the array).

To help test these different algorithms, I wrapped each in a ColdFusion component and gave it a common interface:

Init( Array )

This method accepts the target array. Remember that arrays are passed by VALUE in ColdFusion so each ColdFusion component instance will get its very own copy of the array (important for destructive algorithms).

Get( Count )

This method returns one or more random elements from the array. If only one element is requested (optional, default argument value), then only a single value is returned. If more than one element is requested, then those elements are put in an array and that array of values is returned. If more elements are requested than there are available in the target array, an error is thrown.

Size()

This method returns the number of elements in the target array that have not yet been selected.

Before we get into the testing of efficiency, let's review the three different random select algorithms.

Select And Delete

This algorithm works by choosing a random element from the array. That index is then deleted from the array and the removed value is returned. By doing this, we can ensure that no two index-values are chosen as they are removed from the array after first use:

<cfcomponent
	hint="Returns random elements from an array.">

	<!---
		Run the pseudo constructor to set up default
		values and data structures.
	--->

	<!---
		Set up an instance strucure to hold our
		instance-related data.
	--->
	<cfset VARIABLES.Instance = StructNew() />

	<!---
		Create a default data array. This will be
		overridden by the constructor.
	--->
	<cfset VARIABLES.Instance.Data = ArrayNew( 1 ) />


	<cffunction
		name="Init"
		access="public"
		returntype="any"
		output="false"
		hint="Returns an initialized component with the given data.">

		<!--- Define arguments. --->
		<cfargument
			name="Data"
			type="array"
			required="true"
			hint="The array from which we will be getting random data."
			/>


		<!--- Store the passed in array. --->
		<cfset VARIABLES.Instance.Data = ARGUMENTS.Data />

		<!--- Return This reference. --->
		<cfreturn THIS />
	</cffunction>


	<cffunction
		name="Get"
		access="public"
		returntype="any"
		output="false"
		hint="Returns either a single value or an array of values (if you pass the optional argument for number of values to be returned).">

		<!--- Define arguments. --->
		<cfargument
			name="Count"
			type="numeric"
			required="false"
			default="1"
			hint="This is the number of values to be returned. If only one value is requested, a value will be returned. If more than one value is requested, an array of values will be returned."
			/>

		<!--- Set up local scope. --->
		<cfset var LOCAL = StructNew() />


		<!---
			Check to see if we have enough values to
			return a full value array.
		--->
		<cfif (ArrayLen( VARIABLES.Instance.Data ) LT ARGUMENTS.Count)>

			<!--- Throw an error. --->
			<cfthrow
				message="You have requested more values than are available in the original array."
				type="OutOfBounds.Exception"
				/>

		</cfif>


		<!--- Set up the array to return. --->
		<cfset LOCAL.Values = ArrayNew( 1 ) />

		<!---
			Loop over the value count to populate the return
			value array. Even if the user has requested only one
			value, we are going to build an array for multiple
			values. This will reduce our overall logic.
		--->
		<cfloop
			index="LOCAL.ValueIndex"
			from="1"
			to="#ARGUMENTS.Count#"
			step="1">

			<!--- Pick a random index out of the array. --->
			<cfset LOCAL.Index = RandRange(
				1,
				ArrayLen( VARIABLES.Instance.Data )
				) />

			<!--- Add this value to the return value array. --->
			<cfset ArrayAppend(
				LOCAL.Values,
				VARIABLES.Instance.Data[ LOCAL.Index ]
				) />

			<!---
				Delete this index from the original array so
				that we can never pick this value again.
			--->
			<cfset ArrayDeleteAt(
				VARIABLES.Instance.Data,
				LOCAL.Index
				) />

		</cfloop>


		<!---
			When returning the values, we do not want to return
			the array if only a single element was requested.
			Check to see what the original count request was.
		--->
		<cfif (ARGUMENTS.Count EQ 1)>

			<!---
				Only one value was requested, so just return
				the first value of our values array.
			--->
			<cfreturn LOCAL.Values[ 1 ] />

		<cfelse>

			<!--- Return the entire value array. --->
			<cfreturn LOCAL.Values />

		</cfif>

	</cffunction>


	<cffunction
		name="Size"
		access="public"
		returntype="numeric"
		output="false"
		hint="Returns the number of random value remaining.">

		<!---
			Since we are deleting array entires as we use
			them, the number of values left is equivalent
			to the size of our internal data array.
		--->
		<cfreturn ArrayLen( VARIABLES.Instance.Data ) />
	</cffunction>

</cfcomponent>

This method does not do any work up-front, but it does require calculations for every single value selection.

Shuffle And Select

This algorithm works by shuffling the array the second the ColdFusion component is initialized. Once the array is shuffled, the Get() method merely iterates over the shuffled array and returns values in order (in shuffled order that is):

<cfcomponent
	hint="Returns random elements from an array.">

	<!---
		Run the pseudo constructor to set up default
		values and data structures.
	--->

	<!---
		Set up an instance strucure to hold our
		instance-related data.
	--->
	<cfset VARIABLES.Instance = StructNew() />

	<!---
		Create a default data array. This will be
		overridden by the constructor.
	--->
	<cfset VARIABLES.Instance.Data = ArrayNew( 1 ) />

	<!---
		This will keep track of which index we last
		retrieved from the data array.
	--->
	<cfset VARIABLES.Instance.Index = 0 />


	<cffunction
		name="Init"
		access="public"
		returntype="any"
		output="false"
		hint="Returns an initialized component with the given data.">

		<!--- Define arguments. --->
		<cfargument
			name="Data"
			type="array"
			required="true"
			hint="The array from which we will be getting random data."
			/>


		<!--- Store the passed in array. --->
		<cfset VARIABLES.Instance.Data = ARGUMENTS.Data />

		<!---
			Shuffle the data so that we start off with
			a completely randomized order.
		--->
		<cfset CreateObject( "java", "java.util.Collections" ).Shuffle(
			VARIABLES.Instance.Data
			) />

		<!--- Return This reference. --->
		<cfreturn THIS />
	</cffunction>


	<cffunction
		name="Get"
		access="public"
		returntype="any"
		output="false"
		hint="Returns either a single value or an array of values (if you pass the optional argument for number of values to be returned).">

		<!--- Define arguments. --->
		<cfargument
			name="Count"
			type="numeric"
			required="false"
			default="1"
			hint="This is the number of values to be returned. If only one value is requested, a value will be returned. If more than one value is requested, an array of values will be returned."
			/>

		<!--- Set up local scope. --->
		<cfset var LOCAL = StructNew() />


		<!---
			Check to see if we have enough values to
			return a full value array.
		--->
		<cfif (ArrayLen( VARIABLES.Instance.Data ) LT (VARIABLES.Instance.Index + ARGUMENTS.Count))>

			<!--- Throw an error. --->
			<cfthrow
				message="You have requested more values than are available in the original array."
				type="OutOfBounds.Exception"
				/>

		</cfif>


		<!--- Set up the array to return. --->
		<cfset LOCAL.Values = ArrayNew( 1 ) />

		<!---
			Loop over the value count to populate the return
			value array. Even if the user has requested only one
			value, we are going to build an array for multiple
			values. This will reduce our overall logic.
		--->
		<cfloop
			index="LOCAL.ValueIndex"
			from="1"
			to="#ARGUMENTS.Count#"
			step="1">

			<!--- Increment the global index. --->
			<cfset VARIABLES.Instance.Index = (VARIABLES.Instance.Index + 1) />

			<!--- Add this value to the return value array. --->
			<cfset ArrayAppend(
				LOCAL.Values,
				VARIABLES.Instance.Data[ VARIABLES.Instance.Index ]
				) />

		</cfloop>


		<!---
			When returning the values, we do not want to return
			the array if only a single element was requested.
			Check to see what the original count request was.
		--->
		<cfif (ARGUMENTS.Count EQ 1)>

			<!---
				Only one value was requested, so just return
				the first value of our values array.
			--->
			<cfreturn LOCAL.Values[ 1 ] />

		<cfelse>

			<!--- Return the entire value array. --->
			<cfreturn LOCAL.Values />

		</cfif>

	</cffunction>


	<cffunction
		name="Size"
		access="public"
		returntype="numeric"
		output="false"
		hint="Returns the number of random value remaining.">

		<!---
			Since our global index is keeping track of the
			last value index that was returned, we just need
			to find the difference between this index and
			the size of the data array.
		--->
		<cfreturn (ArrayLen( VARIABLES.Instance.Data ) - VARIABLES.Instance.Index)/>
	</cffunction>

</cfcomponent>

This methods does a lot of work up front to shuffle the array, but then, it requires no calculations for subsequent value selections.

Select And Track

This algorithm works by keeping an index of selected index values. For every selection that gets made, it chooses a random index and checks to see if it has already been selected (and if so, it will continue to select random indexes until an unused one is found):

<cfcomponent
	hint="Returns random elements from an array.">

	<!---
		Run the pseudo constructor to set up default
		values and data structures.
	--->

	<!---
		Set up an instance strucure to hold our
		instance-related data.
	--->
	<cfset VARIABLES.Instance = StructNew() />

	<!---
		Create a default data array. This will be
		overridden by the constructor.
	--->
	<cfset VARIABLES.Instance.Data = ArrayNew( 1 ) />

	<!---
		This will keep track of which indexes we have
		already used. This Struct will act as a look
		up when getting new index values.
	--->
	<cfset VARIABLES.Instance.IndexLookUp = StructNew() />


	<cffunction
		name="Init"
		access="public"
		returntype="any"
		output="false"
		hint="Returns an initialized component with the given data.">

		<!--- Define arguments. --->
		<cfargument
			name="Data"
			type="array"
			required="true"
			hint="The array from which we will be getting random data."
			/>


		<!--- Store the passed in array. --->
		<cfset VARIABLES.Instance.Data = ARGUMENTS.Data />

		<!--- Return This reference. --->
		<cfreturn THIS />
	</cffunction>


	<cffunction
		name="Get"
		access="public"
		returntype="any"
		output="false"
		hint="Returns either a single value or an array of values (if you pass the optional argument for number of values to be returned).">

		<!--- Define arguments. --->
		<cfargument
			name="Count"
			type="numeric"
			required="false"
			default="1"
			hint="This is the number of values to be returned. If only one value is requested, a value will be returned. If more than one value is requested, an array of values will be returned."
			/>

		<!--- Set up local scope. --->
		<cfset var LOCAL = StructNew() />


		<!---
			Check to see if we have enough values to
			return a full value array.
		--->
		<cfif (ArrayLen( VARIABLES.Instance.Data ) LT (ARGUMENTS.Count + StructCount( VARIABLES.Instance.IndexLookUp )))>

			<!--- Throw an error. --->
			<cfthrow
				message="You have requested more values than are available in the original array."
				type="OutOfBounds.Exception"
				/>

		</cfif>


		<!--- Set up the array to return. --->
		<cfset LOCAL.Values = ArrayNew( 1 ) />

		<!---
			Loop over the value count to populate the return
			value array. Even if the user has requested only one
			value, we are going to build an array for multiple
			values. This will reduce our overall logic.
		--->
		<cfloop
			index="LOCAL.ValueIndex"
			from="1"
			to="#ARGUMENTS.Count#"
			step="1">

			<!---
				Keep looping while that index has not yet been
				chosen. We will know if this index has been used
				if it is in the index loop up. The loop will be
				broken manually once an unused index is found.
			--->
			<cfloop condition="true">

				<!--- Pick a random index out of the array. --->
				<cfset LOCAL.Index = RandRange(
					1,
					ArrayLen( VARIABLES.Instance.Data )
					) />


				<!---
					Check to see if that index has already been
					used by a previous selection.
				--->
				<cfif NOT StructKeyExists(
					VARIABLES.Instance.IndexLookUp,
					LOCAL.Index
					)>

					<!---
						This is new. Add it to the index look
						up structure.
					--->
					<cfset VARIABLES.Instance.IndexLookUp[ LOCAL.Index ] = true />

					<!--- Break out of this loop. --->
					<cfbreak />

				</cfif>

			</cfloop>


			<!--- Add this value to the return value array. --->
			<cfset ArrayAppend(
				LOCAL.Values,
				VARIABLES.Instance.Data[ LOCAL.Index ]
				) />

		</cfloop>


		<!---
			When returning the values, we do not want to return
			the array if only a single element was requested.
			Check to see what the original count request was.
		--->
		<cfif (ARGUMENTS.Count EQ 1)>

			<!---
				Only one value was requested, so just return
				the first value of our values array.
			--->
			<cfreturn LOCAL.Values[ 1 ] />

		<cfelse>

			<!--- Return the entire value array. --->
			<cfreturn LOCAL.Values />

		</cfif>

	</cffunction>


	<cffunction
		name="Size"
		access="public"
		returntype="numeric"
		output="false"
		hint="Returns the number of random value remaining.">

		<!---
			Since our index look up struct is keeping
			track of all the indexes already uses, then
			we can figure out how many we have left by
			subtracting the size of the look up from the
			original data array.
		--->
		<cfreturn (
			ArrayLen( VARIABLES.Instance.Data ) -
			StructCount( VARIABLES.Instance.IndexLookUp )
			) />
	</cffunction>

</cfcomponent>

This algorithm does no work up front, but it does increasingly more work (potentially) for every subsequent value selection. The reason we are doing this is to see if the ArrayDelete() in the first algorithm comes at a heavier cost than the repeated random selection and index look ups.

Ok, so now that we have our algorithms down, it's time to test them. Before we do anything at all, we have to test to make sure that our algorithms work without regard to efficientcy. Therefore, we must run some very small use cases:

<!---
	First we are going to start off with a very small
	array. This is going to be our proof of concept
	array to make sure that all of our algorithms work
	(Before we test for efficiency).
--->
<cfset arrData = ListToArray(
	"1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20"
	) />


<!---
	Loop over the three types of array randomizers to
	see if they all produce random values.
--->
<cfloop
	index="intType"
	from="1"
	to="3"
	step="1">

	<!--- Create the array randomizer of this type. --->
	<cfset objRandArray = CreateObject(
		"component",
		"ArrayRandomizer#intType#"
		).Init(
			arrData
			) />


	<h2>
		Array Randomizer #intType#
	</h2>

	<!---
		Keep looping until we have gotten every unique
		but random index/value out of the array.
	--->
	<cfloop
		condition="objRandArray.Size() GT 5">

		[ #objRandArray.Get()# ]

	</cfloop>

	<!---
		Demonstrate that each array randomizer can
		also return an array or random values.
	--->
	<cfdump var="#objRandArray.Get( 5 )#" />

</cfloop>

In the previous test, we were just making sure that there were no blatant errors. We were also checking the Get() method to see if it could handle single and multiple value selects (not required in the quiz, but a cool feature nonetheless). These worked fine (and I will spare you the output as this is a rather long post).

Now that we know these algorithms work in so much as they do what they were defined to do, we can now test to see how they perform. First, we will test "worst case scenarios" in which we have to select N random values from an array of length N. And, since ColdFusion is sooo awesome, we have to use a fairly large array to even see time differences:

<!---
	At this point we have proved that our array randomizers
	do indeed work for N values (where N was an arbitrary
	20). Now that we know that the algorithms work, we can
	start to test for efficiency. There are two things we
	can test for:

	1. Worst case scenario - what happens if we need to get
	ALL N random index values.

	2. Better case scendario - what happens if we need to
	get only a small subset of all the random index values.
--->


<!---
	To help test the worst case scenario, let's create a
	very huge array. This array will have values that
	repeat. Since we know that our algorithms work properly,
	we don't need to check the values... we just need to
	see what performs faster.

	This will create an array of length: 5,000.
--->
<cfset arrData = ListToArray(
	RepeatString(
		"0,1,2,3,4,5,6,7,8,9,",
		500
		)
	) />


<!---
	Loop over the three types of array randomizers see
	how fast each of them can get all the random values.
--->
<cfloop
	index="intType"
	from="1"
	to="3"
	step="1">

	<!--- Create the array randomizer of this type. --->
	<cfset objRandArray = CreateObject(
		"component",
		"ArrayRandomizer#intType#"
		).Init(
			arrData
			) />


	<h2>
		Array Randomizer #intType#
	</h2>


	<cftimer
		label="Array Randomize #intType#" type="outline">

		<!---
			Keep looping until we have gotten every unique
			but random index/value out of the array. We are
			going to wrap this in CFSilent tags since we
			don't want to waste our white space.
		--->
		<cfsilent>

			<cfloop
				condition="objRandArray.Size()">

				<!---
					Just get the value, but don't do
					anything with it.
				--->
				<cfset strValue = objRandArray.Get() />

			</cfloop>

		</cfsilent>

		DONE.

	</cftimer>

</cfloop>

Running the above tests, we started to see a clear trend in seconds to execute:

ArrayRandomizer1 (Select and Delete)

10.8, 10.7, 10.8, 11.5, 10.8
Average: 10.9

ArrayRandomizer2 (Shuffle and Select)

11.7, 10.8, 10.8, 10.8, 10.8
Average: 11.0

ArrayRandomizer3 (Select and Track)

13.0, 12.8, 13.5, 12.9, 13.5
Average: 13.1

I have to say that I was blown away by the performance of all three of these algorithms. The first two trended to be slightly better, but all three were lighting fast. I was also blown away that the up front Shuffle() method was just as fast as first method that merely deleted selected values. I had no idea that java.util.Collections Shuffle() method was so freakin' efficient.

In all reality, though, we rarely ever have to deal with worst case scenarios. So, for the final test, I wanted to test a "better case scenario" in which we were selecting multiple values from a massive array, but no where near the number of possible values:

<!---
	Now, we will create an even bigger array but only get
	a smaller subset of the index values. We are going
	this because ArrayRandomizer2 does more work up-front
	where as the other do more work as you go.

	This will create an array of length: 10,000.
--->
<cfset arrData = ListToArray(
	RepeatString(
		"0,1,2,3,4,5,6,7,8,9,",
		1000
		)
	) />


<!---
	Loop over the three types of array randomizers see
	how fast each of them can get all the random values.
--->
<cfloop
	index="intType"
	from="1"
	to="3"
	step="1">

	<!--- Create the array randomizer of this type. --->
	<cfset objRandArray = CreateObject(
		"component",
		"ArrayRandomizer#intType#"
		).Init(
			arrData
			) />


	<h2>
		Array Randomizer #intType#
	</h2>


	<cftimer
		label="Array Randomize #intType#" type="outline">

		<!---
			Only get the first 10 values out of the array.
			We want to test the better case speed.
		--->
		<cfloop
			index="intCounter"
			from="1"
			to="10"
			step="1">

			[ #objRandArray.Get()# ]

		</cfloop>

		DONE.

	</cftimer>

</cfloop>

Running the above test, we started to see a clear trend in milliseconds to execute:

ArrayRandomizer1 (Select and Delete)

15, 16, 31, 15, 00
Average: 15.4

ArrayRandomizer2 (Shuffle and Select)

00, 31, 16, 16, 15
Average: 15.6

ArrayRandomizer3 (Select and Track)

16, 15, 00, 00, 00
Average: 6.2

Again, all three are lighting fast, but it seems for a better/best case scenario, the Select and Track method was the best. That makes sense as the third algorithm does no up front work and does very little work early in the value selection process. I was again blown away at how fast an array of 10,000 elements can be shuffled (algorithm 2).

So in conclusion, it makes little to no difference which algorithm you use. I like the Shuffle() method, but I think that is more of a cool factor than anything else.

Want to use code from this post? Check out the license.

Short link: https://bennadel.com/724

Reader Comments

Joe Zack May 23, 2007 at 2:47 PM

2 Comments

Really enjoyed this post. got me thinking!

Ben Nadel May 23, 2007 at 2:51 PM

16,233 Comments

Thanks Joe, I am glad to hear it.

Matt Woodward May 23, 2007 at 4:58 PM

1 Comments

Hi Ben--sorry about providing yours as a zip, but it seemed like the expeditious thing to do at 2 a.m. :-) I hope that doesn't give you a disadvantage.

Ben Nadel May 23, 2007 at 5:11 PM

16,233 Comments

@Matt,

No worries. I was joking about the disadvantage - I actually put that in during the proof read, not in the original go at the blog entry :) It's just a fun contest. Coming up with a solution is more fun that winning. Plus, look at this blog entry.... how could have possible put that up on your site :)

Back pack May 24, 2007 at 11:17 AM

1 Comments

Ben opinions coincide for us :)

Satellite May 31, 2007 at 8:02 PM

9 Comments

who did have such question interestingly? to decide him by such method! :)

Oh my chickens, this post is old!

Hit me up on LinkedIn if you want to discuss it further.