Randomly Selected vs. Evenly Distributed

Posted July 19, 2007 at 8:37 AM

In the past, I have complained about random number generation in ColdFusion. Others have also complained a bit about the randomness (or lack thereof) of my Use It Or Lose It emails. Outside of programming, I find random selection to be frustrating as well; my iPod seems to replay some songs a lot and NEVER chooses other ones. My WimAmp does the same thing. But this got me thinking; what's really bothering me? Is it random selection? Or is the lack of even distribution? Are the two independent, or can you not even consider one without the other?

Take, for example, the answer I posted for the ColdFusion Weekly Podcast on selecting random elements from an array. In that solution, the value that we were returning was randomly selected, but it was selected in such a way that no indexes were repeated until the whole array was exhausted. Is this really random selection? Yes, in terms of how each index was selected (random number generation), but really no, because each selection was influenced by the previous N selections. But is it an evenly distributed selection? Yes, each value in the array is selected once without repetition, therefore distributing our selection across the entire array.

In truly random number generation, each randomly generated number should be a completely independent entity, having nothing to do with any other number generating actions. Therefore, if we were randomly selecting values between 1 and 10, it is possible (but unlikely) to randomly select 5 ten thousand times in a row. But, when it comes to system interaction, just because something like this is random, it does not mean that it is a satisfactory outcome. I think this is because when people "say" random, what they "mean" is random AND evenly distributed.

The problem with random AND evenly distributed selections is really that they are no longer random and it requires some sort of data persistence. How does one go about doing this in a way that is more pleasing to the user without adding the overhead of complexity?

I have no idea, this is more of a rambling and a plea for suggestions.

Post Comment  |  Ask Ben  |  Permalink  |  Other Searches  |  Print Page





Reader Comments

Jul 19, 2007 at 9:43 AM // reply »
42 Comments

If you need a greater randomness you can use an alternate random number algorithm (mx7):

randRange(1,10,'SHA1PRNG')

The docs even state: "SHA1PRNG: generates a number using the Sun Java SHA1PRNG algorithm. This algorithm provides greater randomness than the default algorithm"

http://livedocs.adobe.com/coldfusion/7/htmldocs/00000605.htm


db
Jul 19, 2007 at 9:47 AM // reply »
3 Comments

here's something to get you started -
http://www.coldfusionjedi.com/index.cfm/2006/8/16/ColdFusion-101-Picking-a-random-image-2


Jul 19, 2007 at 9:57 AM // reply »
102 Comments

I think the way I've handled it previously is to keep a running tab of all of the "randomly" selected values. Then run a comparison of the newly selected items, check them against the table, rinse-and-repeat until you reach however many you are looking for. As you get more data in there, it can slow the process down a bit though (as it's running more and more comparisons), but as it's a scheduled task and I'll never see it, I've never worried *too* much about the slowdown.

If you're using SQL, you could see if that returns a more evenly distributed random result
http://www.sqlteam.com/article/using-newid-to-randomly-sort-records


Jul 19, 2007 at 9:58 AM // reply »
42 Comments

While on the topic of randomness. If you're trying to retrieve random items from a database and you're using MS SQL, the NewID() function is a good method of doing it.

If you want all:
SELECT ID
FROM FOO
ORDER BY NEWID()

If you only need (n):
SELECT TOP 1 ID
FROM FOO
ORDER BY NEWID()


Jul 19, 2007 at 10:01 AM // reply »
6,516 Comments

@Dustin,

Thanks, I will check that out.

@DB,

That is a nice tutorial, but there he is storing things in the session. I am doing things that do not involve session and would be spread out over a time period that is longer than a session would last.


Jul 19, 2007 at 10:07 AM // reply »
8 Comments

Tecnhically speaking, there's no such thing as random in a computer. There are only calculations based on values that are difficult for a human to predict.

In fact, there are lines of thought that suggest that randomness is a myth in any scenario, computed or not. i.e. everything is caused by something, so there's nothing that's truly random - only things that appear to be random, because we cannot readily discern the cause.

But you're right in that randomness is not equivalent to even distribution, though as the number or iterations approaches infinity, they begin to resemble each other.

P.S. I'm not really this high-brow. I'm just having a moment.


Jul 20, 2007 at 1:06 AM // reply »
10 Comments

"In fact, there are lines of thought that suggest that randomness is a myth in any scenario, computed or not. i.e. everything is caused by something, so there's nothing that's truly random - only things that appear to be random, because we cannot readily discern the cause."

Maybe, until you look close enough that you are in the size range where quantum physics becomes dominant, where things are truly random to the point of being completely unpredictable no matter how good a measurement is.

http://en.wikipedia.org/wiki/Hardware_random_number_generator has a good summary.


Jul 20, 2007 at 7:03 AM // reply »
6,516 Comments

I should really go back and watch Jurassic Park :) I think Jeff Goldbum's character had a whole thing about Chaos Theory and random events. Plus I like watching movies.


Jul 20, 2007 at 7:13 AM // reply »
10 Comments

Here you go:

http://www.youtube.com/watch?v=dpDckbqhpW8


Jul 20, 2007 at 7:21 AM // reply »
6,516 Comments

Ha ha, that video is a little disturbing :)


Jul 20, 2007 at 12:52 PM // reply »
3 Comments

Computer generated randomness is not random, but rather Pseudorandom Per http://en.wikipedia.org/wiki/Pseudorandom:

"A pseudorandom process is a process that appears random but is not. Pseudorandom sequences typically exhibit statistical randomness while being generated by an entirely deterministic causal process. Such a process is easier to produce than a genuine random one, and has the benefit that it can be used again and again to produce exactly the same numbers, useful for testing and fixing software.

To date there is no known method to produce true randomness, because due to the very nature of randomness, any factor determining the outcome would mean that it is not random at all. The random number generation functions provided in many software packages are pseudorandom."

Quoting wikipedia because its convenient, not because its an authority on the subject.


Jul 20, 2007 at 2:12 PM // reply »
8 Comments

I think there's a whole fascinating discussion to be had on whether or not randomness exists, though that's not the intent of the thread.

More relevant is the notion of testing for randomness, which - by any definition - is impossible. If randomness means unpredictability, than any test that PREDICTS the outcome, even the nature of the outcome - more proves unrandomness than randomness. Going with the analogy of a million monkeys on a million typewriters eventually producing the works of shakespeare. That outcome - the works of shakespeare -would undoubtedly fail any tests for randomness, even though they were "randomly" generated.

Watchout! The gravitaional pull is pulling me in!

I once saw an episode of Numb3ers where the genius character was demonstrating randomness. He asked a group of 10 or so people to take up random positions in the room. Once everyone did, he pointed out that it was not random at all. Everyone was about the same distance from each other (evenly distributed), and said that if their positions were truly random you might have clusters of 2 or three people occupying the same area, or close to it.


Jul 20, 2007 at 2:23 PM // reply »
6,516 Comments

I'm waiting for someone to switch over to talking about the philosophy of free will ;)


Jul 23, 2007 at 8:16 AM // reply »
6,516 Comments

@Dustin,

Thanks for the tip on SHA1PRNG. I just did some testing:

http://www.bennadel.com/index.cfm?dax=blog:852.view

And it seems that SHA1PRNG does produce a more even distribution of evenly timed, randomly generated numbers.


Post Comment  |  Ask Ben

Recent Blog Comments
Nov 21, 2009 at 6:47 PM
Hal Helms - Real World Object Oriented Development, Sarasota - Day Five
@charlie griefer, Thank you.. ... read »
Nov 21, 2009 at 5:15 PM
Using ColdFusion Structures To Remove Duplicate List Values
@Jose Galdamez, Oh heh yeah I didn't paste the whole code. I should have defined the vars -- my bad. It's fixed thou. Thanks. ... read »
Nov 21, 2009 at 4:49 PM
Styling The ColdFusion 8 WriteToBrowser CFImage Output
Great work yet again Ben! Whilst I didn't use this whole code, I copied some of your regex code for a similar problem with the lack of an alt attribute and unescaped ampersands in CFIMAGE for Railo 3 ... read »
Nov 21, 2009 at 1:13 PM
My First ColdFusion Builder Extension - Encrypting And Decrypting CFM / CFC Files
@Ben, Because I am pedantic, I just want to make sure that everyone knows there is absolutely no encryption going on. There is only encoding and obfuscation. The cfencode tool only obfuscates your C ... read »
Nov 21, 2009 at 12:28 PM
Using ColdFusion Structures To Remove Duplicate List Values
@Jody I can't seem to get your code sample to work. If you are still having problems, try this code out and see if it gets you what you wanted. <!--- Comma delimited list with various duplicates ... read »
Nov 21, 2009 at 11:03 AM
Groovy Operator Overloading Does Not Work In The ColdFusion Context
Hi Ben, Thanks for this informative post. Now I am reading ur old posts too ... read »
Nov 21, 2009 at 10:56 AM
HostMySite.com Has The Best ColdFusion Hosting
@Mehul, Yes very nice people, however several downtimes per day which was not acceptable. Hence we had to move out. I am glad you are having good luck with them so far. ... read »