Randomly Selected vs. Evenly Distributed
Posted July 19, 2007 at 8:37 AM by Ben Nadel
In the past, I have complained about random number generation in ColdFusion. Others have also complained a bit about the randomness (or lack thereof) of my Use It Or Lose It emails. Outside of programming, I find random selection to be frustrating as well; my iPod seems to replay some songs a lot and NEVER chooses other ones. My WimAmp does the same thing. But this got me thinking; what's really bothering me? Is it random selection? Or is the lack of even distribution? Are the two independent, or can you not even consider one without the other?
Take, for example, the answer I posted for the ColdFusion Weekly Podcast on selecting random elements from an array. In that solution, the value that we were returning was randomly selected, but it was selected in such a way that no indexes were repeated until the whole array was exhausted. Is this really random selection? Yes, in terms of how each index was selected (random number generation), but really no, because each selection was influenced by the previous N selections. But is it an evenly distributed selection? Yes, each value in the array is selected once without repetition, therefore distributing our selection across the entire array.
In truly random number generation, each randomly generated number should be a completely independent entity, having nothing to do with any other number generating actions. Therefore, if we were randomly selecting values between 1 and 10, it is possible (but unlikely) to randomly select 5 ten thousand times in a row. But, when it comes to system interaction, just because something like this is random, it does not mean that it is a satisfactory outcome. I think this is because when people "say" random, what they "mean" is random AND evenly distributed.
The problem with random AND evenly distributed selections is really that they are no longer random and it requires some sort of data persistence. How does one go about doing this in a way that is more pleasing to the user without adding the overhead of complexity?
I have no idea, this is more of a rambling and a plea for suggestions.
What Other People Are Searching For
If you need a greater randomness you can use an alternate random number algorithm (mx7):
The docs even state: "SHA1PRNG: generates a number using the Sun Java SHA1PRNG algorithm. This algorithm provides greater randomness than the default algorithm"
here's something to get you started -
I think the way I've handled it previously is to keep a running tab of all of the "randomly" selected values. Then run a comparison of the newly selected items, check them against the table, rinse-and-repeat until you reach however many you are looking for. As you get more data in there, it can slow the process down a bit though (as it's running more and more comparisons), but as it's a scheduled task and I'll never see it, I've never worried *too* much about the slowdown.
If you're using SQL, you could see if that returns a more evenly distributed random result
While on the topic of randomness. If you're trying to retrieve random items from a database and you're using MS SQL, the NewID() function is a good method of doing it.
If you want all:
ORDER BY NEWID()
If you only need (n):
SELECT TOP 1 ID
ORDER BY NEWID()
Thanks, I will check that out.
That is a nice tutorial, but there he is storing things in the session. I am doing things that do not involve session and would be spread out over a time period that is longer than a session would last.
Tecnhically speaking, there's no such thing as random in a computer. There are only calculations based on values that are difficult for a human to predict.
In fact, there are lines of thought that suggest that randomness is a myth in any scenario, computed or not. i.e. everything is caused by something, so there's nothing that's truly random - only things that appear to be random, because we cannot readily discern the cause.
But you're right in that randomness is not equivalent to even distribution, though as the number or iterations approaches infinity, they begin to resemble each other.
P.S. I'm not really this high-brow. I'm just having a moment.
"In fact, there are lines of thought that suggest that randomness is a myth in any scenario, computed or not. i.e. everything is caused by something, so there's nothing that's truly random - only things that appear to be random, because we cannot readily discern the cause."
Maybe, until you look close enough that you are in the size range where quantum physics becomes dominant, where things are truly random to the point of being completely unpredictable no matter how good a measurement is.
http://en.wikipedia.org/wiki/Hardware_random_number_generator has a good summary.
I should really go back and watch Jurassic Park :) I think Jeff Goldbum's character had a whole thing about Chaos Theory and random events. Plus I like watching movies.
Here you go:
Ha ha, that video is a little disturbing :)
Computer generated randomness is not random, but rather Pseudorandom Per http://en.wikipedia.org/wiki/Pseudorandom:
"A pseudorandom process is a process that appears random but is not. Pseudorandom sequences typically exhibit statistical randomness while being generated by an entirely deterministic causal process. Such a process is easier to produce than a genuine random one, and has the benefit that it can be used again and again to produce exactly the same numbers, useful for testing and fixing software.
To date there is no known method to produce true randomness, because due to the very nature of randomness, any factor determining the outcome would mean that it is not random at all. The random number generation functions provided in many software packages are pseudorandom."
Quoting wikipedia because its convenient, not because its an authority on the subject.
I think there's a whole fascinating discussion to be had on whether or not randomness exists, though that's not the intent of the thread.
More relevant is the notion of testing for randomness, which - by any definition - is impossible. If randomness means unpredictability, than any test that PREDICTS the outcome, even the nature of the outcome - more proves unrandomness than randomness. Going with the analogy of a million monkeys on a million typewriters eventually producing the works of shakespeare. That outcome - the works of shakespeare -would undoubtedly fail any tests for randomness, even though they were "randomly" generated.
Watchout! The gravitaional pull is pulling me in!
I once saw an episode of Numb3ers where the genius character was demonstrating randomness. He asked a group of 10 or so people to take up random positions in the room. Once everyone did, he pointed out that it was not random at all. Everyone was about the same distance from each other (evenly distributed), and said that if their positions were truly random you might have clusters of 2 or three people occupying the same area, or close to it.
I'm waiting for someone to switch over to talking about the philosophy of free will ;)
Thanks for the tip on SHA1PRNG. I just did some testing:
And it seems that SHA1PRNG does produce a more even distribution of evenly timed, randomly generated numbers.