Randomly Selected vs. Evenly Distributed

Posted July 19, 2007 at 8:37 AM by Ben Nadel

In the past, I have complained about random number generation in ColdFusion. Others have also complained a bit about the randomness (or lack thereof) of my Use It Or Lose It emails. Outside of programming, I find random selection to be frustrating as well; my iPod seems to replay some songs a lot and NEVER chooses other ones. My WimAmp does the same thing. But this got me thinking; what's really bothering me? Is it random selection? Or is the lack of even distribution? Are the two independent, or can you not even consider one without the other?

Take, for example, the answer I posted for the ColdFusion Weekly Podcast on selecting random elements from an array. In that solution, the value that we were returning was randomly selected, but it was selected in such a way that no indexes were repeated until the whole array was exhausted. Is this really random selection? Yes, in terms of how each index was selected (random number generation), but really no, because each selection was influenced by the previous N selections. But is it an evenly distributed selection? Yes, each value in the array is selected once without repetition, therefore distributing our selection across the entire array.

In truly random number generation, each randomly generated number should be a completely independent entity, having nothing to do with any other number generating actions. Therefore, if we were randomly selecting values between 1 and 10, it is possible (but unlikely) to randomly select 5 ten thousand times in a row. But, when it comes to system interaction, just because something like this is random, it does not mean that it is a satisfactory outcome. I think this is because when people "say" random, what they "mean" is random AND evenly distributed.

The problem with random AND evenly distributed selections is really that they are no longer random and it requires some sort of data persistence. How does one go about doing this in a way that is more pleasing to the user without adding the overhead of complexity?

I have no idea, this is more of a rambling and a plea for suggestions.




Reader Comments

Jul 19, 2007 at 9:43 AM // reply »
42 Comments

If you need a greater randomness you can use an alternate random number algorithm (mx7):

randRange(1,10,'SHA1PRNG')

The docs even state: "SHA1PRNG: generates a number using the Sun Java SHA1PRNG algorithm. This algorithm provides greater randomness than the default algorithm"

http://livedocs.adobe.com/coldfusion/7/htmldocs/00000605.htm


db
Jul 19, 2007 at 9:47 AM // reply »
3 Comments

here's something to get you started -
http://www.coldfusionjedi.com/index.cfm/2006/8/16/ColdFusion-101-Picking-a-random-image-2


Jul 19, 2007 at 9:57 AM // reply »
110 Comments

I think the way I've handled it previously is to keep a running tab of all of the "randomly" selected values. Then run a comparison of the newly selected items, check them against the table, rinse-and-repeat until you reach however many you are looking for. As you get more data in there, it can slow the process down a bit though (as it's running more and more comparisons), but as it's a scheduled task and I'll never see it, I've never worried *too* much about the slowdown.

If you're using SQL, you could see if that returns a more evenly distributed random result
http://www.sqlteam.com/article/using-newid-to-randomly-sort-records


Jul 19, 2007 at 9:58 AM // reply »
42 Comments

While on the topic of randomness. If you're trying to retrieve random items from a database and you're using MS SQL, the NewID() function is a good method of doing it.

If you want all:
SELECT ID
FROM FOO
ORDER BY NEWID()

If you only need (n):
SELECT TOP 1 ID
FROM FOO
ORDER BY NEWID()


Jul 19, 2007 at 10:01 AM // reply »
11,243 Comments

@Dustin,

Thanks, I will check that out.

@DB,

That is a nice tutorial, but there he is storing things in the session. I am doing things that do not involve session and would be spread out over a time period that is longer than a session would last.


Jul 19, 2007 at 10:07 AM // reply »
8 Comments

Tecnhically speaking, there's no such thing as random in a computer. There are only calculations based on values that are difficult for a human to predict.

In fact, there are lines of thought that suggest that randomness is a myth in any scenario, computed or not. i.e. everything is caused by something, so there's nothing that's truly random - only things that appear to be random, because we cannot readily discern the cause.

But you're right in that randomness is not equivalent to even distribution, though as the number or iterations approaches infinity, they begin to resemble each other.

P.S. I'm not really this high-brow. I'm just having a moment.


Jul 20, 2007 at 1:06 AM // reply »
10 Comments

"In fact, there are lines of thought that suggest that randomness is a myth in any scenario, computed or not. i.e. everything is caused by something, so there's nothing that's truly random - only things that appear to be random, because we cannot readily discern the cause."

Maybe, until you look close enough that you are in the size range where quantum physics becomes dominant, where things are truly random to the point of being completely unpredictable no matter how good a measurement is.

http://en.wikipedia.org/wiki/Hardware_random_number_generator has a good summary.


Jul 20, 2007 at 7:03 AM // reply »
11,243 Comments

I should really go back and watch Jurassic Park :) I think Jeff Goldbum's character had a whole thing about Chaos Theory and random events. Plus I like watching movies.


Jul 20, 2007 at 7:13 AM // reply »
10 Comments

Here you go:

http://www.youtube.com/watch?v=dpDckbqhpW8


Jul 20, 2007 at 7:21 AM // reply »
11,243 Comments

Ha ha, that video is a little disturbing :)


Jul 20, 2007 at 12:52 PM // reply »
3 Comments

Computer generated randomness is not random, but rather Pseudorandom Per http://en.wikipedia.org/wiki/Pseudorandom:

"A pseudorandom process is a process that appears random but is not. Pseudorandom sequences typically exhibit statistical randomness while being generated by an entirely deterministic causal process. Such a process is easier to produce than a genuine random one, and has the benefit that it can be used again and again to produce exactly the same numbers, useful for testing and fixing software.

To date there is no known method to produce true randomness, because due to the very nature of randomness, any factor determining the outcome would mean that it is not random at all. The random number generation functions provided in many software packages are pseudorandom."

Quoting wikipedia because its convenient, not because its an authority on the subject.


Jul 20, 2007 at 2:12 PM // reply »
8 Comments

I think there's a whole fascinating discussion to be had on whether or not randomness exists, though that's not the intent of the thread.

More relevant is the notion of testing for randomness, which - by any definition - is impossible. If randomness means unpredictability, than any test that PREDICTS the outcome, even the nature of the outcome - more proves unrandomness than randomness. Going with the analogy of a million monkeys on a million typewriters eventually producing the works of shakespeare. That outcome - the works of shakespeare -would undoubtedly fail any tests for randomness, even though they were "randomly" generated.

Watchout! The gravitaional pull is pulling me in!

I once saw an episode of Numb3ers where the genius character was demonstrating randomness. He asked a group of 10 or so people to take up random positions in the room. Once everyone did, he pointed out that it was not random at all. Everyone was about the same distance from each other (evenly distributed), and said that if their positions were truly random you might have clusters of 2 or three people occupying the same area, or close to it.


Jul 20, 2007 at 2:23 PM // reply »
11,243 Comments

I'm waiting for someone to switch over to talking about the philosophy of free will ;)


Jul 23, 2007 at 8:16 AM // reply »
11,243 Comments

@Dustin,

Thanks for the tip on SHA1PRNG. I just did some testing:

http://www.bennadel.com/index.cfm?dax=blog:852.view

And it seems that SHA1PRNG does produce a more even distribution of evenly timed, randomly generated numbers.



Post A Comment

Comment Etiquette: Please do not post spam. Please keep the comments on-topic. Please do not post unrelated questions or large chunks of code. And, above all, please be nice to each other - we're trying to have a good conversation here.

Please review the following issues:

Author Name:


Author Email:

Author Website:

Comment:

Supported HTML tags for formatting: <strong>bold</strong>   <em>italic</em>   <code>code</code>







  • Help Wanted - Find Your Next ColdFusion Job
Ben Nadel's Company - Epicenter Consulting Recent Blog Comments
May 22, 2013 at 5:35 PM
Script Tags, jQuery, And Html(), Text() And Contents()
This is still an issue 2 years later. jQuery is supposed to remediate these cross browser issues, no? I have been unable to find any statement from the jQuery team calling this behavior "by de ... read »
May 22, 2013 at 12:44 PM
Ask Ben: Query Loop Inside CFScript Tags
In cf10, if you call a function that has: local.result = {}; local.result.msg = ""; local.svc = new query(); local.svc.setSQL("SELECT * FROM..."); local.obj = local.svc.exe ... read »
May 22, 2013 at 12:29 PM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@Ben: What version of Java are you using? Also, did you test users.id to see what Java reports as the data type? I wonder if it's not a Java primitive data type, but getting returned as something ... read »
May 22, 2013 at 11:47 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@Dana, Awesome - so it looks like this bug was fixed in ColdFusion 10. Thanks so much for double-checking that. ... read »
May 22, 2013 at 11:37 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
When I c&p and run on cf10, I get: Selected User IDs: 1,4 User 1 selected: YES - YES User 2 selected: NO - NO User 3 selected: NO - NO User 4 selected: YES - YES User 5 selected: NO - ... read »
May 22, 2013 at 11:27 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
@Tom, Good thought, but no dice. Both of these still exhibit the same behavior: users.id[ users.currentRow ] users[ "id" ][ users.currentRow ] It's just something whacky happening with ... read »
May 22, 2013 at 11:07 AM
Strange Interaction Between DeserializeJson(), ArrayContains(), And Database Values In ColdFusion
Could your problem be that "users.id" is actually an ARRAY, not a single value? Perhaps try it again with "users.id[1]" (I only have CF8 here at work). ... read »
May 22, 2013 at 7:52 AM
Nested Views, Routing, And Deep Linking With AngularJS
Hi, Just a quick thank you. As it happens, for my own purposes, the pending ui-router work being done in native angular is likely the one I'll adopt, but your exploration, code and documentation of ... read »
InVision App - Prototyping Made Beautiful With Prototyping Tools