Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.
Meanwhile on Twitter
Loading latest tweet...
Ben Nadel at cf.Objective() 2013 (Bloomington, MN) with:

Seeding SQL RAND() Method With NEWID() For Per-Row Random Values (Thanks Joshua Cyr)

By Ben Nadel on
Tags: SQL

From what I have read and from what I've seen demos of, I thought that MS SQL's RAND() method could only return one random value per connection. As I never looked at the documentation for RAND(), I never realized that it could be seeded. This morning, in my post on selecting random records from a weighted data table, Joshua Cyr pointed me to a great blog post demonstrating how to get RAND() to produce random numbers for each record in a query.

Before we see that, however, let's take a look at how it won't work:

  • <!--- Select random numbers. --->
  • <cfquery name="qRandom" datasource="#REQUEST.DSN#">
  • SELECT
  • p.id,
  • ( RAND() ) AS random_number
  • FROM
  • pivot1000 p
  • WHERE
  • p.id < 10
  • ORDER BY
  • p.id ASC
  • </cfquery>
  •  
  •  
  • <!--- Output the random numbers generated. --->
  • <cfloop query="qRandom">
  •  
  • #qRandom.random_number#<br />
  •  
  • </cfloop>

Notice here that we are simply selecting RAND() as a column in the table, then output that column. When we run this code, we get the following:

0.625088599784
0.625088599784
0.625088599784
0.625088599784
0.625088599784
0.625088599784
0.625088599784
0.625088599784
0.625088599784

Each call to RAND() produces the same result. This is because the RAND() method is implicitly seeded on the first call and then each subsequent call in the same connection uses the same seed value. To get a new random value for each RAND() method call, we have to pass it a new seed value each time.

The seed value must be numeric. Now, in the past, I've used NEWID() to create random numbers; but again, I never tried this on a per-row basis. As demonstrated in Josh's link, the NEWID() can be used to seed the RAND() method call every time it is run:

  • <!--- Select random numbers. --->
  • <cfquery name="qRandom" datasource="#REQUEST.DSN#">
  • SELECT
  • p.id,
  •  
  • <!--- Seed RAND() with binary version of NEWID(). --->
  • (
  • RAND(
  • CAST( NEWID() AS varbinary )
  • )
  • ) AS random_number
  • FROM
  • pivot1000 p
  • WHERE
  • p.id < 10
  • ORDER BY
  • p.id ASC
  • </cfquery>
  •  
  •  
  • <!--- Output the random numbers generated. --->
  • <cfloop query="qRandom">
  •  
  • #qRandom.random_number#<br />
  •  
  • </cfloop>

As I have stated before, I still don't know what the conversion from UUID to VARBINARY does exactly, but it produces are binary data structure that then get implicitly cast to a numeric value when passed to the RAND() method. As a result, this seeds the RAND() method with a new value on each row and gives us the following output:

0.632351429673
0.277469709628
0.291371676631
0.187895386945
0.0332443201662
0.269721718406
0.801980914521
0.984558916944
0.569620907931

A random value for each record. Good stuff! Thanks a lot Josh!




Reader Comments

just in case you don't already know you can also use NEWID() to return random records from a table by using it in the ORDER BY clause:

ORDER BY NEWID()

This is awesome - most of the other details I read were all about creating functions and views, which is cool if you actually have that kind of access to the database. This helped me de-identify some Personal Information data by randomly adding/subtracting somewhere between 1-365 days to the date of birth. Thanks! :)

The reason that this works is that Rand() is seeded with an integer. What is being returned is the first number in the series for that seed. If NewID()'s universe of returned values encapsulates all of T-SQL's Integers (from -2,147,483,648 through 2,147,483,647) then the solution provided above will return 4,294,967,296 unique outcomes. For example, let's say that NewID() returns a 500 and then Rand(500) is your selected column and it equals .12345 then It will always be the same, .12345 when NewID() returns 500. If that 4 and a quarter billion number granularity is good for your purposes then this method will work.

I haven't done a study on the outcome of this set of numbers but it's likely that the numerical set returned skews either above or below 0.5 (probably not by much) and your results will reflect that.