Using java.util.Collections To Shuffle A ColdFusion Query Column Corrupts Column Values
The other day, I wanted to see if I could use the java.util.Collections class to help shuffle the values in a ColdFusion query. As I've blogged about before, the Collections class presents a shuffle() method which can be used to randomly sort Java collections (ie. ColdFusion arrays). And, since ColdFusion query objects represent columns as collections, I wanted to see if the Collections class could shuffle the row values in a ColdFusion query column. Unfortunately, using shuffle() on a query column appears to corrupt the row values with each pass.
To demonstrate this corruption, I'm going to build a ColdFusion query object with a single column that has numeric values. Then, I'm going to loop 5 times, passing the ColdFusion query column to the shuffle() method with each iteration:
<!--- Create a query that we will sort. ---> <cfset values = queryNew( "" ) /> <!--- Add the ID column. ---> <cfset queryAddColumn( values, "id", "cf_sql_integer", listToArray( "1,2,3,4,5" ) ) /> <!--- Loop a few times to shuffle the column. ---> <cfloop index="i" from="1" to="5" step="1"> <!--- Shuffle the column. NOTE: I am using BRACKET notation so as to refer to the column as a collection (rather than just the first value in the column). ---> <cfset createObject( "java", "java.util.Collections" ).shuffle( values[ "id" ] ) /> <!--- Output the shuffled query column. ---> <cfdump var="#values#" label="Shuffle (#i#)" /> <br /> </cfloop>
Notice that when we pass the ColdFusion query column to the shuffle() method, we have to use the bracket notation:
values[ "id" ]
This allows us to refer to the column as a collection, not just the first row value (as would be denoted using the dot-notation).
When we run the above code, we get the following page output:
Notice that with each pass to the shuffle() method, the query column becomes more and more corrupt. But, it does so in a very specific way: the first row value is randomly duplicated within the column. And, while it might not be obvious from the given output, the first row value never changes. It is always "1". No matter how many times I refresh the demo, the first row is always 1. I suspect this is why the duplicate value is always "1" as well.
The java.util.Collections class is awesome. And, the shuffle() method works wonders on normal ColdFusion arrays (which are Java collections behind the scenes). But, it looks like something goes consistently wrong when trying to use a ColdFusion query column as a Java collection. What a strange bug!
Want to use code from this post? Check out the license.
Randomization is a sort, of sorts. And database vendors spend millions of dollars every year on optimizing sorts. So why not let the database do it?
Make one of the columns of a select a randomization formula, preferably using a built-in function seeded by time of day, and sort on that column.
I do that for a contractor search engine I created for the Federal Government, so that the search doesn't show favoritism to contractors who got into the database early-on.
As long as the random seed is preserved, you can even page your results. For example, by basing all randomizations on the original start date/time (the one where the first page of results got displayed), you don't get the same row showing up on two different pages (which implies other rows being missed).
If I had to randomly sort a query with data that was accessible from the database, then I'd probably use the database as much as possible (as you are saying). But, I just wanted to see how or IF this would work. I knew you could use ColdFusion array functions on query columns; but, I didn't know if this would hold true when interacting with the Java layer explicitly.
That said, I can think of reasons why you might need to alter a query after it has come back from the database. It would be cool if this was an available option.
I didn't link to any of these in the "related" blog posts, but since Steve brought up random query stuff, I figured I throw these out there:
To this day, I don't truly understand how binary data values get cast in SQL -- that's beyond my mental model.