ColdFusion 10 - Hashing Binary Data And Byte Arrays
For years, ColdFusion has had the hash() function for taking variable-length string data and creating one-way "fingerprints" of the original value. This function has changed over time to include algorithm and encoding options, but it has always worked with string data. Now, with ColdFusion 10, the hash() function has been enhanced to accept binary data (a.k.a. byte arrays). This means that we can now create one-way "fingerprints" of binary values.
NOTE: At the time of this writing, ColdFusion 10 was in public beta.
To demonstrate, let's read in an image file in binary format and output the hash of the image's binary data:
<cfscript>

	// Read in the raw Binary data of the image.
	imageBinary = fileReadBinary( expandPath( "./gina_carano.jpg" ) );

	// Get the hash of the byte array (that IS the image).
	imageHash = hash( imageBinary );

	// Output the image "fingerprint".
	writeOutput( "Fingerprint: " & imageHash );

</cfscript>
As you can see, the first argument of the hash() function can now accept a byte array. When we run the above code, the default MD5 algorithm takes our byte array (binary data) and returns our standard 32-character hexadecimal string.
Out of curiosity, I wanted to see how the hashing of a Binary value would relate to the hashing of its String representation. As such, I tried using the hash() function to hash both a TXT file and the string content contained within that TXT file:
<cfscript>

	// Create our string message.
	message = "It's Friday, Friday - you gotta get down on Friday!";

	// Write message to file.
	fileWrite( expandPath( "./message.txt" ), message );

	// Read the message in as binary.
	messageBinary = fileReadBinary( expandPath( "./message.txt" ) );

	// Output the string "fingerprint".
	writeOutput( "STR Fingerprint: " & hash( message ) & "<br />" );

	// Output the binary "fingerprint".
	writeOutput( "BIN Fingerprint: " & hash( messageBinary ) );

</cfscript>
This time, when we run the above code, we get the following output:
STR Fingerprint: 60408C08C4AB05073FCEC10FAAE3915E
BIN Fingerprint: 60408C08C4AB05073FCEC10FAAE3915E
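Here, you can see that the hash() of the string content was the same as the hash of the file itself. I don't really have any conclusions to draw from this last experiment; it was just an interesting thing to see. And, since every character in this particular message is plain ASCII, and is therefore stored as a single byte in the file, this equivalence makes sense. For text that contains non-ASCII characters, the result would depend on the character encoding used to write the file and the one that hash() uses to turn the string into bytes.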
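To make the role of character encoding a bit more concrete, here is a minimal sketch, assuming ColdFusion 10 and assuming UTF-8 on both sides. The sample string is hypothetical (it is not part of the experiment above) and uses chr( 233 ) to include a non-ASCII character without relying on the page encoding. It converts the string to a byte array explicitly with charsetDecode() and hashes both forms:

```cfm
<cfscript>

	// Hypothetical sample text containing one non-ASCII character (an accented "e").
	message = "Caf" & chr( 233 ) & " on Friday";

	// Hash the string, asking hash() to encode it as UTF-8.
	stringHash = hash( message, "MD5", "utf-8" );

	// Convert the string to a UTF-8 byte array, then hash the bytes directly.
	// Passing binary data to hash() assumes ColdFusion 10 or later.
	messageBytes = charsetDecode( message, "utf-8" );
	binaryHash = hash( messageBytes, "MD5" );

	// Output both "fingerprints" and whether they are the same.
	writeOutput( "STR Fingerprint: " & stringHash & "<br />" );
	writeOutput( "BIN Fingerprint: " & binaryHash & "<br />" );
	writeOutput( "Match: " & ( stringHash == binaryHash ) );

</cfscript>
```

I would expect the two fingerprints to match as long as the same encoding is used in both places; swapping one of the "utf-8" values for a different encoding is a quick way to see them diverge.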
As a final note, I should also point out that the hash() function now takes an argument that defines the number of hashing iterations to apply to the target value. The number-of-iterations enhancement is a security feature, and explaining it in a meaningful way is beyond me. I'll defer to the security experts to elucidate that one.
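For reference, here is a rough sketch of what calling hash() with the iterations argument looks like, assuming the ColdFusion 10 signature in which the iteration count is the fourth argument, after the algorithm and encoding. The SHA-256 algorithm and the count of 1000 are arbitrary choices for illustration, not recommendations, and exactly how the iterations are applied is something to check against the official documentation:

```cfm
<cfscript>

	message = "It's Friday, Friday - you gotta get down on Friday!";

	// Hash with the iterations argument omitted.
	defaultHash = hash( message, "SHA-256", "utf-8" );

	// Hash with an explicit iteration count (fourth argument, assumed ColdFusion 10+).
	// The value 1000 is an arbitrary number chosen for this sketch.
	iteratedHash = hash( message, "SHA-256", "utf-8", 1000 );

	writeOutput( "Default: " & defaultHash & "<br />" );
	writeOutput( "Iterated: " & iteratedHash );

</cfscript>
```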
Nice! You could check whether an image has been modified by comparing its fingerprint against the fingerprint of the original. That could be useful in some scenarios, like document storage, CMS systems, and the like.
I believe you could do that. Though, in my testing, I did find out that the binary data was different from the value returned from imageGetBlob() on a ColdFusion image of the same data. So, you just have to be careful that you're always accessing the raw binary data when doing the comparison.
After this post, I got to thinking about fun ways that we could use images. As such, I wanted to do a quick exploration of hashing byte arrays before ColdFusion 10:
Here, I'm using the underlying Java layer to dip down into some more manual hashing approaches.
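The code from that exploration is not reproduced here, so what follows is only a minimal sketch of one way it could be done, not necessarily the approach used there. It assumes Java's standard java.security.MessageDigest class, created through createObject(), and uses binaryEncode() to turn the resulting digest bytes into a hexadecimal string:

```cfm
<cfscript>

	// Read in the raw binary data of the image.
	imageBinary = fileReadBinary( expandPath( "./gina_carano.jpg" ) );

	// Get an MD5 message digest from the underlying Java layer.
	digest = createObject( "java", "java.security.MessageDigest" )
		.getInstance( javaCast( "string", "MD5" ) )
	;

	// Feed the byte array into the digest and compute the hash bytes.
	digest.update( imageBinary );
	hashBytes = digest.digest();

	// Encode the hash bytes as hex; ucase() keeps the casing consistent
	// with what hash() returns.
	imageHash = ucase( binaryEncode( hashBytes, "hex" ) );

	writeOutput( "Fingerprint: " & imageHash );

</cfscript>
```

Because this works on the raw bytes and never converts them to a string, it avoids any character-encoding questions for non-text data like images.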
I think all CF10 is doing is running the equivalent of toString() on binary data. If I change:
writeOutput( "BIN Fingerprint: " & hash( messageBinary ) );
to:
writeOutput( "BIN Fingerprint: " & hash( toString( messageBinary ) ) );
it works fine in pre-CF10 (I get the same MD5 signature that you show).
So it's certainly convenient that it auto-converts the values for you, but it should be easy enough to recreate in pre-CF10.
When the underlying data is "String" data, I think you are exactly right. But this morning, I was playing around with hashing image data and the toString() method was not working - at least, I don't think it was working. For image data, I had to resort to dipping down into Java.
I was able to get it working fine w/an image using fileReadBinary(). I tested it before posting my comment.
Hmmm, very strange! I am not sure why I am getting a different value (than CF10 hash). I wonder if it has to do with the type of image I am loading.