Ben Nadel at NCDevCon 2011 (Raleigh, NC) with: Shannon Cross

Creating A ColdFusion-Oriented HashCode With Loose Types

By
Published in Comments (13)

In the companion app for my feature flags book, each user's persisted configuration file has a version number. I want this version number to be incremented when the user updates their feature flags configuration; but, only if something within the configuration has actually changed. To this end, I need a way in ColdFusion to take two data structures and check to see if they are deep-equals. For that, I created the concept of a "FusionCode": a consistent, repeatable value that represents a complex data structure.

In Java, this is what the various "hash code" methods are for. Each Java object has an o.hashCode() method that returns an int. And then, there are various utility methods that compose this value for complex objects (e.g., Objects.hashCode(o) and Arrays.hashCode(o)).

But, the ColdFusion loose type system doesn't play well with Java's hashCode() calculations. Or rather, I should say that it does play well in 95% of cases; and then, breaks in subtle ways in 5% of cases.

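For example, Java does not consider the following two values to be equal (their equals() checks return false, even though a small number like 3 happens to produce the same hashCode() in both types):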

  • javaCast( "int", 3 )
  • javaCast( "long", 3 )

As they should (I guess)—they are two different data-types. But, in ColdFusion, there's no semantic difference between these two values. So, from the ColdFusion perspective, these two values should have the same FusionCode.
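To make the distinction concrete, here is a minimal sketch in plain Java, outside of ColdFusion. It assumes that javaCast( "int", 3 ) and javaCast( "long", 3 ) end up as a java.lang.Integer and a java.lang.Long respectively; the class and method names are made up for illustration. For a value as small as 3 the two boxed types report the same hashCode(), so it is the equality check that tells them apart.

```java
import java.util.Objects;

final class BoxedTypeDemo {

    // Same numeric value, two different boxed Java types.
    static final Object AS_INT = Integer.valueOf(3);
    static final Object AS_LONG = Long.valueOf(3L);

    // Java's view: Integer.equals() returns false for anything that is not an Integer.
    static boolean javaEquals() {
        return Objects.deepEquals(AS_INT, AS_LONG);
    }

    // A looser, value-based view that ignores the boxed type.
    static boolean numericEquals() {
        return ((Number) AS_INT).longValue() == ((Number) AS_LONG).longValue();
    }

    public static void main(String[] args) {
        System.out.println("hashCode (int / long): " + AS_INT.hashCode() + " / " + AS_LONG.hashCode());
        System.out.println("Objects.deepEquals: " + javaEquals());
        System.out.println("numeric comparison: " + numericEquals());
    }
}
```

The value-based comparison is the kind of looseness that a ColdFusion-oriented check would want to have.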

Of course, it's not that often that we need to explicitly create different Java types in our ColdFusion code. But, sometimes, this happens implicitly in very subtle ways. Consider the following ColdFusion structure that goes through the JSON serialization workflow:

<cfscript>

	data = { value: 12.0 };
	dataPrime = deserializeJson( serializeJson( data ) );

</cfscript>

In this code, data and dataPrime are structurally equivalent. But, they have different .hashCode() values. And, passing them to Objects.deepEquals(data, dataPrime) results in false. This is true for both Adobe ColdFusion and Lucee CFML.

However, if you convert that top-level reference from a Struct to an Array:

<cfscript>

	data = [ 12.0 ];
	dataPrime = deserializeJson( serializeJson( data ) );

</cfscript>

Then Objects.deepEquals(data, dataPrime) returns true in Lucee CFML and returns false in Adobe ColdFusion. Why? I have no idea.
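The engine difference above stays unexplained, but one plausible ingredient can be sketched in plain Java. This is an assumption, not something verified against either engine: suppose the original structure holds the number as a Double while the round-tripped copy holds it as an Integer. The class name and the chosen types are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class RoundTripTypeDemo {

    // ASSUMPTION: the original value is a Double and the round-tripped value is an Integer.
    static final Map<String, Object> DATA = Map.of("value", Double.valueOf(12.0));
    static final Map<String, Object> DATA_PRIME = Map.of("value", Integer.valueOf(12));

    static boolean mapsEqual() {
        return Objects.deepEquals(DATA, DATA_PRIME);
    }

    static boolean listsEqual() {
        return Objects.deepEquals(List.of(12.0), List.of(12));
    }

    public static void main(String[] args) {
        // Both comparisons fail in plain Java because Double.equals(Integer) is false.
        System.out.println("maps equal: " + mapsEqual());
        System.out.println("lists equal: " + listsEqual());
        System.out.println("same map hashCode: " + (DATA.hashCode() == DATA_PRIME.hashCode()));
    }
}
```

Under that assumption, a strict Java comparison fails as soon as a single leaf value changes its boxed type, no matter how similar the two structures look when dumped.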

The problem is, the whole HashCode calculation is a bit of a black box built on top of a very strict type system. So, when it comes to ColdFusion, I needed to make something that was clearer in its intent; and, which played a little nicer with ColdFusion's loose type system.

I created a ColdFusion component, FusionCode.cfc which provides two methods:

  • getFusionCode( value )
  • deepEquals( valueA, valueB )

The former method generates a CRC-32 checksum of the given value by recursively walking over the given data structure and normalizing the values—using ColdFusion semantics—as it traverses. The latter method just calls getFusionCode() for each argument and then compares the two results.
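The walk-and-normalize idea can be sketched in plain Java. This is a rough approximation under stated assumptions, not the FusionCode.cfc implementation: it only handles maps, lists, numbers, booleans, strings, and null; it treats any string that parses as a decimal number as a number; and the class and method names are invented for this example.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

final class FusionCodeSketch {

    static long fusionCode(Object value) {
        CRC32 checksum = new CRC32();
        visit(checksum, value);
        return checksum.getValue();
    }

    static boolean deepEquals(Object a, Object b) {
        return fusionCode(a) == fusionCode(b);
    }

    // Route each value to a normalization step based on its runtime type.
    private static void visit(CRC32 checksum, Object value) {
        if (value == null) {
            putString(checksum, "[______NULL______]");
        } else if (value instanceof List) {
            int index = 1;
            for (Object item : (List<?>) value) {
                putNumber(checksum, index++);
                visit(checksum, item);
            }
        } else if (value instanceof Map) {
            // Upper-case and sort the keys so that key casing and ordering don't matter.
            TreeMap<String, Object> sorted = new TreeMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                sorted.put(String.valueOf(entry.getKey()).toUpperCase(Locale.ROOT), entry.getValue());
            }
            for (Map.Entry<String, Object> entry : sorted.entrySet()) {
                putString(checksum, entry.getKey());
                visit(checksum, entry.getValue());
            }
        } else if (value instanceof Number) {
            putNumber(checksum, ((Number) value).doubleValue());
        } else if (value instanceof Boolean) {
            putString(checksum, ((Boolean) value) ? "[______TRUE______]" : "[______FALSE______]");
        } else {
            visitText(checksum, value.toString());
        }
    }

    // Loose typing: a numeric-looking string is recorded exactly like a number.
    private static void visitText(CRC32 checksum, String text) {
        try {
            putNumber(checksum, new BigDecimal(text.trim()).doubleValue());
        } catch (NumberFormatException notNumeric) {
            putString(checksum, text);
        }
    }

    private static void putNumber(CRC32 checksum, double value) {
        putString(checksum, BigDecimal.valueOf(value).toString());
    }

    private static void putString(CRC32 checksum, String value) {
        checksum.update(value.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Object data = Map.of("version", 12.0, "users", List.of(Map.of("id", 1, "name", "Jo")));
        Object dataPrime = Map.of("VERSION", 12, "users", List.of(Map.of("ID", 1L, "name", "Jo")));
        System.out.println("sketch deepEquals: " + deepEquals(data, dataPrime));
    }
}
```

Since the result is a 32-bit checksum, two different structures can in principle share a code; the sketch trades that small risk for a cheap, repeatable token.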

Here's a demo in which fusionCode.deepEquals() returns true but Objects.deepEquals() returns false, in both Lucee CFML and Adobe ColdFusion:

<cfscript>

	data = [
		version: 12.0, // <--- Messes up the .hashCode() approach in Lucee and ACF.
		users: [
			[ id: 1, name: "Jo" ],
			[ id: 2, name: "Kit" ],
			[ id: 3, name: "Sam" ]
		],
		legacyEnabled: false,
		legacyCode: javaCast( "null", "" ) // <--- Messes up the .hashCode() in ACF.
	];

	// Run data through the JSON serialization.
	dataPrime = deserializeJson( serializeJson( data ) );

	writeDump( data );
	writeDump( dataPrime );

	// -- Fusion Code Test -- //

	writeDump({
		"FusionCode deepEquals":
		new FusionCode().deepEquals( data, dataPrime )
	});

	// -- Hash Code Test -- //

	writeDump({
		"Objects deepEquals":
		createObject( "java", "java.util.Objects" ).deepEquals( data, dataPrime )
	});

</cfscript>

Again, we're taking a complex structure, data, and running it through the JSON serialization workflow to produce dataPrime. Then we're comparing these two values using the FusionCode and HashCode concepts. And when we run this in Lucee CFML and Adobe ColdFusion, we get the following (both results are in the same screenshot):

Output showing that the two data structures are different according to Java's Objects.deepEquals() but the same according to ColdFusion's fusionCode.deepEquals().

As you can see, Objects.deepEquals() sees data and dataPrime as different structures in both Adobe ColdFusion and Lucee CFML. But, my fusionCode.deepEquals() sees them as the same.

Here's my FusionCode.cfc implementation. At the root of the internal algorithm is a visitValue() method. This method inspects the given argument, uses ColdFusion's decision functions to determine which data type it is, and then defers to another visit* function that is geared towards said data type. As it performs these visitations recursively, it passes through a CRC-32 checksum instance to which it adds normalized values.

component
	output = false
	hint = "I provide methods for generating a consistent, repeatable token for a given ColdFusion data structure (akin to Java's hashCode, but with ColdFusion looseness)."
	{

	/**
	* I initialize the component.
	*/
	public void function init() {

		variables.BigDecimal = createObject( "java", "java.math.BigDecimal" );

	}

	// ---
	// PUBLIC METHODS.
	// ---

	/**
	* I determine if the two values are equal based on their generated FusionCodes.
	*/
	public boolean function deepEquals(
		any valueA,
		any valueB
		) {

		return ( getFusionCode( arguments?.valueA ) == getFusionCode( arguments?.valueB ) );

	}


	/**
	* I calculate the FusionCode for the given value.
	*
	* The FusionCode algorithm creates a CRC-32 checksum and then traverses the given data
	* structure and adds each visited value to the checksum calculation. Since ColdFusion
	* is a loosely typed / dynamically typed language, the FusionCode algorithm performs
	* some ColdFusion-oriented type casting to allow slightly different value types to be
	* considered the "same" value (in the same way that a ColdFusion equality check will).
	* For example, "int" and "long" values are both cast to a double and recorded as the
	* same canonical BigDecimal string. And, the string "3" and the number 3 are recorded
	* the same way. This is where the FusionCode and Java's HashCode algorithm
	* significantly diverge.
	*/
	public numeric function getFusionCode( any value ) {

		var checksum = createObject( "java", "java.util.zip.CRC32" ).init();

		visitValue( checksum, arguments?.value );

		return checksum.getValue();

	}

	// ---
	// PRIVATE METHODS.
	// ---

	/**
	* I add the given Boolean value to the checksum.
	*/
	private void function putBoolean(
		required any checksum,
		required boolean value
		) {

		putString( checksum, ( value ? "[______TRUE______]" : "[______FALSE______]" ) );

	}


	/**
	* I add the given date value to the checksum.
	*/
	private void function putDate(
		required any checksum,
		required date value
		) {

		putString( checksum, dateTimeFormat( value, "iso" ) );

	}


	/**
	* I add the given number value to the checksum.
	*/
	private void function putNumber(
		required any checksum,
		required numeric value
		) {

		putString(
			checksum,
			BigDecimal
				.valueOf( javaCast( "double", value ) )
				.toString()
		);

	}


	/**
	* I add the given string value to the checksum.
	*/
	private void function putString(
		required any checksum,
		required string value
		) {

		checksum.update( charsetDecode( value, "utf-8" ) );

	}


	/**
	* I visit the given array value, recursively visiting each element.
	*/
	private void function visitArray(
		required any checksum,
		required array value
		) {

		var length = arrayLen( value );

		for ( var i = 1 ; i <= length ; i++ ) {

			putNumber( checksum, i );

			if ( arrayIsDefined( value, i ) ) {

				visitValue( checksum, value[ i ] );

			} else {

				visitValue( checksum /* , NULL */ );

			}

		}

	}


	/**
	* I visit the given binary value.
	*/
	private void function visitBinary(
		required any checksum,
		required binary value
		) {

		checksum.update( value );

	}


	/**
	* I visit the given Java value.
	*/
	private void function visitJava(
		required any checksum,
		required any value
		) {

		putNumber( checksum, value.hashCode() );

	}


	/**
	* I visit the given null value.
	*/
	private void function visitNull( required any checksum ) {

		putString( checksum, "[______NULL______]" );

	}


	/**
	* I visit the given query value, recursively visiting each row.
	*/
	private void function visitQuery(
		required any checksum,
		required query value
		) {

		var columnNames = ucase( value.columnList )
			.listToArray()
			.sort( "textnocase" )
			.toList( "," )
		;

		putString( checksum, columnNames );

		for ( var i = 1 ; i <= value.recordCount ; i++ ) {

			putNumber( checksum, i );
			visitStruct( checksum, queryGetRow( value, i ) );

		}

	}


	/**
	* I visit the given simple value.
	*/
	private void function visitSimpleValue(
		required any checksum,
		required any value
		) {

		if ( isNumeric( value ) ) {

			putNumber( checksum, value );

		} else if ( isDate( value ) ) {

			putDate( checksum, value );

		} else if ( isBoolean( value ) ) {

			putBoolean( checksum, value );

		} else {

			putString( checksum, value );

		}

	}


	/**
	* I visit the given struct value, recursively visiting each entry.
	*/
	private void function visitStruct(
		required any checksum,
		required struct value
		) {

		var keys = structKeyArray( value )
			.sort( "textnocase" )
		;

		for ( var key in keys ) {

			putString( checksum, ucase( key ) );

			if ( structKeyExists( value, key ) ) {

				visitValue( checksum, value[ key ] );

			} else {

				visitValue( checksum /* , NULL */ );

			}

		}

	}


	/**
	* I visit the given xml value.
	*/
	private void function visitXml(
		required any checksum,
		required xml value
		) {

		putString( checksum, toString( value ) );

	}


	/**
	* I visit the given generic value, routing to a more specific visit method.
	*
	* Note: This method doesn't check for things that wouldn't otherwise be in a data
	* structure. For example, I'm not checking for things like Closures or CFC instances.
	*/
	private void function visitValue(
		required any checksum,
		any value
		) {

		if ( isNull( value ) ) {

			visitNull( checksum );

		} else if ( isArray( value ) ) {

			visitArray( checksum, value );

		} else if ( isStruct( value ) ) {

			visitStruct( checksum, value );

		} else if ( isQuery( value ) ) {

			visitQuery( checksum, value );

		} else if ( isXmlDoc( value ) ) {

			visitXml( checksum, value );

		} else if ( isBinary( value ) ) {

			visitBinary( checksum, value );

		} else if ( isSimpleValue( value ) ) {

			visitSimpleValue( checksum, value );

		} else {

			visitJava( checksum, value );

		}

	}

}

This FusionCode.cfc implementation bakes in some assumptions that may or may not be good. For example, it calls ucase() on Struct keys and Query column names. But, it doesn't call ucase() on other string values despite the fact that "hello" and "HELLO" are equivalent in ColdFusion. I think one improvement would be to turn some of these assumptions into settings that can be turned on and off.

For now, though, this should unblock some of my work in the feature flags book companion app. It should be sufficient for determining whether or not a given sub-structure has changed. And, the looseness of the type checking should work well with my ConfigValidation.cfc component, which will type-cast inputs to the necessary type during the request processing.

Update 2024-07-05

I didn't realize this, but my use of javaCast("long") was truncating decimal values. I get a little fuzzy with the lower-level numeric data types in Java since all of ColdFusion's numbers are just "numeric" (with some edge-cases in which you run into int overflow errors). I've updated my putNumber() to use BigDecimal.valueOf(double) instead of BigInteger.valueOf(long).

The BigDecimal documentation states that numbers of different "scale" will have different hashCode values. But, I think my use of javaCast("double") is normalizing the scale. I think.
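Both points can be checked in plain Java. The sketch below assumes that javaCast("long") and javaCast("double") behave like Java's own primitive casts; the class and method names are hypothetical.

```java
import java.math.BigDecimal;

final class NumberNormalizationDemo {

    // Casting to long drops the fractional part.
    static long viaLong(double value) {
        return (long) value;
    }

    // Going through a double gives every input the same BigDecimal scale.
    static BigDecimal viaDouble(Number value) {
        return BigDecimal.valueOf(value.doubleValue());
    }

    public static void main(String[] args) {
        System.out.println("(long) 12.7 = " + viaLong(12.7));

        // BigDecimal.equals() compares value AND scale, so these two differ.
        BigDecimal twelve = new BigDecimal("12");
        BigDecimal twelvePointZero = new BigDecimal("12.0");
        System.out.println("12 equals 12.0 as parsed: " + twelve.equals(twelvePointZero));

        // Routed through a double, an Integer 12 and a Double 12.0 come out identical.
        System.out.println("12 equals 12.0 via double: " + viaDouble(12).equals(viaDouble(12.0)));
    }
}
```

So, at least for values like 12 and 12.0, routing through a double does give both the same scale, which is what the hunch above relies on.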

Update 2024-07-06

I misunderstood what the checksum.update(int) was doing. I thought it was consuming the entire integer in the checksum mutation; but, it seems that it was only taking the lowest byte (8-bits):

update(int b): Updates the CRC-32 checksum with the specified byte (the low eight bits of the argument b).

As such, two different integers could collide and create a false equivalence if they had the same lowest byte.
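The collision is easy to reproduce in plain Java. In this small sketch (class and method names are made up for illustration), 256 and 512 both have 0x00 as their lowest byte, so update(int) feeds the same single byte to the checksum for each of them:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class CrcLowByteDemo {

    // update(int) only consumes the low eight bits of the argument.
    static long crcOfInt(int value) {
        CRC32 checksum = new CRC32();
        checksum.update(value);
        return checksum.getValue();
    }

    // update(byte[]) consumes every byte of the encoded text.
    static long crcOfText(String text) {
        CRC32 checksum = new CRC32();
        checksum.update(text.getBytes(StandardCharsets.UTF_8));
        return checksum.getValue();
    }

    public static void main(String[] args) {
        System.out.println("ints collide: " + (crcOfInt(256) == crcOfInt(512)));
        System.out.println("texts collide: " + (crcOfText("256") == crcOfText("512")));
    }
}
```

Feeding the number in as text sidesteps the problem because every digit contributes to the checksum.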

I've updated the putNumber() method to turn around and call the putString() method using the canonical string produced by BigDecimal:

component {

	private void function putNumber(
		required any checksum,
		required numeric value
		) {

		putString(
			checksum,
			BigDecimal
				.valueOf( javaCast( "double", value ) )
				.toString()
		);

	}

}

This combination of the javaCast() to a double, and then piping it through the .toString() on BigDecimal seems to give a good result for numbers that are the same, but have different "scales" (ex, 12 and 12.0). According to the JavaDocs:

The toString() method provides a canonical representation of a BigDecimal.

Update 2024-07-07

I authored a follow-up blog post in which the behavior of the FusionCode.cfc can be configured. Specifically, there are two settings which can be set in the init() method; or, passed-in with each .getFusionCode() call:

  • caseSensitiveKeys - I determine if struct keys and column names are canonicalized using ucase(). When enabled, key and KEY will be considered different.

  • typeCoercion - I determine if strict decision functions should be used when inspecting a given value. When disabled, false and "no" will be considered different. As will 1 and "1".
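As a rough illustration of what these two switches mean, here is a plain Java sketch. It is not the follow-up post's implementation: the class, the method names, and the "number:" / "string:" prefixes are all hypothetical, and it only covers numbers and strings (not the false / "no" Boolean case).

```java
import java.math.BigDecimal;
import java.util.Locale;

final class FusionSettingsSketch {

    // With case-sensitive keys the key is used as-is; otherwise it is upper-cased.
    static String keyToken(String key, boolean caseSensitiveKeys) {
        return caseSensitiveKeys ? key : key.toUpperCase(Locale.ROOT);
    }

    // With type coercion, a numeric-looking string is recorded like a number.
    // Without it, the Java type of the value decides how it is recorded.
    static String valueToken(Object value, boolean typeCoercion) {
        if (value instanceof Number) {
            return "number:" + BigDecimal.valueOf(((Number) value).doubleValue());
        }
        if (typeCoercion) {
            try {
                double parsed = new BigDecimal(value.toString().trim()).doubleValue();
                return "number:" + BigDecimal.valueOf(parsed);
            } catch (NumberFormatException notNumeric) {
                // Fall through and record the value as a plain string.
            }
        }
        return "string:" + value;
    }

    public static void main(String[] args) {
        System.out.println(keyToken("key", false) + " vs " + keyToken("KEY", false));
        System.out.println(keyToken("key", true) + " vs " + keyToken("KEY", true));
        System.out.println(valueToken("1", true) + " vs " + valueToken(1, true));
        System.out.println(valueToken("1", false) + " vs " + valueToken(1, false));
    }
}
```

The design point is that each switch only changes the normalization step; the traversal and the checksum stay the same.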

After I was done with the current blog post, I realized that I actually needed key-case-sensitivity in my own work. As such, I went back and added some more robust behavior.


Reader Comments

38 Comments

Does it only work with simple values? Or can you stuff it with objects (like Java objects, CFC's, etc.) and it still do the work?

How's the speed?

15,798 Comments

@Will,

I really only designed it to work with native data structures (string, struct, array, etc). But, if all else fails, it will fall back to using the .hashCode() on whatever you give it. That said, I have no idea how that would work with CFCs.

As for speed, I'm assuming it's not great due to recursion. But, it's probably not terrible since most data structures aren't that deep. That said, I only need it for when I'm mutating data; so the use-case would be limited in scope. Meaning, it's not something that I'd be running on every request.

13 Comments

Have you tried objectEquals() for this? Ignore the goofy explanation in cfdocs, but I think it does the same for your code (at least in Lucee... I have not used ACF for eons)

15,798 Comments

@Andrew,

Literally never seen that function before 🤪 I'll have to play around with it; but, it might be exactly on the money. That said, Adobe ColdFusion also has one, but the description seems to be about "client side CFCs". Very confusing. Awesome tip, though! I'll circle back on what I find.

For others, here's two relevant links:

15,798 Comments

@Andrew,

So, I just added this to my test code (I didn't update the blog post, just did this on my dev server):

writeDump({
	"objectEquals":
	objectEquals( data, dataPrime )
});

And it reports back as NO on Adobe ColdFusion and true on Lucee CFML. I don't know which parts of this are necessarily not working as one would hope; but, it seems that something is different from my choices (in FusionCode.cfc) and something is different between the engines.

15,798 Comments

I just realized that javaCast( "long" ) truncates decimal values. I didn't realize that; I had thought that long could hold decimal values. I get a little fuzzy on the low-level data types. It looks like double can hold decimals, though. I'll find a way to tweak that.

15,798 Comments

I've updated the code to use BigDecimal + javaCast("double") instead of BigInteger + javaCast("long") for normalizing numbers. Hopefully this is more accurate.

15,798 Comments

Hmmm, and now I'm wondering if I can't just normalize numbers with the javaCast() alone, and not worry at all about the BigDecimal:

javaCast( "double", value ).hashCode()

Gonna play with that and see if that works better.

15,798 Comments

Ahh, ok, I can't do that. I didn't realize this at first, but the update(int) in the CRC-32 is only taking the lowest 8-bits:

update(int b): Updates the CRC-32 checksum with the specified byte (the low eight bits of the argument b).

Uggg, this gets more complicated. Ok, I think maybe I have to update my putNumber() method to actually stringify the value using BigDecimal. Something like:

private void function putNumber(
	required any checksum,
	required numeric value
	) {

	putString(
		checksum,
		BigDecimal
			.valueOf( javaCast( "double", value ) )
			.toString()
	);

}

The combination of the javaCast() and the BigDecimal.toString() seems to give the best results.

15,798 Comments

Oh man, this is such a rabbit hole! Fraught with edge-cases. What I'm realizing now is that by converting the BigDecimal to a string, I can get false-positive equivalence. Maybe not so much in this particular post (where type-coercion is acceptable); but, in a follow-up post that I'm working on, I run into a case where the string "100.1" and the BigDecimal.valueOf(100.1) then get stringified in the same way. Yargggg!!!
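The false positive can be reproduced in plain Java. In the sketch below, the class and method names are hypothetical, and the type-tag prefix at the end is only one possible way to keep the two apart; it is an assumption for illustration, not what the follow-up post does.

```java
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class StringifiedNumberCollisionDemo {

    // The number 100.1, stringified the same way as in putNumber().
    static String numberToken(double value) {
        return BigDecimal.valueOf(value).toString();
    }

    static long crc(String text) {
        CRC32 checksum = new CRC32();
        checksum.update(text.getBytes(StandardCharsets.UTF_8));
        return checksum.getValue();
    }

    // Hypothetical fix: prefix each token with a one-letter type tag.
    static String taggedToken(Object value) {
        if (value instanceof Number) {
            return "N" + numberToken(((Number) value).doubleValue());
        }
        return "S" + value;
    }

    public static void main(String[] args) {
        System.out.println("untagged collide: " + (crc("100.1") == crc(numberToken(100.1))));
        System.out.println("tagged collide: " + (crc(taggedToken("100.1")) == crc(taggedToken(100.1))));
    }
}
```

Whether the collision is a bug or a feature depends on whether type coercion is wanted; with loose typing, the untagged behavior is the intended one.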


I believe in love. I believe in compassion. I believe in human rights. I believe that we can afford to give more of these gifts to the world around us because it costs us nothing to be decent and kind and understanding. And, I want you to know that when you land on this site, you are accepted for who you are, no matter how you identify, what truths you live, or whatever kind of goofy shit makes you feel alive! Rock on with your bad self!
Ben Nadel