Manually Serializing A String Using JSON-Encoding In ColdFusion

By Ben Nadel

Published 2015-06-02 in ColdFusion

After coming across a rather unfortunate bug in ColdFusion's JSON (JavaScript Object Notation) serializer - serializeJson() - I was curious as to what it would look like to manually serialize a string value in ColdFusion. The JSON specification is rather small; there are really only a few characters that require special handling. So, I figured I would see if I could create a function that would handle it properly.

To test my serialization method, I can bring a string through the serialization lifecycle and make sure that the input matches the subsequent deserialized value. And, since I'm only worrying about the serialization component, I figure I can use ColdFusion's native JSON deserializer - deserializeJson() - to test the result of my manual serialization.

In order to make sure that I test the edge-cases, I'm going to programmatically build the input, rather than trying to type something in manually. In this case, I'm going to create an input that consists of the first 10,000 character codes - chr( 0 ) through chr( 9999 ) - which covers the full ASCII range and a good chunk of Unicode beyond it:

<cfscript>

	/**
	* I take a string and serialize it using JSON (JavaScript Object Notation) encoding.
	*
	* @input I am the string being encoded.
	* @output false
	*/
	public string function serializeString( required string input ) {

		// While this may not be technically needed, this will ensure that we are not
		// using any "undocumented features" of the language. If we explicitly cast to
		// a Java string, and something goes wrong due to odd type-casting, it's a
		// ColdFusion bug, at that point, not a logic error ;)
		input = javaCast( "string", input );

		var length = len( input );

		// Rather than using a string buffer or an array, we'll just capture the output
		// of the code using the underlying context writer. I think this makes the code
		// a bit easier to read and requires one less reference to be passed-around.
		savecontent variable = "local.json" {

			writeOutput( """" );

			for ( var i = 1 ; i <= length ; i++ ) {

				var charCode = input.codePointAt( javaCast( "int", i - 1 ) );

				// Check for the most common case first (normal characters).
				if (
					( charCode >= 32 ) &&
					( charCode != 34 ) &&
					( charCode != 47 ) &&
					( charCode != 92 ) &&
					( charCode != 8232 ) &&
					( charCode != 8233 )
					) {

					writeOutput( chr( charCode ) );

				// Check for the special cases next (control characters, characters that
				// need to be escaped, and characters that need to be encoded nicely).
				} else if ( charCode == 8 ) {

					writeOutput( "\b" );

				} else if ( charCode == 9 ) {

					writeOutput( "\t" );

				} else if ( charCode == 10 ) {

					writeOutput( "\n" );

				} else if ( charCode == 12 ) {

					writeOutput( "\f" );

				} else if ( charCode == 13 ) {

					writeOutput( "\r" );

				} else if (
					( charCode < 32 ) ||
					( charCode == 8232 ) ||
					( charCode == 8233 )
					) {

					// For Unicode hex values, we need to enforce a 4-digit code.
					writeOutput( "\u" & right( ( "000" & formatBaseN( charCode, 16 ) ), 4 ) );

				} else if ( charCode == 34 ) {

					writeOutput( "\""" );

				} else if ( charCode == 47 ) {

					writeOutput( "\/" );

				} else if ( charCode == 92 ) {

					writeOutput( "\\" );

				}

			} // END: For-loop.

			writeOutput( """" );

		}

		return( json );

	}


	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //


	// To test the serialization, we're going to create a string that contains the
	// first 10,000 character codes. Then, we're going to serialize it using our
	// method and deserialize it using the native ColdFusion methods. If the input
	// and outputs match, I think we can assume the manual approach was technically
	// correct.
	input = "";

	for ( i = 0 ; i <= 9999 ; i++ ) {

		input &= chr( i );

	}

	// Serialize using our manual approach.
	serialized = serializeString( input );

	// Deserialize using the native ColdFusion methods.
	deserialized = deserializeJson( serialized );

	// Check for round-trip accuracy using a case-sensitive comparator.
	writeOutput( "Input and output match: " & yesNoFormat( ! compare( input, deserialized ) ) );

	// For further debugging:
	// --
	// writeOutput( "<br />" );
	// writeOutput( input & "<br />" );
	// writeOutput( serialized & "<br />" );
	// writeOutput( deserialized & "<br />" );
	// writeOutput( serializeJson( input ) & "<br />" );

</cfscript>

As you can see, I'm looking at each individual char-code in the input and then writing the appropriate representation to the output buffer. In addition to the sub-32 control characters, I'm also checking for:

8232 - Line Separator
8233 - Paragraph Separator

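... as I have found that these will also break JSON-parsing in a JavaScript context. Both characters are legal inside a JSON string; but, prior to ES2019, JavaScript treated them as line terminators, so a raw occurrence inside a string literal was a syntax error.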
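
To make the \uXXXX branch a bit more concrete, here is a minimal sketch that runs the same zero-padding expression from serializeString() over a few hand-picked char-codes. The charCodes array and the hexValue variable are illustrative names only; they are not part of the function above.

```cfml
<cfscript>

	// A few char-codes that fall into the \uXXXX branch: two control characters
	// and the two Unicode separators (illustrative values).
	charCodes = [ 1, 16, 8232, 8233 ];

	for ( charCode in charCodes ) {

		// Left-pad the hex value with zeros, then keep the right-most 4 digits.
		hexValue = right( ( "000" & formatBaseN( charCode, 16 ) ), 4 );

		writeOutput( charCode & " => \u" & hexValue & "<br />" );

	}

	// Page output:
	// 1 => \u0001
	// 16 => \u0010
	// 8232 => \u2028
	// 8233 => \u2029

</cfscript>
```

Since ColdFusion does not treat the backslash as an escape character in string literals, "\u" is written to the output verbatim, which is exactly what the JSON encoding needs.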

When we run the above code, we get the following page output:

Input and output match: Yes

Looks like it worked out quite nicely. Due to the complications inflicted by the ColdFusion JSON serialization bug, I think this is the kind of approach that I'll have to add to my JSONSerializer.cfc ColdFusion library (which currently uses serializeJson() under the hood for String values).
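
For a sense of what the serialized value looks like on a smaller scale, the following sketch pushes a short, hand-written string through the function. It assumes that serializeString() from the listing above is defined on the same page; the sample variable is a hypothetical name used only for this illustration.

```cfml
<cfscript>

	// A short input containing a double-quote, a tab, a forward-slash, and a
	// back-slash. NOTE: Assumes serializeString() is defined as shown above.
	sample = ( "She said ""hi""" & chr( 9 ) & "a/b\c" );

	serialized = serializeString( sample );

	// Page output: "She said \"hi\"\ta\/b\\c"
	writeOutput( serialized );

	writeOutput( "<br />" );

	// Round-trip the value through the native deserializer.
	// Page output: Match: Yes
	writeOutput( "Match: " & yesNoFormat( ! compare( sample, deserializeJson( serialized ) ) ) );

</cfscript>
```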

