Updated Thoughts On Validating Data In My Service Layer In ColdFusion

By Ben Nadel

Published 2022-08-15 in ColdFusion — Comments (5)

When I was building my proof-of-concept (POC) for feature flags in ColdFusion, I started to work with data structures that were far more complex than the flat, relational data that I'm used to working with. As such, I didn't have a good instinct about how to go about validating, sanitizing, and normalizing this data. In an earlier post, I looked at validating complex, nested data structures in ColdFusion; but, validation turns out to only be part of the story - especially in a dynamically-typed, case-insensitive runtime. Now that my POC is published, I wanted to circle back and share my updated thoughts on handling data in my ColdFusion service layer.

Part of the reason that I haven't developed a strong instinct for this part of my ColdFusion application's control flow is that I've always had a relational database storing my data behind the scenes. A relational database enforces a strict schema, which means I can get a little loosey-goosey in my data handling while still avoiding data corruption issues. In essence, I've been able to lean on my relational database management system (e.g., MySQL) to:

Enforce data type coercion.
Enforce proper key-casing.
Enforce value lengths.

In my feature flag exploration, however, I was storing all of my demo data in a serialized JSON (JavaScript Object Notation) data file. With no database, there was nothing to "lean on" for schema enforcement. Which meant that all of the data validation, transformation, and sanitization had to happen in the "business logic" of my ColdFusion application.

After I created my first pass at a Validation component to validate that my complex data types had the correct shape, I realized that just validating the data wasn't enough. After all, in a dynamically-typed, case-insensitive runtime like ColdFusion, the following two structures could easily pass through the same validation logic:

{ "VALUE": "1.5" }
{ "value": 1.5 }

And, even more disturbing, if I were going to persist a user-provided Struct, there was a non-zero chance that the user-provided struct would contain garbage and/or malicious data. Meaning, the above struct could just as easily arrive at my ColdFusion service layer looking like this:

{ "value": 1.5, "filePath": "../../../../etc/passw.ini" }

So, not only did I have to validate the data, I also had to sanitize it and transform it such that:

Structs only contained the expected keys.
Struct keys were in the proper key-casing.
Struct keys were defined in the proper order (though this is strictly an aesthetic issue, not a functional one).
All values were coerced to the appropriate, native data-type.
"Magic strings" (such as a varchar column for type or status) were in the proper casing.
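As a rough sketch of that last point, a "magic string" can be matched case-insensitively against a list of known values, with the known value (not the user's value) being what gets returned. The testStatus() function and its list of allowed values are assumptions made up for this illustration:

```cfml
<cfscript>

	// Hypothetical helper (illustration only): normalize a "magic string" to its
	// canonical casing, or throw if it isn't one of the expected values.
	function testStatus( required string status ) {

		// Assumed set of allowed values, in the casing we want to persist.
		var allowedValues = [ "active", "archived", "pending" ];

		for ( var canonicalValue in allowedValues ) {

			// compareNoCase() returns 0 when the strings match, ignoring case.
			if ( compareNoCase( canonicalValue, status.trim() ) == 0 ) {

				// Return OUR value, not the user's, so the stored casing is consistent.
				return( canonicalValue );

			}

		}

		throw(
			type = "User.Status.Invalid",
			message = "User status is not one of the expected values."
		);

	}

	status = testStatus( " ARCHIVED " ); // Normalized to "archived".

</cfscript>
```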

After hemming-and-hawing about ways to approach this problem, I thought to myself: why not just have my existing validation component do all of this? What this means is that instead of just "testing" a value, my validation component would test, normalize, and transform the input, preparing it for data persistence. So, instead of making a call like this:

validation.testEmail( email )

... I'd be making a call like this:

email = validation.testEmail( email )

... where the email was both being tested and returned. Or rather, a validated, sanitized, transformed value of it would be returned.

Now, this is an approach that Robert C. Martin would generally rail against in Clean Code. It's essentially having a method do "more than one thing"; and, is likely a violation of the command-query separation principle. But, if the entire reason-for-being of the Validation component is to do this sort of work, I don't think it's a problem. This doesn't create unexpected behavior because the behavior will be consistent across all methods in this type of ColdFusion component.

So, what might this validation.testEmail() look like? Here's an example:

component {

	/**
	* I test the given email, returning only valid values or throwing an error.
	*/
	public string function testEmail( required string email ) {

		email = canonicalizeInput( email.trim().lcase() );

		if ( ! email.len() ) {

			throw(
				type = "User.Email.Empty",
				message = "User email is empty"
			);

		}

		if ( email.len() > 75 ) {

			throw(
				type = "User.Email.TooLong",
				message = "User email is too long",
				extendedInfo = serializeJson({
					value: email,
					maxLength: 75
				})
			);

		}

		if ( ! isEmailPattern( email ) ) {

			throw(
				type = "User.Email.Invalid",
				message = "User email does not look like a valid email.",
				extendedInfo = serializeJson({
					value: email
				})
			);

		}

		return( email );

	}

}

First, notice that the method is returning a value - the validated and normalized email address. Then, notice that this method is also calling:

.trim() - transformation: making sure there is no leading / trailing whitespace.
.lcase() - transformation: making sure all email addresses are stored in lower-case.
canonicalizeInput() - validation: making sure the email doesn't contain any encoded data (implementation not shown in this snippet).
.len() - validation: making sure the email length falls within storage boundaries.
isEmailPattern() - validation: making sure the email looks like a valid email format (implementation not shown in this snippet).
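Here's a small usage sketch of that test-and-return contract. It assumes that the UserValidation.cfc component shown at the end of this post can be instantiated by name from the calling template, which depends on your application's mappings:

```cfml
<cfscript>

	// ASSUMPTION: the UserValidation.cfc component from this post is resolvable by name.
	validation = new UserValidation();

	// Leading / trailing whitespace and upper-case characters are normalized away, so
	// the value we persist is the returned one, not the one the user typed.
	email = validation.testEmail( "  Ben@Example.COM  " );

	try {

		// Whitespace-only input is trimmed down to an empty string and rejected.
		validation.testEmail( "   " );

	} catch ( any error ) {

		// The thrown error type identifies which rule failed.
		writeDump( error.type );

	}

</cfscript>
```

The important part is the assignment: the calling code overwrites its own variable with whatever the validation component hands back.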

By extracting this low-level validation and transformation logic out into this ColdFusion component, it ends up making my service component much easier to follow. Here's an example component that creates a new user - note that I am injecting my validation object using the synthesized accessors:

component
	accessors = true
	output = false
	hint = "I provide service methods for users."
	{

	// Define properties for dependency-injection.
	property gateway;
	property validation;

	// ---
	// PUBLIC METHODS.
	// ---

	/**
	* I create a new user and return the generated ID.
	*/
	public numeric function createUser(
		required string email,
		required string password,
		required struct source
		) {

		email = validation.testEmail( email );
		password = validation.testPassword( password );
		source = validation.testSource( source );

		if ( isEmailTaken( email ) ) {

			validation.throwAlreadyExistsError( email );

		}

		var id = gateway.createUser(
			email = email,
			password = password,
			source = source,
			createdAt = now(),
			updatedAt = now()
		);

		return( id );

	}


	/**
	* I determine if the given email address is already in use by another user.
	*/
	public boolean function isEmailTaken( required string email ) {

		var result = gateway.getUsersByFilter( email = email );

		return( !! result.recordCount );

	}

}

As you can see, all the tedium of the low-level validation and transformation has been handed off to the validation object, leaving our service code relatively simple.

Now, you may notice that the UserService.cfc ColdFusion component is also doing some validation around the global-uniqueness of the email address. That's because the Validation component doesn't deal with the interconnections between users - it only deals with the low-level data itself. All higher-level validation remains firmly within the "business logic" (whether that's in the "Service" layer, seen above, or the "Workflow" / "Use-Cases" layer).
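Because every failure above is thrown with a distinct error type, a higher layer could translate those types into user-facing messages. The following is only a sketch under that assumption; the getUserFriendlyMessage() function and its messages are hypothetical, and the throw() inside the try block is a stand-in for a real call into the service layer:

```cfml
<cfscript>

	// Hypothetical helper (illustration only): translate an error type thrown by the
	// validation / service layer into a message that is safe to show to the user.
	function getUserFriendlyMessage( required string errorType ) {

		switch ( errorType ) {
			case "User.Email.Empty":
				return( "Please enter your email address." );
			case "User.Email.TooLong":
			case "User.Email.Invalid":
				return( "Please enter a valid email address." );
			case "User.AlreadyExists":
				return( "That email address is already in use." );
			default:
				return( "Something went wrong. Please try again." );
		}

	}

	try {

		// Stand-in for a call like userService.createUser( ... ).
		throw(
			type = "User.AlreadyExists",
			message = "User with the given email already exists."
		);

	} catch ( any error ) {

		writeOutput( getUserFriendlyMessage( error.type ) );

	}

</cfscript>
```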

In this demo, email address is just a simple string; but, I'm also passing in a "source" object. Let's pretend that this is a structure that contains metadata about where the user signed-up for an account. In reality, this probably wouldn't be part of the "user" data; but, I needed something more complex to demo. As such, let's assume the source has the following keys:

siteID - a string.
trackingID - a string.

When our validation component tests this struct, it's going to create a deep clone of the struct that explicitly plucks out the keys:

component {

	/**
	* I test the given sign-up source, returning only valid values or throwing an error.
	*/
	public struct function testSource( required struct rawSource ) {

		try {

			param name="rawSource.siteID" type="string";
			param name="rawSource.trackingID" type="string";

			// Since we're going to persist this complex structure, we want to make sure
			// that it only contains the expected keys; and, that the keys are in the
			// proper key-casing; and that the values are the correct data-type (ie, not
			// simply coerced as part of the type-check). To do this, we want to extract
			// the data into a cloned structure.
			return([
				siteID: canonicalizeInput( rawSource.siteID.trim() ),
				trackingID: canonicalizeInput( rawSource.trackingID.trim() )
			]);

		} catch ( any error ) {

			throw(
				type = "User.Source.Invalid",
				message = "User source has an invalid structure."
			);

		}

	}

}

Notice that this method is:

Returning a brand new struct.
Returning an ordered struct so that the keys are always serialized / deserialized in the same order.
Ensuring that the necessary keys exist.
Explicitly plucking the keys from the source value, ensuring proper key-casing.
Explicitly casting the values to a String (using the canonicalizeInput() method).
Trimming all values for storage.

Now, any extraneous and/or malicious garbage that a user might be adding to the input is ignored. And, all data provided is prepared for storage using the correct key-casing and data-type-casting.
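To see the effect, here's a usage sketch that passes in a messy struct. As before, it assumes that UserValidation.cfc can be instantiated by name; the input values are made up for the demo:

```cfml
<cfscript>

	// ASSUMPTION: the UserValidation.cfc component from this post is resolvable by name.
	validation = new UserValidation();

	// Hypothetical user-provided input: unexpected key-casing, padded values, and an
	// extra key that we never want to persist.
	rawSource = {
		"SITEID": "  blog  ",
		"trackingid": "abc123",
		"filePath": "../../../../etc/passw.ini"
	};

	source = validation.testSource( rawSource );

	// The returned struct is a new object built from explicitly plucked keys, so the
	// extraneous key never makes it across.
	writeDump( structKeyExists( source, "filePath" ) ); // false
	writeDump( structKeyList( source ) );

</cfscript>
```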

Since this is a relatively fresh view on data validation (for me), I'm still considering it a work-in-progress. But, I am finding it quite nice. I love that it allows me to get into the nitty-gritty of data validation while still keeping my "calling code" very simple and easy to read. I'm going to be using this in my upcoming ColdFusion work; and, I'll be sure to report back any issues.

For Completeness: `UserValidation.cfc`

While I have snippets above for the validation service, here's the full code for completeness:

component
	output = false
	hint = "I provide validation, normalization, and error-generation methods for users."
	{

	/**
	* I test the given email, returning only valid values or throwing an error.
	*/
	public string function testEmail( required string email ) {

		email = canonicalizeInput( email.trim().lcase() );

		if ( ! email.len() ) {

			throw(
				type = "User.Email.Empty",
				message = "User email is empty"
			);

		}

		if ( email.len() > 75 ) {

			throw(
				type = "User.Email.TooLong",
				message = "User email is too long",
				extendedInfo = serializeJson({
					value: email,
					maxLength: 75
				})
			);

		}

		if ( ! isEmailPattern( email ) ) {

			throw(
				type = "User.Email.Invalid",
				message = "User email does not look like a valid email.",
				extendedInfo = serializeJson({
					value: email
				})
			);

		}

		return( email );

	}


	/**
	* I test the given password, returning only valid values or throwing an error.
	*/
	public string function testPassword( required string password ) {

		// While there's nothing TECHNICALLY wrong with having leading and/or trailing
		// whitespace characters in a password, there's a non-zero chance that this was
		// done as part of a copy/paste error. As such, let's err on the side of safety
		// and block whitespace at the edges.
		if ( password != password.trim() ) {

			throw(
				type = "User.Password.WhiteSpace",
				message = "User password contains leading or trailing whitespace"
			);

		}

		// NIST (National Institute of Standards and Technology) currently recommends
		// a minimum password length of 8 (focusing on length, NOT complexity).
		// --
		// https://pages.nist.gov/800-63-3/sp800-63b.html
		if ( password.len() < 8 ) {

			throw(
				type = "User.Password.TooShort",
				message = "User password is too short."
			);

		}

		// BCrypt input limit (there are ways around this that add a lot of complexity,
		// but for the sake of the demo, let's use this as a validation step).
		if ( password.len() > 72 ) {

			throw(
				type = "User.Password.TooLong",
				message = "User password is too long."
			);

		}

		return( password );

	}


	/**
	* I test the given sign-up source, returning only valid values or throwing an error.
	*/
	public struct function testSource( required struct rawSource ) {

		try {

			param name="rawSource.siteID" type="string";
			param name="rawSource.trackingID" type="string";

			// Since we're going to persist this complex structure, we want to make sure
			// that it only contains the expected keys; and, that the keys are in the
			// proper key-casing; and that the values are the correct data-type (ie, not
			// simply coerced as part of the type-check). To do this, we want to extract
			// the data into a cloned structure.
			return([
				siteID: canonicalizeInput( rawSource.siteID.trim() ),
				trackingID: canonicalizeInput( rawSource.trackingID.trim() )
			]);

		} catch ( any error ) {

			throw(
				type = "User.Source.Invalid",
				message = "User source has an invalid structure."
			);

		}

	}


	/**
	* I throw an already-exists error for the given email.
	*/
	public void function throwAlreadyExistsError( required string email ) {

		throw(
			type = "User.AlreadyExists",
			message = "User with the given email already exists.",
			extendedInfo = serializeJson({
				value: email
			})
		);

	}

	// ---
	// PRIVATE METHODS.
	// ---

	/**
	* I canonicalize the given input and throw an error if the canonicalization changed
	* the value, which would indicate that the given input contained encoded data.
	*/
	private string function canonicalizeInput( required string input ) {

		var normalizedInput = ( canonicalize( input, true, true ) ?: "" );

		// If the canonicalized input does NOT MATCH the raw input, it means that the
		// raw input contained encoded values. This is totes suspicious.
		if ( input != normalizedInput ) {

			throw(
				type = "User.MaliciousInput",
				message = "User data contains potentially malicious encodings.",
				extendedInfo = serializeJson({
					value: input
				})
			);

		}

		return( normalizedInput );

	}


	/**
	* I determine if the given email looks "enough" like a valid email.
	* 
	* TODO: Move to a "base validation service" so that this can be used to validate
	* different types of entities.
	*/
	private boolean function isEmailPattern( required string email ) {

		// Trying to exhaustively validate an email address is a fool's errand. Let's just
		// make sure the email address looks mostly like what an email should look like.
		// It needs to have one "@" sign, a user, and a dot-delimited domain. 'Nuff said.
		var emailLikePattern = "(?x)
			## Anchor to start of string.
			^

			## Email 'user' is a non-empty match of everything before the @.
			[^@]+

			## Literal separator.
			@

			## Email 'domain' must be a dot-delimited list of (length > 1).
			[^.@]+(\.[^.@]+)+

			## Anchor to end of string.
			$
		";

		return( !! email.reFind( emailLikePattern ) );

	}

}
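For a sense of how loose the isEmailPattern() check is, here's a small sketch that runs a few made-up inputs through a single-line equivalent of the same pattern (the verbose-mode comments and whitespace are stripped so that the snippet stands on its own). Of these samples, only the first one matches:

```cfml
<cfscript>

	// Single-line equivalent of the commented (?x) pattern used in isEmailPattern().
	emailLikePattern = "^[^@]+@[^.@]+(\.[^.@]+)+$";

	samples = [
		"ben@example.com",   // One "@" and a dot-delimited domain.
		"ben@example",       // Domain has no dot.
		"ben@@example.com",  // More than one "@".
		"ben@example..com"   // Empty domain segment.
	];

	for ( sample in samples ) {

		// reFind() returns the position of the match, or 0 when there is no match.
		writeOutput( sample & " : " & ( !! sample.reFind( emailLikePattern ) ) & "<br />" );

	}

</cfscript>
```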

Short link: https://bennadel.com/4308

Reader Comments

Ben Nadel Aug 15, 2022 at 11:42 AM

If you want to see a more real-world example of this "Validation component", check out the FeatureFlagValidation.cfc in my Strangler repository. It handles all the validation, transformation, and normalization for the aforementioned feature flag data before it gets persisted to a JSON file.

Bilal Aug 15, 2022 at 7:10 PM

Ben,
I like the general thoughts you put into this. In the JS world the validation library for complex object inline validation that I use is JOI. It allows a chaining of validation conditions and validation of complex structures at each node with many options. Maybe something to review for ideas on how to expand the CF based work ;o)

Cheers,
Bilal

Ben Nadel Aug 15, 2022 at 7:15 PM

@Bilal,

I used Joi for a bit a few years ago when I was on a team that was building Node.js services. It was an interesting approach. I don't love configuration-based approaches to validation; but, I think that mostly stems from me not having much hands-on experience. Sometimes, I just prefer to brute force things.

That said, I vaguely remember running into an issue where Joi was overly strict in that it would reject requests that had extra properties. This used to cause problems for our teams because some clients would send across additional keys due to the way the API client was configured; and we would end up rejecting those requests needlessly. I remember having to jump through hoops to add HTTP interceptors (on the client-side) to strip out the keys that were causing a problem.

Though, I guess it's all about perspective - perhaps rejecting requests with extra data is a "feature" for some and a "bug" for others.

Over on Twitter, Scott Steinbeck suggested looking into CBValidation, which I think uses a similar approach to Joi; similar in that it's configuration based.

I'll try to carve out some time to look into both of these approaches.

Charlie Arehart Aug 20, 2022 at 3:10 PM

As always, Ben, great to hear your thoughts and explorations on this. There was a post today in another blog on a related topic, and I thought of you and your post here.

It's instead on testing the USE of feature flags (as well as mocking them, and more):
https://reflectoring.io/testing-feature-flags/

Perhaps a little too meta for some tastes, but I thought you might find it interesting, as may some of the readers of your series here on feature flags.

Ben Nadel Aug 21, 2022 at 11:46 AM

@Charlie,

Oh very cool, I'll give it a read. Feature flags are always worth my time 🙃
