Extracting And Interpolating URL Subdomains Using ColdFusion

By Ben Nadel

Published 2016-04-28 in ColdFusion

CAUTION: This post is more of a code kata for myself than a generally applicable concept. That said, working with lists is very useful in many situations.

At InVision App, we run our code-base in a whole bunch of environments, each of which has a unique domain. In each of those environments, we have to extract subdomains from the incoming request and interpolate subdomains into the outgoing response (for things like link generation). Currently, this is done through an overly complex, hard-to-maintain series of Regular Expressions that date back to the early days of the application. But, I think if we start looking at domains as delimited lists, we can make subdomain extraction and interpolation much easier.

In the early days of InVision App, our deployment process was very manual. Sure, it was driven by git; but, it was very manual. We weren't using configurations or build servers or Chef or Capistrano or any of that good stuff. As such, domain matching (in the code) had to be complex because it had to be very flexible. Hence the Regular Expressions.

But, now that everything is driven by per-environment configurations, I believe that this process can be greatly simplified. I think each environment can have its own "domain pattern," which is little more than the known domain with a wildcard in the subdomain position. For example:

SUBDOMAIN_PATTERN = beta5.*.invisionapp.com

This is easy to read, understand, and change from environment to environment. In the code, we could then compare this pattern against incoming request data, such as the cgi.server_name, and extract the subdomain. Then, when we need to generate links for the given subdomain, all we would need to do is replace the wildcard with the desired subdomain.
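To make that flow a bit more concrete, here's a minimal sketch of how a host name might be checked against an environment's pattern and how a link host might be generated from it. The subdomainPattern variable and the extractSubdomain() and buildHost() functions are hypothetical names used only for illustration. Rather than looping over tokens, this version leans on CFML's built-in listFind(), listGetAt(), and listSetAt() functions, and it validates a match by interpolating the candidate subdomain back into the pattern and comparing the result to the host.

```cfml
<cfscript>

	// HYPOTHETICAL: a per-environment setting, mirroring SUBDOMAIN_PATTERN above.
	subdomainPattern = "beta5.*.invisionapp.com";

	// Hypothetical helper: returns the host label that lines up with the "*" in the
	// pattern, or the empty string if the host doesn't fit the pattern.
	public string function extractSubdomain(
		required string pattern,
		required string host
		) {

		var wildcardIndex = listFind( pattern, "*", "." );

		// No wildcard, or a different number of labels, means no match.
		if (
			( wildcardIndex == 0 ) ||
			( listLen( pattern, "." ) != listLen( host, "." ) )
			) {

			return( "" );

		}

		var subdomain = listGetAt( host, wildcardIndex, "." );

		// Validate the remaining labels by interpolating the candidate back into the
		// pattern; host names are case-insensitive, so compare without case.
		if ( compareNoCase( listSetAt( pattern, wildcardIndex, subdomain, "." ), host ) != 0 ) {

			return( "" );

		}

		return( subdomain );

	}

	// Hypothetical helper: returns the pattern with "*" replaced by the given subdomain.
	public string function buildHost(
		required string pattern,
		required string subdomain
		) {

		var wildcardIndex = listFind( pattern, "*", "." );

		// Without a wildcard there is nothing to interpolate.
		if ( wildcardIndex == 0 ) {

			return( pattern );

		}

		return( listSetAt( pattern, wildcardIndex, subdomain, "." ) );

	}

	// In a live request, the host would come from cgi.server_name.
	writeOutput( extractSubdomain( subdomainPattern, "beta5.acme.invisionapp.com" ) & "<br />" ); // acme
	writeOutput( buildHost( subdomainPattern, "acme" ) & "<br />" ); // beta5.acme.invisionapp.com
	writeOutput( extractSubdomain( subdomainPattern, "beta6.acme.invisionapp.com" ) & "<br />" ); // empty string

</cfscript>
```

Because extraction re-uses the interpolation step for its validation, the two can't drift apart: any host that extractSubdomain() accepts is (ignoring case) exactly the host that buildHost() would produce for that subdomain.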

Here's a little proof-of-concept:

<cfscript>

	// Let's test list extraction.
	writeOutput( "Extraction: <br />" );
	writeOutput( getListToken( "*.b.c.d.e.f.g", "a.b.c.d.e.f.g", "." ) & "<br />" );
	writeOutput( getListToken( "a.*.c.d.e.f.g", "a.b.c.d.e.f.g", "." ) & "<br />" );
	writeOutput( getListToken( "a.b.*.d.e.f.g", "a.b.c.d.e.f.g", "." ) & "<br />" );
	writeOutput( getListToken( "a.b.c.*.e.f.g", "a.b.c.d.e.f.g", "." ) & "<br />" );
	writeOutput( getListToken( "a.b.c.d.*.f.g", "a.b.c.d.e.f.g", "." ) & "<br />" );
	writeOutput( getListToken( "a.b.c.d.e.*.g", "a.b.c.d.e.f.g", "." ) & "<br />" );
	writeOutput( getListToken( "a.b.c.d.e.f.*", "a.b.c.d.e.f.g", "." ) & "<br />" );
	writeOutput( "<br />" );

	// Let's test list injection.
	writeOutput( "Injection: <br />" );
	writeOutput( setListToken( "*.b.c.d.e.f.g", "a", "." ) & "<br />" );
	writeOutput( setListToken( "a.*.c.d.e.f.g", "b", "." ) & "<br />" );
	writeOutput( setListToken( "a.b.*.d.e.f.g", "c", "." ) & "<br />" );
	writeOutput( setListToken( "a.b.c.*.e.f.g", "d", "." ) & "<br />" );
	writeOutput( setListToken( "a.b.c.d.*.f.g", "e", "." ) & "<br />" );
	writeOutput( setListToken( "a.b.c.d.e.*.g", "f", "." ) & "<br />" );
	writeOutput( setListToken( "a.b.c.d.e.f.*", "g", "." ) & "<br />" );
	writeOutput( "<br />" );

	// Let's test incompatible matches.
	writeOutput( "Misses: <br />" );
	writeOutput( getListToken( "a.b.c.*.e.f.g", "a.b.c.d.e.f.g.h", "." ) & "<br />" );
	writeOutput( getListToken( "a.b.c.*.e.f.g", "a.b.c.d.e.f._", "." ) & "<br />" );
	writeOutput( getListToken( "a.b.d.*.e.f.g", "b.c.d.e.f", "." ) & "<br />" );
	writeOutput( "<br />" );


	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //


	/**
	* Given a list pattern and a list input, I return the list item from the input that
	* corresponds to the wildcard - "*" - in the list pattern. If no wildcard exists, or
	* it cannot be mapped onto the input, the empty string is returned.
	*
	* @pattern I am the list that contains the wildcard.
	* @input I am the list that contains the target token.
	* @delimiter I am the list delimiter.
	* @output false
	*/
	public string function getListToken(
		required string pattern,
		required string input,
		string delimiter = ","
		) {

		// Convert the lists into arrays so they're easier and more efficient to use.
		var patternTokens = listToArray( pattern, delimiter );
		var inputTokens = listToArray( input, delimiter );

		// If the two lists differ in length, we can't compare them.
		if ( arrayLen( patternTokens ) != arrayLen( inputTokens ) ) {

			return( "" );

		}

		var possibleToken = "";

		// Search for the token that corresponds to the wildcard.
		for ( var i = 1 ; i <= arrayLen( patternTokens ) ; i++ ) {

			var patternToken = patternTokens[ i ];
			var inputToken = inputTokens[ i ];

			// If the pattern token is the wildcard, it means that the corresponding
			// input token is the one we're looking for; but, we can't return it until
			// we know that the rest of the tokens in the two lists all match (otherwise
			// the comparison is not valid).
			if ( patternToken == "*" ) {

				possibleToken = inputToken;

			// If any of the non-wildcard tokens don't match, the comparison between
			// the two lists is invalid and we cannot return a matching token.
			} else if ( patternToken != inputToken ) {

				return( "" );

			}

		}

		return( possibleToken );

	}


	/**
	* Given a list pattern and an input token, I return a version of the list in which
	* the wildcard has been replaced by the input token.
	*
	* @pattern I am the list that contains the wildcard.
	* @inputToken I am the token being replaced into the list.
	* @delimiter I am the list delimiter.
	* @output false
	*/
	public string function setListToken(
		required string pattern,
		required string inputToken,
		string delimiter = ","
		) {

		// Convert the list into an array so it's easier and more efficient to use.
		var patternTokens = listToArray( pattern, delimiter );

		// Search for the wildcard token and replace it.
		for ( var i = 1 ; i <= arrayLen( patternTokens ) ; i++ ) {

			if ( patternTokens[ i ] == "*" ) {

				patternTokens[ i ] = inputToken;

				// We only expect the pattern list to contain one wildcard. Now that
				// we've found it, we can stop searching.
				break;

			}

		}

		return( arrayToList( patternTokens, delimiter ) );

	}

</cfscript>

Inside the getListToken() and setListToken() functions, we're really just treating the values as delimited lists: getListToken() compares the tokens at each corresponding position, and setListToken() swaps the wildcard token for the target value. When we run this code, we get the following page output:

Extraction:
a
b
c
d
e
f
g

Injection:
a.b.c.d.e.f.g
a.b.c.d.e.f.g
a.b.c.d.e.f.g
a.b.c.d.e.f.g
a.b.c.d.e.f.g
a.b.c.d.e.f.g
a.b.c.d.e.f.g

Misses:

As you can see, getListToken() extracted the token at the wildcard's position and setListToken() interpolated the target token into it. The three misses each return the empty string, which is why nothing is rendered under "Misses:". This isn't nearly as magical as a Regular Expression; but, it's a heck of a lot easier to understand and maintain. And, this is coming from someone who loves Regular Expressions (video presentation)!

Short link: https://bennadel.com/3083
