Consuming Large Configuration Files Inside A ColdFusion Component

Published 2024-01-10 in ColdFusion — Comments (7)

A few weeks ago, I stumbled across a GitHub repository that maintains a large collection (3K+) of disposable email address domains. These are domains that hand out throwaway email addresses which are only valid for a short period of time. I was considering adding some email validation logic to one of my ColdFusion applications, which is when I had to decide where to put the list of domains in my ColdFusion code. This isn't a use-case that I encounter very often, so I wanted to do a quick write-up.

My first thought was that I could simply inline the list of 3K+ email address domains right in my ColdFusion component using an inline array declaration:

domains = [
	"0-mail.com",
	"027168.com",
	"0815.ru",
	"0815.ry",
	"0815.su",
	"0845.ru",
	"0box.eu",
	// ... 3K+ more domains ...
];

This works, and is completely acceptable for a small list; but it has some significant limitations:

As the size of this list grows, the signal-to-noise ratio becomes increasingly skewed inside the ColdFusion component. Meaning, more and more of the lines of code in the component are just "cruft" that have to be there (as the configuration) but don't actually serve any other purpose.
As the size of the list grows, I believe you can eventually run into a "maximum method size" issue; though, I am not sure if this is still true in modern ColdFusion.
As the size of the list grows, I believe you can eventually run into a "maximum template size" issue; though, I am not sure if this is still true in modern ColdFusion.
As the size of the list grows, the editor may become slower when accessing and updating this file (which you, as the developer, may experience in the form of keyboard and mouse latency).
By mixing "logic" and "configuration" in the same file, it increases the likelihood that you will accidentally break some of the code when intending to update the configuration portion of the ColdFusion component.

The better approach, in my opinion, is to put the list of domains in a separate configuration file and then load that file during the application bootstrapping process. Which raises the next question: should the ColdFusion component know where its own configuration file is? Or, should the application provide the configuration file to the ColdFusion component?

On balance, we want to create ColdFusion applications that are flexible but not overly complex. This means that we don't want a given ColdFusion component to know any more than it has to about the file system structure.

One way to accomplish this is to create a custom mapping in the Application.cfc framework that points to a location in the file system:

this.mappings[ "/config" ] = "...."

And then, allow the ColdFusion component to use the /config mapping internally when loading the configuration file. This creates some coupling between the ColdFusion component and the file system; but, at least there's some level of indirection that allows the location of the configuration file to be changed without having to change the ColdFusion component logic.
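
As a rough sketch of what the component side of this mapping-based approach might look like (this is not code from the original demo): the file name DisposableEmails.conf under /config and the error type are assumptions for illustration, and the /config mapping itself is assumed to already be defined in Application.cfc.

```cfml
component output = false {

	/**
	* I locate the data file through the "/config" mapping. The mapping is assumed to
	* be defined in Application.cfc; the file name is a hypothetical example.
	*/
	public void function init() {

		// expandPath() resolves the mapping prefix to a physical path on disk.
		var dataFilePath = expandPath( "/config/DisposableEmails.conf" );

		// Fail loudly during bootstrapping if the mapping or the file is missing.
		if ( ! fileExists( dataFilePath ) ) {

			throw(
				type = "DisposableEmails.MissingDataFile",
				message = "The disposable email data file could not be found.",
				detail = "Expected path: #dataFilePath#"
			);

		}

		variables.dataFilePath = dataFilePath;

	}

}
```

Note that the component still hard-codes the mapping name and the file name; only the physical location behind the mapping can change without touching the component.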

Aside: This "shared mapping" approach can be helpful when 3rd-party modules need to know how to locate other ColdFusion components within their module boundary (and can't exclusively use child-paths).

The other option is to provide the full configuration file path as a constructor argument when instantiating the ColdFusion component. This completely decouples the ColdFusion component from the file system; and, keeps the burden of file organization fully externalized.

To explore this latter option, I've created a ColdFusion component, DisposableEmails.cfc, which requires a file path in its constructor. Then, when the ColdFusion component is instantiated, it reads in the configuration file and builds an internal data structure within the buildDomainIndex() method:

component
	output = false
	hint = "I provide information about disposable email domains (as provided by https://github.com/disposable-email-domains)."
	{

	/**
	* I initialize the disposable email domains using the given data file. The data file
	* is expected to have a single domain per line.
	*/
	public void function init( required string dataFilePath ) {

		variables.domainIndex = buildDomainIndex( dataFilePath );

	}

	// ---
	// PUBLIC METHODS.
	// ---

	/**
	* I return the collection of disposable email domains.
	*/
	public array function getDomains() {

		return( domainIndex.keyArray() );

	}


	/**
	* I determine if the given domain matches a disposable email domain.
	*/
	public boolean function isDisposableDomain( required string domain ) {

		return( domainIndex.keyExists( domain ) );

	}


	/**
	* I determine if the given email address contains a disposable email domain.
	*/
	public boolean function isDisposableEmail( required string email ) {

		return( isDisposableDomain( email.listRest( "@" ) ) );

	}

	// ---
	// PRIVATE METHODS.
	// ---

	/**
	* I build the domain index from the given data file. The data file is expected to
	* contain a single domain per line.
	*/
	private struct function buildDomainIndex( required string dataFilePath ) {

		var domainIndex = {};
		var dataFile = fileOpen( dataFilePath, "read", "utf-8" );

		try {

			while ( ! fileIsEOF( dataFile ) ) {

				var domain = fileReadLine( dataFile )
					.trim()
				;

				if ( domain.len() ) {

					domainIndex[ domain ] = true;

				}

			}

		} finally {

			fileClose( dataFile );

		}

		return( domainIndex );

	}

}

At this point, the DisposableEmails.cfc is coupled to the structure of the configuration file, but not to the location of the configuration file.

Now, when instantiating this ColdFusion component, we have to pass in a fully-qualified file path:

<cfscript>

	// The ColdFusion component doesn't (and shouldn't) know where the configuration file
	// is being stored. As such, we need to provide the full path to the data file when we
	// instantiate our component.
	disposableEmails = new DisposableEmails( expandPath( "./DisposableEmails.conf" ) );

	// Some known domains to test.
	testDomains = [
		"bennadel.com", // NOT disposable.
		"247web.net"    // DISPOSABLE.
	];

	// Testing domains.
	for ( domain in testDomains ) {

		dump(
			label = "Domain: #domain#",
			var = disposableEmails.isDisposableDomain( domain )
		);

	}

	// Testing emails.
	for ( domain in testDomains ) {

		dump(
			label = "Email: ben@#domain#",
			var = disposableEmails.isDisposableEmail( "ben@#domain#" )
		);

	}

	dump(
		label = "Disposable Email Domains",
		var = disposableEmails.getDomains(),
		top = 10 // Limit the output.
	);

</cfscript>

As you can see, the new operator receives the location of the configuration file. And, when we run this ColdFusion code, we successfully read in the list of disposable email domains and can validate both domains and email addresses:

[Screenshot: a series of email domains and addresses that have been validated.]

This represents a nice separation of concerns. The overall application knows where the configuration file is; but, it doesn't know what's in it or how to read it. The ColdFusion component, on the other hand, doesn't know where the file is; but, it does know how to read it and how to transform the contents into a consumable data structure. Every part of the application is doing what it does best (and no more).
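
Since the data file is parsed inside the constructor, it probably makes sense to instantiate the component once, during application bootstrapping, and then cache it. A minimal sketch of that idea, assuming that DisposableEmails.cfc and DisposableEmails.conf both sit next to Application.cfc (the application name and the application-scope key are hypothetical):

```cfml
component output = false {

	// Hypothetical application name, for illustration only.
	this.name = "DisposableEmailsDemo";

	/**
	* I run once when the application boots up.
	*/
	public boolean function onApplicationStart() {

		// ASSUMPTION: the data file lives in the same directory as this Application.cfc.
		var dataFilePath = ( getDirectoryFromPath( getCurrentTemplatePath() ) & "DisposableEmails.conf" );

		// The application owns the file location; the component owns the parsing.
		application.disposableEmails = new DisposableEmails( dataFilePath );

		return( true );

	}

}
```

Request-level code could then call application.disposableEmails.isDisposableEmail( email ) without touching the file system again.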

Short link: https://bennadel.com/4577

Reader Comments

Ben Nadel Jan 10, 2024 at 2:32 PM

Furthermore, updating the external configuration file is also much safer since you don't have to worry about accidentally corrupting any of the business logic with copy-pasta.

Will B. Jan 10, 2024 at 5:44 PM

I like your solution. I've done similar setups a few times with ColdBox and WireBox where I just pass in the config file path. In fact, almost every time I have more than a few lines of config data, I go this route.

Your solution for this is really clean and elegant. Now, on startup, see if you need to refresh the data file from the GitHub repo automatically! Heh.

Ben Nadel Jan 11, 2024 at 11:21 AM

@Will,

Where do you actually store the config files? As I was writing this post, I kept going back and forth in my mind if I wanted to get into that; but, since I don't have a solid "best practice", I felt it best to side-step the conversation.

Part of me wants to put all the "config" files in the same place. But, then part of me wants each config file to be collocated with the component that uses it. So, for example, in this blog post, the ColdFusion component and the config were right next to each other:

./DisposableEmails.cfc
./DisposableEmails.conf

Though, in real life, I might put both of those files together in a sub-directory:

./disposable_emails/DisposableEmails.cfc
./disposable_emails/domains.conf

Of course, it doesn't have to be a one-size-fits-all; I'm just thinking out loud. And, there is application-level configuration data that I do keep in a special place.

Ben Nadel Jan 11, 2024 at 11:22 AM

@All, as a total aside, when I was writing this post, I became curious as to whether or not I could use a per-application mapping to point to a full file path. Turns out, you can:

www.bennadel.com/blog/4578-using-per-application-mappings-to-alias-files-in-coldfusion.htm

I'm not sure I would ever actually use the mappings that way. But, it seemed relevant to this conversation.

James Moberg Jan 11, 2024 at 9:36 PM

I noticed that you are using lcase when adding keys to the struct and also when checking for keys. Is there a performance reason for this? You aren't using a casesensitive or ordered-casesensitive struct, so I wouldn't think that it would make any difference.

When reading the file lines, I use java's replaceAll() to sanitize unwanted characters. Using replaceAll("["",\s]", "") should remove quotes, commas and ASCII spaces. (I use this approach when consuming multiple files where some are JSON and contain a single domain per line.)

Regarding domains, I encountered abuse where comment form spammers were using emails with randomized sub-domains. This required a routine to reduce the email host sub-domain levels and re-search until only a name & TLD remained.

Ben Nadel Jan 12, 2024 at 10:55 AM

@James,

No, there's no reason I was really doing it. Even as I was writing the code, I was thinking to myself: "Structs are case insensitive, you don't need to do this". And, even so, I couldn't stop myself. I can't really explain the urge. Let me remove it though, because I don't want to mislead anyone. 👍

Ben Nadel Jan 12, 2024 at 10:59 AM

@James, consider it removed -- thanks for the call-out.
