Ben Nadel at the MySQL NYC Meetup (Oct. 2024) with: David Baird

Counting The Occurrences Of A Substring Or RegEx Pattern In ColdFusion

The other day, in my Incident Commander app code, I needed to count the number of back-ticks in a truncated piece of text in order to make sure that the count was balanced (i.e., that there were an equal number of starting and ending back-ticks for a Slack-formatted message). I don't often have to count substrings in ColdFusion; but, I was surprised to find that even in recent releases of the language there's no native method for counting occurrences of a substring or regular expression pattern. As such, I wanted to take a quick look at how this can be done in Adobe ColdFusion.

This isn't the first time that I've explored this space: I have an "Ask Ben" question from 2006 that deals with counting spaces. And that's kind of why I'm surprised that, almost 20 years later, there's still no native abstraction for this. To be clear, ColdFusion has plenty of lower-level constructs that make this task very achievable; so, perhaps it's just not worth it to add an additional abstraction.

That said, let's do some searching!

Using find() With a Static String

The find() and findNoCase() functions allow us to search for a substring starting at a given offset. We can use these functions to perform a "count" by starting at the front of a string; and then, every time we find a match, we simply move the start location to the position directly after that match. Then, we continue searching until no more matches can be found. At the end, our count will match the number of successful find() operations.

<cfscript>

	input = "
		I like to move it, move it.
		You like to move it, move it.
		We like to move it, move it.
	";

	writeDump([
		"move it": countOccurrences( input, "move it" ),
		"LIKE TO": countOccurrences( input, "LIKE TO" )
	]);

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I count the number of occurrences of the given substring within the target string
	* using a case-sensitive search.
	*/
	private numeric function countOccurrences(
		required string target,
		required string substring,
		numeric startingAt = 1
		) {

		var count = 0;

		do {

			var position = target.find( substring, startingAt );

			if ( position ) {

				count++;
				startingAt = ( position + substring.len() );

			}

		} while ( position );

		return count;

	}

</cfscript>

When we run this ColdFusion code, we get the following output:

  • move it → 6
  • LIKE TO → 0
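
The same loop also works with findNoCase() for a case-insensitive count. The following is a minimal sketch rather than code from the original demo: the countOccurrencesNoCase() name and the empty-substring guard are assumptions added for illustration.

<cfscript>

	input = "
		I like to move it, move it.
		You like to move it, move it.
		We like to move it, move it.
	";

	writeDump([
		"LIKE TO": countOccurrencesNoCase( input, "LIKE TO" )
	]);

	/**
	* I count the number of occurrences of the given substring within the target string
	* using a case-INSENSITIVE search. (Hypothetical helper, not part of the original
	* demo.)
	*/
	private numeric function countOccurrencesNoCase(
		required string target,
		required string substring,
		numeric startingAt = 1
		) {

		// Assumption: an empty substring is treated as having no occurrences.
		if ( ! substring.len() ) {

			return 0;

		}

		var count = 0;
		var position = target.findNoCase( substring, startingAt );

		while ( position ) {

			count++;
			// Resume the search directly after the current match.
			position = target.findNoCase( substring, ( position + substring.len() ) );

		}

		return count;

	}

</cfscript>

Against the sample input, this variation should report 3 for LIKE TO, since the casing of the search term is ignored.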

Using reFind("one") With a Regular Expression Pattern

In ColdFusion, the find() functions have regular expression alternatives that start with re: reFind() and reFindNoCase(). In this next approach, we can use the exact same strategy of moving the starting position forward on each subsequent find operation; only, instead of using a static string we're going to use a RegEx pattern.

Since the flexibility of regular expression patterns can result in an unpredictable substring match, we need to ask the reFind() function to return the sub-expressions. This way, we can examine the length of each match so that we know how far forward to move the offset for the next reFind() call.

<cfscript>

	input = "
		I like to move it, move it.
		You like to move it, move it.
		We like to move it, move it.
	";

	writeDump([
		"move it": reCountOccurrences( input, "move it" ),
		"(move it), \1": reCountOccurrences( input, "(move it), \1" ),
		"like.to": reCountOccurrences( input, "like.to" )
	]);

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I count the number of occurrences of the given regular expression pattern within the
	* target string using a case-sensitive search.
	*/
	private numeric function reCountOccurrences(
		required string target,
		required string pattern,
		numeric startingAt = 1
		) {

		var count = 0;

		do {

			var result = target.reFind( pattern, startingAt, true, "one" );
			var position = result.pos[ 1 ];
			var length = result.len[ 1 ];

			if ( position ) {

				count++;
				startingAt = ( position + length );

			}

		} while ( position );

		return count;

	}

</cfscript>

When we run this ColdFusion code, we get the following output:

  • move it → 6
  • (move it), \1 → 3
  • like.to → 3
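
To make the role of the pos and len arrays more concrete, here is a small sketch that dumps the result of a single reFind() call against one line of the sample text. The variable names are illustrative only.

<cfscript>

	// One-based character positions: the first "m" in this string sits at position 11.
	result = reFind( "(move it), \1", "I like to move it, move it.", 1, true, "one" );

	writeDump([
		// Index 1 describes the overall match; index 2 describes the first
		// captured group.
		"pos": result.pos,
		"len": result.len,
		// The offset at which the next search would begin.
		"nextStart": ( result.pos[ 1 ] + result.len[ 1 ] )
	]);

</cfscript>

Counting characters by hand, the overall match should start at position 11 with a length of 16, the captured group should start at position 11 with a length of 7, and the next search should therefore begin at position 27.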

Using reFind("all") With a Regular Expression Pattern

In the previous example, you may have noticed that I was passing "one" as the last argument to the reFind() method. This defines the "scope" of the find operation. The other option is to use "all", which tells ColdFusion to return all of the regular expression pattern matches at one time. We can use this scope to simplify the code.

One thing to be mindful of is that when we tell the reFind() function to return the sub-expressions, it will always return a result even if there is no match. As such, we'll have to look at the first pos value to make sure that it's located within the bounds of the target string. If it's a 0, it means that we didn't find any matches. But, if it's non-0, then the length of the results array will match the count of the pattern matches.

<cfscript>

	input = "
		I like to move it, move it.
		You like to move it, move it.
		We like to move it, move it.
	";

	writeDump([
		"move it": reCountOccurrences( input, "move it" ),
		"(move it), \1": reCountOccurrences( input, "(move it), \1" ),
		"like.to": reCountOccurrences( input, "like.to" )
	]);

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I count the number of occurrences of the given regular expression pattern within the
	* target string using a case-sensitive search.
	*/
	private numeric function reCountOccurrences(
		required string target,
		required string pattern,
		numeric startingAt = 1
		) {

		var results = target.reFind( pattern, startingAt, true, "all" );

		// When returning sub-expressions, the reFind() function will always return at
		// least ONE result, even if there is no match. As such, we need to check the
		// position of the first match to see if it's in the target.
		if ( results[ 1 ].pos[ 1 ] ) {

			return results.len();

		}

		return 0;

	}

</cfscript>

When we run this ColdFusion code, we get the following output:

  • move it → 6
  • (move it), \1 → 3
  • like.to → 3
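
The no-match case described above can be checked in isolation. This is a minimal sketch, assuming a pattern that does not occur in the target string:

<cfscript>

	// The pattern "xyz" does not occur in the target string.
	results = reFind( "xyz", "move it, move it", 1, true, "all" );

	writeDump([
		// Number of entries in the returned array, even though nothing matched.
		"entries": results.len(),
		// Position of the first "match"; a 0 signals that there was no match.
		"firstPos": results[ 1 ].pos[ 1 ]
	]);

</cfscript>

Based on the behavior described above, the array should still contain an entry and its first pos value should be 0, which is why the array length alone cannot be used as the count.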

Using reMatch() With a Regular Expression Pattern

Using the "all" scope with the reFind() function gives us a result that feels very much like the reMatch() function. In fact, we can greatly simplify this RegEx-based approach by using reMatch() and then just looking at the number of pattern matches that get returned:

<cfscript>

	input = "
		I like to move it, move it.
		You like to move it, move it.
		We like to move it, move it.
	";

	writeDump([
		"move it": reCountOccurrences( input, "move it" ),
		"(move it), \1": reCountOccurrences( input, "(move it), \1" ),
		"like.to": reCountOccurrences( input, "like.to" )
	]);

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I count the number of occurrences of the given regular expression pattern within the
	* target string using a case-sensitive search.
	*/
	private numeric function reCountOccurrences(
		required string target,
		required string pattern
		) {

		return target.reMatch( pattern ).len();

	}

</cfscript>

When we run this ColdFusion code, we get the following output:

  • move it → 6
  • (move it), \1 → 3
  • like.to → 3

The reMatch() function has no concept of a "start" offset; so, this isn't strictly the same as the previous examples. But, if you don't have to worry about where to start the search, this is a very simple approach.
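
If a start offset is needed with reMatch(), one possible workaround is to slice the target string first. The sketch below is an assumption-laden illustration and not part of the original demo: the reMatchCountFrom() name is hypothetical, and slicing changes what anchors such as ^ or \b see at the cut point, so it is not a drop-in equivalent of the reFind() offset.

<cfscript>

	writeDump([
		"from 1": reMatchCountFrom( "move it, move it", "move it", 1 ),
		"from 2": reMatchCountFrom( "move it, move it", "move it", 2 )
	]);

	/**
	* I count the pattern matches in the portion of the target string that begins at
	* the given one-based offset. (Hypothetical helper.)
	*/
	private numeric function reMatchCountFrom(
		required string target,
		required string pattern,
		numeric startingAt = 1
		) {

		// Drop everything before the starting offset. Passing the full length as the
		// count simply means "through the end of the string".
		var remainder = target.mid( startingAt, target.len() );

		return remainder.reMatch( pattern ).len();

	}

</cfscript>

Here, starting at offset 1 should count both matches, while starting at offset 2 skips the leading "m" and should count only the second one.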

Using reMatch() With a Static Substring

The reMatch() function makes it very simple to count occurrences because we're just looking at the length of the resultant array. But, sometimes we just want to find a static substring and not perform a regular expression search. For this, we can use the reEscape() function. This function ensures that there are no special pattern-matching constructs within the given string by forcing all characters to be treated as literal matches.

<cfscript>

	input = "
		I like to move it, move it.
		You like to move it, move it.
		We like to move it, move it.
	";

	writeDump([
		"move it": countOccurrences( input, "move it" ),
		"m[o]ve it": countOccurrences( input, "m[o]ve it" ) // Should have 0 matches.
	]);

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I count the number of occurrences of the given substring within the target string
	* using a case-sensitive search.
	*/
	private numeric function countOccurrences(
		required string target,
		required string substring
		) {

		// Here, we're using the ergonomics of the reMatch() function. However, in order
		// to make sure that we don't treat the substring as a regular expression pattern,
		// we're going to escape it.
		return target.reMatch( reEscape( substring ) ).len();

	}

</cfscript>

When we run this ColdFusion code, we get the following output:

  • move it → 6
  • m[o]ve it → 0

If the second attempt were treated as a regular expression pattern, it would have found 6 as well; however, since the character class constructs ([]) are being escaped, there are no literal matches.
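
The difference is even easier to see with a substring that consists entirely of a special character. This short sketch (with an input string chosen purely for illustration) counts periods with and without reEscape():

<cfscript>

	input = "a.b.c";

	writeDump([
		// Escaped: the period is matched literally.
		"escaped": input.reMatch( reEscape( "." ) ).len(),
		// Unescaped: the period is a wildcard that matches any character.
		"unescaped": input.reMatch( "." ).len()
	]);

</cfscript>

The escaped version should report 2 (the two literal periods), while the unescaped version should report 5 (one match for every character in the string).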

Using Java's Pattern/Matcher With a Regular Expression Pattern

No discussion of iterating over regular expression patterns in ColdFusion can be complete without a quick look at Java's Pattern and Matcher classes. These classes provide a tremendous amount of low-level functionality; and, are what power most of the features in my jRegEx ColdFusion component.

Much like the reFind() approach that we looked at earlier, the Matcher class gives us a way to iterate over each matching pattern in a given target. And what we end up with is a count that matches the number of successful .find() operations.

<cfscript>

	input = "
		I like to move it, move it.
		You like to move it, move it.
		We like to move it, move it.
	";

	writeDump([
		"move it": reCountOccurrences( input, "move it" ),
		"(move it), \1": reCountOccurrences( input, "(move it), \1" ),
		"like.to": reCountOccurrences( input, "like.to" )
	]);

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	/**
	* I count the number of occurrences of the given regular expression pattern within the
	* target string using a case-sensitive search.
	*/
	private numeric function reCountOccurrences(
		required string target,
		required string pattern
		) {

		var matcher = createObject( "java", "java.util.regex.Pattern" )
			.compile( pattern )
			.matcher( target )
		;
		var count = 0;

		while ( matcher.find() ) {

			count++;

		}

		return count;

	}

</cfscript>

When we run this ColdFusion code, we get the following output:

  • move it → 6
  • (move it), \1 → 3
  • like.to → 3

Much like the reFind() function, the Matcher class does give us an opportunity to change the offset location of the search. But, it makes the code more complex, so I'm not going to look at it in this demo. Just know that the low-level control exists if you ever want to use it.
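
For reference, one way the offset could be applied is through the find( int ) overload on the Matcher class, which resets the matcher and begins searching at a zero-based index. This is a sketch under stated assumptions and not code from the original demo: the matcherCountFrom() name is hypothetical, and the choice to return 0 for an out-of-range offset is an illustrative design decision.

<cfscript>

	writeDump([
		"from 1": matcherCountFrom( "move it, move it", "move it", 1 ),
		"from 2": matcherCountFrom( "move it, move it", "move it", 2 )
	]);

	/**
	* I count the pattern matches that begin at or after the given one-based offset.
	* (Hypothetical helper.)
	*/
	private numeric function matcherCountFrom(
		required string target,
		required string pattern,
		numeric startingAt = 1
		) {

		// Java uses zero-based indices; ColdFusion uses one-based offsets.
		var offset = ( startingAt - 1 );

		// Matcher.find( int ) throws an IndexOutOfBoundsException for an index that
		// is negative or greater than the input length, so bail out early.
		if ( ( offset < 0 ) || ( offset > target.len() ) ) {

			return 0;

		}

		var matcher = createObject( "java", "java.util.regex.Pattern" )
			.compile( pattern )
			.matcher( target )
		;
		var count = 0;
		// The find( int ) overload resets the matcher and starts at the given index.
		var found = matcher.find( javaCast( "int", offset ) );

		while ( found ) {

			count++;
			// The no-argument find() continues after the previous match.
			found = matcher.find();

		}

		return count;

	}

</cfscript>

With this input, starting at offset 1 should count 2 matches and starting at offset 2 should count 1. Unlike slicing the string, find( int ) leaves the full input available to the regex engine, so look-behind constructs can still see the characters before the offset.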

In my Incident Commander app, which I alluded to at the top of this post, I ended up using the reMatch() approach, which, to be fair, provides very nice developer ergonomics. But, in cases where I only need a count, it seems unfortunate to allocate an array of matches only to turn around and throw them away. It's probably premature optimization; but, having a native abstraction for a count-based search feels like it would be a value-add in ColdFusion.
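
As a rough illustration of that use case, a back-tick balance check built on reMatch() might look like the sketch below. This is not the actual Incident Commander code: the hasBalancedBackticks() name is hypothetical, and it assumes that "balanced" simply means an even number of back-ticks.

<cfscript>

	writeDump([
		"complete": hasBalancedBackticks( "Run `npm install` now" ),
		"truncated": hasBalancedBackticks( "Run `npm inst" )
	]);

	/**
	* I determine if the given text contains an even number of back-ticks.
	* (Hypothetical helper; assumes that an even count means balanced.)
	*/
	private boolean function hasBalancedBackticks( required string text ) {

		// The back-tick has no special meaning in a regular expression, so it can
		// be passed to reMatch() without escaping.
		var count = text.reMatch( "`" ).len();

		return ( ( count % 2 ) == 0 );

	}

</cfscript>

The first string contains two back-ticks and should be reported as balanced; the truncated string contains only one and should not.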

Want to use code from this post? Check out the license.
