Ben Nadel at Scotch On The Rocks (SOTR) 2011 (Edinburgh) with: Tom Chiverton ( @thefalken )

Regular Expressions With Repeated Groups


Last night, on my way to the gym, I was rolling some regular expressions around in my head when suddenly it occurred to me that I have no idea what actually gets captured by a group that is repeated within a single pattern. Let me explain; assume we wanted to match a query string - not just a name-value pair, but the whole string of name-value pairs. To do so, we might use a pattern like this:

((([^=]+=[^&]*)&?)+)

Here we are matching three groups. The first group is the entire match. The second group is the name-value pair followed by an optional ampersand. The third group is the actual name-value pair. Of course, when I say "actual" name-value pair, I am not 100% sure what that means. See, if we have a string of name-value pairs that all get matched by the single pattern, what actually shows up in that name-value matched group?
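To make the group numbering concrete, here is a minimal Java sketch (Java only because the pattern runs on java.util.regex; the class name GroupNumbering, the groups() helper, and the single-pair input are made up for illustration). Groups are numbered by the position of their opening parenthesis, left to right. With only one pair in the input, the repetition question does not come up yet:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {

    // Same pattern as above: one or more name-value pairs.
    static final Pattern QUERY = Pattern.compile("((([^=]+=[^&]*)&?)+)");

    // Hypothetical helper: returns groups 1..3 of the first match,
    // or null if the pattern does not match at all.
    static String[] groups(String input) {
        Matcher m = QUERY.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // A single pair with a trailing ampersand (illustrative input).
        String[] g = groups("ben=nice&");
        System.out.println("1) " + g[0]); // 1) ben=nice&
        System.out.println("2) " + g[1]); // 2) ben=nice&
        System.out.println("3) " + g[2]); // 3) ben=nice
    }
}
```

Groups 1 and 2 include the trailing ampersand; group 3 is just the pair itself.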

I did a little testing:

<!--- Define target string. --->
<cfset strQuery = "ben=nice&maria+bello=sexy!&lori+petty=cool" />

<!--- Create pattern object. --->
<cfset objPattern = CreateObject(
	"java",
	"java.util.regex.Pattern"
	) />

<!---
	Compile regular expression pattern. In this case
	our pattern consists of one or more name-value pairs
	separated by the & symbol.
--->
<cfset objPattern = objPattern.Compile(
	JavaCast(
		"string",
		"((([^=]+=[^&]*)&?)+)"
		)
	) />

<!--- Get matcher for target string with the given pattern. --->
<cfset objMatcher = objPattern.Matcher(
	JavaCast(
		"string",
		strQuery
		)
	) />


<!--- Keep looping over matches. --->
<cfloop condition="objMatcher.Find()">

	<!--- Output the three captured groups for this match. --->
	<cfoutput>
		1) #objMatcher.Group( JavaCast( "int", 1 ) )#<br />
		2) #objMatcher.Group( JavaCast( "int", 2 ) )#<br />
		3) #objMatcher.Group( JavaCast( "int", 3 ) )#<br />
		<br />
	</cfoutput>

</cfloop>

Here we create a string of three name-value pairs. We then use the pattern above, which will match the entire string, and loop over the matcher for that pattern (which will loop once since our pattern matches the entire string). Running the above code, we get the following output:

1) ben=nice&maria+bello=sexy!&lori+petty=cool
2) lori+petty=cool
3) lori+petty=cool

Interesting. It looks like a repeated group only holds on to the last iteration matched by the repeated sub-expression. That makes sense, I guess; it's not like the matcher could return an array of matched groups. I don't see it being much of an issue, since I can't think of a case where you would need this information from a single match. And, if you did, you could just match on individual name-value pairs rather than the entire string.
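Matching on individual pairs might look something like the following. This is only a sketch, written in Java since that is the regex engine in play here; the per-pair pattern, the PairMatcher class, and the pairs() helper are my own assumptions, not anything from the test above. Instead of one big match with a repeated group, it calls find() repeatedly so that every pair gets its own match and its own captured groups:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PairMatcher {

    // Hypothetical per-pair pattern: group 1 is the name, group 2 is the value.
    static final Pattern PAIR = Pattern.compile("([^=&]+)=([^&]*)");

    // Collects every name-value pair as a two-element array { name, value }.
    static List<String[]> pairs(String query) {
        List<String[]> result = new ArrayList<>();
        Matcher m = PAIR.matcher(query);
        // Each call to find() moves on to the next pair in the string.
        while (m.find()) {
            result.add(new String[] { m.group(1), m.group(2) });
        }
        return result;
    }

    public static void main(String[] args) {
        String query = "ben=nice&maria+bello=sexy!&lori+petty=cool";
        for (String[] pair : pairs(query)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

With the query string from the test, this collects three pairs rather than just the last one. The trade-off is that it no longer checks that the string as a whole is a well-formed query string.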


Reader Comments

15,674 Comments

@Anuj,

No problem. The regular expression itself does not require Java; however, being able to access the matched groups is only available via the Java Pattern / Matcher as far as I know.

44 Comments

"Last night, on my way to the gym, I was rolling some regular expressions around in my head"

I don't know about you, but on the way to yoga class the only thing I'm thinking about is: "Jesus, I beg of you, please let there be a hot chick in front of me tonight."

15,674 Comments

Ha ha ha :) There's usually a few hot girls at my gym. I like to wait till I get there, pick one out, and then hope she gives me the time of day :)

67 Comments

> being able to access the matched groups is only available via the Java Pattern / Matcher as far as I know

There's the returnsubexpressions option for reFind(). That does what you're suggesting, dunnit? It's not as nice as your approach, that said.

--
Adam

172 Comments

.NET actually gives you access to all the values captured by repeated groups, as does the just-released Perl 5.10 (when using named capture). I wish this feature were more common.

15,674 Comments

@Adam,

REMatch() just returns an array in which each array index contains the entire pattern match (one array index for each complete pattern match in the target string). I don't believe that it deals with individual captured groups.

You can sort of think of it like this:

REMatch() is to the target string what "captured group" is to the matched pattern.

67 Comments

You're dead right, that's exactly what reMatch() does. I misread/mistook "repeated group" for "repeated match".

Cheers for pulling me up on that one... it led to some interesting reading. Well: as interesting as regexes get, anyways ;-)

Whilst on the subject, I was initially quietly hopeful about the possibilities of reMatch(), expecting it somehow to - as you suggest - capture/extract/return the subexpressions (repeated groups/subexpressions are not something that'd occurred to me one way or the other, to be honest) as well. But unlike reFind(), there is no "returnsubexpressions" switch. This is a significant shortcoming in my view. But it's a start, anyhow.

--
Adam
