Last night, on my way to the gym, I was rolling some regular expressions around in my head when suddenly it occurred to me that I have no idea what actually gets captured by a group that is repeated within a single pattern. Let me explain. Assume we wanted to match a query string - not just a name-value pair, but the whole string of name-value pairs. To do so, we might use a pattern like this:
((([^=]+=[^&]*)&?)+)
Here we are matching three groups. The first group is the entire match. The second group is the name-value pair followed by an optional ampersand. The third group is the actual name-value pair. Of course, when I say "actual" name-value pair, I am not 100% sure what that means. See, if we have a string of several name-value pairs that all get matched by the single pattern, what actually shows up in that name-value matched group?
I did a little testing:
<!--- Define target string. --->
<cfset strQuery = "ben=nice&maria+bello=sexy!&lori+petty=cool" />

<!--- Create pattern object. --->
<cfset objPattern = CreateObject( "java", "java.util.regex.Pattern" ) />

<!---
	Compile regular expression pattern. In this case our pattern
	consists of one or more name-value pairs separated by the & symbol.
--->
<cfset objPattern = objPattern.Compile(
	JavaCast( "string", "((([^=]+=[^&]*)&?)+)" )
	) />

<!--- Get matcher for target string with the given pattern. --->
<cfset objMatcher = objPattern.Matcher(
	JavaCast( "string", strQuery )
	) />

<!--- Keep looping over matches. --->
<cfloop condition="objMatcher.Find()">
	1) #objMatcher.Group( JavaCast( "int", 1 ) )#<br />
	2) #objMatcher.Group( JavaCast( "int", 2 ) )#<br />
	3) #objMatcher.Group( JavaCast( "int", 3 ) )#<br />
	<br />
</cfloop>
Here we create a string of three name-value pairs. We then use the pattern above, which will match the entire string, and loop over the matcher for that pattern (which will loop only once, since our pattern matches the entire string). Running the above code, we get the following output:
1) ben=nice&maria+bello=sexy!&lori+petty=cool
2) lori+petty=cool
3) lori+petty=cool
Interesting. It looks like a repeated group just captures whatever it matched on its last iteration; the earlier iterations get overwritten. That makes sense, I guess; it's not like the Java matcher could return an array of matched groups. It's rarely an issue, since you would seldom need to access every intermediate capture. And, if you did, you could just match on individual name-value pairs rather than the entire string.
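To make that last point concrete, here is a rough sketch in plain Java, since the ColdFusion code above is really just driving java.util.regex. The class name RepeatedGroupDemo, the helper methods, and the per-pair pattern ([^=&]+)=([^&]*) are my own illustrative assumptions, not something from the test above. The first helper re-checks what group 3 holds after the whole-string match; the second matches one pair at a time so that every pair can be collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class RepeatedGroupDemo {

    // The pattern from the post: one or more name-value pairs,
    // each optionally followed by an ampersand.
    static final Pattern WHOLE = Pattern.compile("((([^=]+=[^&]*)&?)+)");

    // Assumed per-pair pattern: matches a single name=value pair.
    static final Pattern PAIR = Pattern.compile("([^=&]+)=([^&]*)");

    // Returns what group 3 holds after matching the whole query string,
    // or null if the pattern does not match at all.
    static String lastCapturedPair(String query) {
        Matcher m = WHOLE.matcher(query);
        return m.find() ? m.group(3) : null;
    }

    // Collects every name-value pair by calling find() repeatedly,
    // instead of relying on a repeated group.
    static List<String> allPairs(String query) {
        List<String> pairs = new ArrayList<>();
        Matcher m = PAIR.matcher(query);
        while (m.find()) {
            pairs.add(m.group());
        }
        return pairs;
    }

    public static void main(String[] args) {
        String query = "ben=nice&maria+bello=sexy!&lori+petty=cool";
        // Only the last iteration of the repeated group survives.
        System.out.println(lastCapturedPair(query)); // lori+petty=cool
        // Matching pair by pair keeps all three.
        System.out.println(allPairs(query));
    }
}
```

In the per-pair version, groups 1 and 2 of each match also give you the name and the value separately, which is probably what you would want from a query string anyway.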