Regular Expressions With Repeated Groups

Posted December 14, 2007 at 9:04 AM by Ben Nadel

Tags: ColdFusion

Last night, on my way to the gym, I was rolling some regular expressions around in my head when suddenly it occurred to me that I have no idea what actually gets captured by a group that is repeated within a single pattern. Let me explain. Assume we wanted to match a query string - not just a name-value pair, but the whole string of name-value pairs. To do so, we might use a pattern like this:

((([^=]+=[^&]*)&?)+)

Here we are matching three groups. The first group is the entire match. The second group is the name-value pair followed by an optional ampersand. The third group is the actual name-value pair. Of course, when I say "actual" name-value pair, I am not 100% sure what that means. See, if we have a string of name-value pairs that gets matched by the single pattern, what actually shows up in that name-value matched group?

I did a little testing:

    <!--- Define target string. --->
    <cfset strQuery = "ben=nice&maria+bello=sexy!&lori+petty=cool" />

    <!--- Create pattern object. --->
    <cfset objPattern = CreateObject(
        "java",
        "java.util.regex.Pattern"
        ) />

    <!---
        Compile regular expression pattern. In this case
        our pattern consists of one or more name-value pairs
        separated by the & symbol.
    --->
    <cfset objPattern = objPattern.Compile(
        JavaCast(
            "string",
            "((([^=]+=[^&]*)&?)+)"
            )
        ) />

    <!--- Get matcher for target string with the given pattern. --->
    <cfset objMatcher = objPattern.Matcher(
        JavaCast(
            "string",
            strQuery
            )
        ) />


    <!--- Keep looping over matches. --->
    <cfloop condition="objMatcher.Find()">

        <!---
            Output the three captured groups. The CFOutput tag is
            needed so that the pound-sign expressions are evaluated
            when this snippet is run on its own.
        --->
        <cfoutput>
            1) #objMatcher.Group( JavaCast( "int", 1 ) )#<br />
            2) #objMatcher.Group( JavaCast( "int", 2 ) )#<br />
            3) #objMatcher.Group( JavaCast( "int", 3 ) )#<br />
            <br />
        </cfoutput>

    </cfloop>

Here we create a string of three name-value pairs. We then use the pattern above, which will match the entire string, and loop over the matcher for that pattern (which will loop once since our pattern matches the entire string). Running the above code, we get the following output:

1) ben=nice&maria+bello=sexy!&lori+petty=cool
2) lori+petty=cool
3) lori+petty=cool

Interesting. It looks like the repeated group just captures the last possible group matched as part of the sub-expression. That makes sense, I guess; it's not like it could return an array of matched groups. Not even an issue, since you would never need to access this information. And, if you did, you could just match on individual name-value pairs rather than the entire string.
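
For readers who want to try this outside of ColdFusion, here is a minimal sketch of the same experiment written directly against java.util.regex, which is what the ColdFusion code above is calling. The class name RepeatedGroups, the helper methods lastPair and allPairs, and the per-pair pattern ([^=&]+)=([^&]*) are assumptions made for illustration; they are not part of the original test. The first helper reproduces the "last iteration wins" behavior, and the second shows the alternative mentioned above: matching one name-value pair at a time so that every pair can be collected.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroups {

    // Same pattern as in the post: one or more name-value pairs.
    static final Pattern WHOLE = Pattern.compile("((([^=]+=[^&]*)&?)+)");

    // Hypothetical per-pair pattern (an assumption, not from the post):
    // group 1 is the name, group 2 is the value.
    static final Pattern PAIR = Pattern.compile("([^=&]+)=([^&]*)");

    // Returns what group 3 holds once the whole query string has matched,
    // or null if the pattern does not match at all.
    static String lastPair(String query) {
        Matcher m = WHOLE.matcher(query);
        return m.find() ? m.group(3) : null;
    }

    // Collects every name-value pair by matching one pair per find() call.
    // A repeated name overwrites the earlier value; nothing is URL-decoded.
    static Map<String, String> allPairs(String query) {
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(query);
        while (m.find()) {
            pairs.put(m.group(1), m.group(2));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String query = "ben=nice&maria+bello=sexy!&lori+petty=cool";
        System.out.println(lastPair(query));
        System.out.println(allPairs(query));
    }
}
```

With the query string from the post, lastPair returns lori+petty=cool, while allPairs returns all three pairs in their original order. The trade-off is that the per-pair pattern no longer validates the string as a whole; it simply picks out whatever pairs it can find.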




Reader Comments

Dec 14, 2007 at 11:43 AM
12 Comments

Ben, I like the regex example but more importantly I like the way you used Java to do it. Thanks for posting this.
Cheers.


Dec 14, 2007 at 12:23 PM
10,640 Comments

@Anuj,

No problem. The regular expression itself does not require Java; however, being able to access the matched groups is only available via the Java Pattern / Matcher as far as I know.


Dec 14, 2007 at 2:00 PM
43 Comments

"Last night, on my way to the gym, I was rolling some regular expressions around in my head"

I don't know about you, but on the way to yoga class the only thing I'm thinking about is: "Jesus, I beg of you, please let there be a hot chick in front of me tonight."


Dec 14, 2007 at 2:44 PM
10,640 Comments

Ha ha ha :) There's usually a few hot girls at my gym. I like to wait till I get there, pick one out, and then hope she gives me the time of day :)


Dec 16, 2007 at 5:50 AM
67 Comments

> being able to access the matched groups is only available via the Java Pattern / Matcher as far as I know

There's the returnsubexpressions option for reFind(). That does what you're suggesting, dunnit? It's not as nice as your approach, that said.

--
Adam


Dec 17, 2007 at 7:57 AM
10,640 Comments

@Adam,

Thanks for the education! I didn't ever know that sub expressions were captured that way. When it comes to REFind(), I've only ever seen the results with one array element.

Thanks again.

http://www.bennadel.com/index.cfm?dax=blog:1090.view


Dec 31, 2007 at 5:00 PM
168 Comments

.NET actually gives you access to all the values captured by repeated groups, as does the just-released Perl 5.10 (when using named capture). I wish this feature were more common.


Jan 2, 2008 at 9:51 AM
67 Comments

Is that like what reMatch() does?

http://livedocs.adobe.com/coldfusion/8/functions_m-r_27.html

--
Adam


Jan 2, 2008 at 6:31 PM
10,640 Comments

@Adam,

REMatch() just returns an array in which each array index contains the entire pattern match (one array index for each complete pattern match in the target string). I don't believe that it deals with individual captured groups.

You can sort of think of it like this:

REMatch() is to the target string what "captured group" is to the matched pattern.


Jan 4, 2008 at 8:06 PM
67 Comments

You're dead right, that's exactly what reMatch() does. I misread/mistook "repeated group" for "repeated match".

Cheers for pulling me up on that one... it led to some interesting reading. Well: as interesting as regexes get, anyways ;-)

Whilst on the subject, I was initially quietly hopeful about the possibilities of reMatch(), expecting it somehow to - as you suggest - capture/extract/return the subexpressions (repeated groups/subexpressions are not something that'd occurred to me one way or the other, to be honest) as well. But unlike reFind(), there is no "returnsubexpressions" switch. This is a significant shortcoming in my view. But it's a start, anyhow.

--
Adam


Jan 5, 2008 at 2:17 PM
10,640 Comments

@Adam,

I agree. A while back, I fooled around with a ColdFusion custom tag that could loop over regular expressions and return sub expressions:

http://www.bennadel.com/index.cfm?dax=blog:971.view

I thought it was pretty bad ass, but got some push back on it. I still like it :)


Apr 20, 2009 at 8:38 PM
22 Comments

@Ben: Yet again you saved me a hell of a lot of time with this post! Thanks.


Apr 21, 2009 at 8:14 AM
10,640 Comments

@Sam,

Not sure how it helped, but awesome!


Jun 8, 2009 at 11:12 PM
1 Comments

http://www.regular-expressions.info/captureall.html gives a very good explanation of what is going on under the hood while capturing a repeating group.


Jun 9, 2009 at 8:21 AM
10,640 Comments

@David,

That is a good explanation. Thanks for pointing that out.


