REFind() Sub-Expressions (Thanks Adam Cameron!)

Posted December 17, 2007 at 7:52 AM by Ben Nadel

Tags: ColdFusion

The other day, I was discussing the matching of optional groups in regular expressions when I stated that accessing captured sub-groups was only possible via the Java Pattern / Matcher. Adam Cameron then pointed out that this is, in fact, what the optional ReturnSubExpressions argument of REFind() is for. Now, I have used REFind() a million and one times, but I very rarely use the optional sub-expression argument. And, when I have used it, I guess it has never been with a regular expression that contained captured groups. To be honest, Adam's comment was news to me.

To investigate further, I thought I would give this sub-expression searching a little test drive; I took my Java Pattern / Matcher problem from before and converted it to a REFind() problem:

  • <!--- Define target string. --->
  • <cfset strQuery = "ben=nice&maria+bello=sexy!&lori+petty=cool" />
  •  
  • <!---
  • Search for our pattern in the string. Use the optional
  • 4th argument to have ColdFusion return the sub-expression
  • matching.
  • --->
  • <cfset objMatch = REFind(
  • "((([^=]+=[^&]*)&?)+)",
  • strQuery,
  • 1,
  • true
  • ) />
  •  
  • <!--- Dump out sub-expression matching. --->
  • <cfdump
  • var="#objMatch#"
  • label="REFind() Results"
  • />

Much to my surprise, the above code gives us the following CFDump output:

[Image: REFind() SubExpression Matching Output]

Well I'll be! As you can see, we have four sub-expression results. The first one is always the entire string match. Then indexes 2, 3, and 4 are our captured groups 1, 2, and 3. Unfortunately, since ColdFusion array indexes start at one and the first slot is taken by the full match, the captured groups are one off: group N lives at index N + 1.

Now, let's try to output each of the captured groups:

  • <!---
  • Loop over sub-expressions. The CFOutput wrapper makes
  • ColdFusion evaluate the pound-sign expressions.
  • --->
  • <cfoutput>
  • <cfloop
  • index="intI"
  • from="1"
  • to="#ArrayLen( objMatch.Pos )#"
  • step="1">
  •  
  • #intI#)
  • #Mid(
  • strQuery,
  • objMatch.Pos[ intI ],
  • objMatch.Len[ intI ]
  • )#
  •  
  • <br />
  •  
  • </cfloop>
  • </cfoutput>

Running this code, we get the following output:

1) ben=nice&maria+bello=sexy!&lori+petty=cool
2) ben=nice&maria+bello=sexy!&lori+petty=cool
3)
4) lori+petty=cool

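Let's take that result and compare it to the results found in the Java Pattern / Matcher example with the same regular expression. Note that the Java listing is numbered by captured group and leaves out the whole-match entry, so its lines 1 to 3 line up with lines 2 to 4 above: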

1) ben=nice&maria+bello=sexy!&lori+petty=cool
2) lori+petty=cool
3) lori+petty=cool
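
For reference, here is a minimal Java sketch that should reproduce the listing above. It is not the code from the earlier post; the class name GroupDump and the helper dumpGroups() are made up for illustration, and it assumes the listing was produced by looping from group 1 to groupCount().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDump {

    // Returns one "n) text" line per captured group (1..groupCount),
    // or an empty array when the pattern does not match at all.
    // Hypothetical helper, not part of any library.
    static String[] dumpGroups(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (!matcher.find()) {
            return new String[0];
        }
        String[] lines = new String[matcher.groupCount()];
        for (int i = 1; i <= matcher.groupCount(); i++) {
            // Java group i corresponds to index i + 1 in the REFind() arrays.
            lines[i - 1] = i + ") " + matcher.group(i);
        }
        return lines;
    }

    public static void main(String[] args) {
        String query = "ben=nice&maria+bello=sexy!&lori+petty=cool";
        for (String line : dumpGroups("((([^=]+=[^&]*)&?)+)", query)) {
            System.out.println(line);
        }
    }
}
```

Java keeps the whole match in group 0, which this sketch skips on purpose so that the numbering matches the three-line listing.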

Interesting. The ColdFusion REFind() method came back with an empty string for our captured group 2 (array index 3), (([^=]+=[^&]*)&?), which is the group that the trailing + repeats. The Java example stores the last iteration of the repeated group in this reference, but the ColdFusion example seems to ignore it. Peculiar difference.
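
The "last iteration wins" behavior on the Java side is easier to see with a smaller pattern. The following is only a sketch; the class LastIteration and the method lastCapture() are hypothetical names, and the two patterns are simplified stand-ins rather than the expression used above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastIteration {

    // Returns what group 1 holds once the whole input has matched,
    // or null when the input does not match. Hypothetical helper.
    static String lastCapture(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.matches() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // (\d) runs three times against "123"; only the final pass is kept.
        System.out.println(lastCapture("(\\d)+", "123")); // prints 3

        // Same idea with name=value pairs: group 1 ends up holding the last name.
        System.out.println(lastCapture("(?:(\\w+)=\\w+&?)+", "a=1&b=2&c=3")); // prints c
    }
}
```

Each pass through the repeated group overwrites the previous capture, which is why Java reports lori+petty=cool for the repeated group rather than an empty value.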

Regardless, it's a bit sad that I love regular expressions so much and have been using them for quite a while and still didn't realize that this is how ColdFusion's REFind() method worked. Thanks Adam Cameron for bringing this to my attention. I still think the Java solution is much more elegant and useful, but it is proper to know how ColdFusion actually works.
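
As a rough illustration of why the Java route can be handy here, the sketch below drops the outer + and lets Matcher.find() walk the string one pair at a time. The class EachPair, the method splitPairs(), and the simplified pattern ([^=&]+)=([^&]*) are my own assumptions for this example, not something taken from the earlier post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EachPair {

    // Matches a single name=value pair. There is no outer "+" quantifier,
    // so every call to find() stops after one pair.
    private static final Pattern PAIR = Pattern.compile("([^=&]+)=([^&]*)");

    // Returns each pair as a two-element array: { name, value }.
    // Hypothetical helper, not part of any library.
    static List<String[]> splitPairs(String query) {
        List<String[]> pairs = new ArrayList<>();
        Matcher matcher = PAIR.matcher(query);
        while (matcher.find()) {
            pairs.add(new String[] { matcher.group(1), matcher.group(2) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        for (String[] pair : splitPairs("ben=nice&maria+bello=sexy!&lori+petty=cool")) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

With the sample string this yields three pairs (ben, maria+bello, and lori+petty with their values), so nothing is lost to the overwrite that happens inside a repeated group.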


Reader Comments

Dec 27, 2007 at 5:05 PM
8 Comments

Hi Ben,

Looks like the plus sign at the end of the RegEx causes the reFind to get the LAST of each instance. In Java it's the same way, however, Java has the added feature of getting EACH instance of a match if there's no plus sign. Without the plus sign in CF you'll end up with the FIRST instance.

Please correct me if I'm wrong. Still, I can't account for why the strange bug.

Peter


Dec 27, 2007 at 5:46 PM
11,243 Comments

@Peter,

I'm not sure it's a bug. I almost can't wrap my head around what it should even return. It's like trying to step inside of a recursive function, only it's a repeated sub-group.

Basically, if I could dictate what it should do, I am not sure I would have a good suggestion.


Dec 27, 2007 at 6:03 PM
8 Comments

Hi Ben,

The second group for ReFind should be the same thing as the Java Pattern Matcher:

lori+petty=cool

This is because the plus sign dictates for CF to find the last instance in the string of (basically):

anything=anything, with or without the ampersand at the end.

but for some reason it fails on the second subgroup. The cool part about the bug is it actually finds the existence of the string for the group, but the pos (43) and the len (0) are completely wrong.

Nice post, most informative.

Peter


Dec 27, 2007 at 6:10 PM
11,243 Comments

@Peter,

Ok, I see what you mean. Glad you like the post. I love finding things that I didn't know that seem so core to the language. I've been coding in ColdFusion for years and I JUST find this?!?


Dec 31, 2007 at 4:47 PM
172 Comments

The output from java.util.regex is correct, plain and simple. It is the output you should expect from any regex library when quantifying a capturing group (the last value matched is what the backreference will contain at that point). ColdFusion's output is no feature... it's demonstrating some kind of bug.


Jan 2, 2008 at 8:18 AM
11,243 Comments

@Steve,

Thanks. I will log this bug with Adobe.

