The other day, I was discussing the matching of optional groups in regular expressions, and I stated that the captured sub-groups could only be accessed through Java's Pattern / Matcher classes. Adam Cameron then pointed out that this is exactly what REFind()'s optional ReturnSubExpressions argument is for. Now, I have used REFind() a million and one times, but I very rarely use the optional sub-expression argument. And when I do use it, I guess it has never been with a regular expression that uses captured groups. To be honest, Adam's comment was news to me.
To investigate further, I thought I would give this sub-expression searching a little test drive; I took my Java Pattern / Matcher problem from before and converted it to a REFind() problem:
<!--- Define target string. --->
<cfset strQuery = "ben=nice&maria+bello=sexy!&lori+petty=cool" />
<!---
	Search for our pattern in the string. Use the optional
	4th argument to have ColdFusion return the sub-expression
	matching.
--->
<cfset objMatch = REFind(
	"((([^=]+=[^&]*)&?)+)",
	strQuery,
	1,
	true
	) />
<!--- Dump out sub-expression matching. --->
<cfdump var="#objMatch#" label="REFind() Results" />
Much to my surprise, the above code gives us the following CFDump output:
Well I'll be! As you can see, we have four sub-expression results. The first one is always the entire string match; indexes 2, 3, and 4 hold our captured groups 1, 2, and 3. Unfortunately, because ColdFusion arrays start at index one and that first slot is taken by the full match, every captured group N sits at array index N + 1, so the indexes are off by one from the group numbers in the pattern.
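To make that off-by-one mapping concrete, here is a minimal Java sketch (Java only because this post already compares against Pattern / Matcher) that rebuilds REFind()-style Pos and Len arrays from a java.util.regex.Matcher. It is an approximation, not how ColdFusion is implemented: it assumes 1-based positions, slot 1 for the whole match, and zeros for a group that did not participate. The class, method, and field names (SubExpressions, refind, Result) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubExpressions {

    /** Hypothetical container mirroring the Pos / Len arrays in REFind()'s result struct. */
    static final class Result {
        final int[] pos; // pos[k] = 1-based start of slot k + 1 (0 = no match)
        final int[] len; // len[k] = length of slot k + 1 (0 = no match)

        Result(int[] pos, int[] len) {
            this.pos = pos;
            this.len = len;
        }
    }

    /**
     * Sketch of REFind( regex, text, start, true ): slot 1 is the whole match and
     * slot n + 1 is captured group n. Java's 0-based offsets become 1-based positions.
     */
    static Result refind(String regex, String text, int start) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find(start - 1)) {
            // Assumption: no match at all is reported as a single zero entry.
            return new Result(new int[] {0}, new int[] {0});
        }
        int slots = m.groupCount() + 1; // group 0 (whole match) plus each captured group
        int[] pos = new int[slots];
        int[] len = new int[slots];
        for (int g = 0; g < slots; g++) {
            if (m.start(g) >= 0) { // start(g) is -1 when the group did not participate
                pos[g] = m.start(g) + 1;
                len[g] = m.end(g) - m.start(g);
            }
        }
        return new Result(pos, len);
    }

    public static void main(String[] args) {
        String query = "ben=nice&maria+bello=sexy!&lori+petty=cool";
        Result r = refind("((([^=]+=[^&]*)&?)+)", query, 1);
        for (int k = 0; k < r.pos.length; k++) {
            // ColdFusion index k + 1 corresponds to Java group k.
            System.out.println((k + 1) + ") pos=" + r.pos[k] + " len=" + r.len[k]);
        }
    }
}
```

Run against the query string from above, this prints pos=1 len=42 for slots 1 and 2 and pos=28 len=15 for slots 3 and 4. Because it uses Java's engine, slot 3 (captured group 2) is filled in, which may differ from what ColdFusion itself reports for the repeated group.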
Now, let's try to output each of the captured groups:
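<!--- Loop over sub-expressions (inside CFOutput so the #...# expressions are evaluated). Skip zero-length entries, which appears to be how unmatched groups are reported. --->
<cfoutput>
	<cfloop index="intI" from="1" to="#ArrayLen( objMatch.Pos )#" step="1">
		#intI#)
		<cfif objMatch.Len[ intI ]>#Mid( strQuery, objMatch.Pos[ intI ], objMatch.Len[ intI ] )#</cfif>
		<br />
	</cfloop>
</cfoutput>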
Running this code, we get the following output:
Let's take that result and compare it to the results found in the Java Pattern / Matcher example with the same regular expression:
Interesting. The ColdFusion REFind() method did not find a match for our captured group 2, (([^=]+=[^&]*)&?)+, which is the repeated group. The Java example stores the text of this group's last iteration in that reference, but ColdFusion seems to ignore it. A peculiar difference.
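To illustrate the Java side of that difference, here is a minimal sketch using only java.util.regex: it runs the same pattern against the same query string and collects every group from the first match. In Java, a capturing group inside a quantifier keeps only the text from its last iteration, so group 2 (and group 3, which is nested inside it) comes back as the final name/value pair rather than empty. The class and method names (RepeatedGroup, groups) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroup {

    /** Returns groups 0..groupCount() of the first match; an entry is null if that group did not participate. */
    static List<String> groups(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> out = new ArrayList<>();
        if (m.find()) {
            for (int g = 0; g <= m.groupCount(); g++) {
                // For a group inside a quantifier, group(g) is the text of its LAST iteration.
                out.add(m.group(g));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String query = "ben=nice&maria+bello=sexy!&lori+petty=cool";
        List<String> gs = groups("((([^=]+=[^&]*)&?)+)", query);
        for (int g = 0; g < gs.size(); g++) {
            System.out.println("group " + g + ": " + gs.get(g));
        }
    }
}
```

Groups 0 and 1 both hold the entire query string, while groups 2 and 3 both hold lori+petty=cool. Groups 2 and 3 are identical here only because the optional & did not match on the last iteration.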
Regardless, it's a bit sad that, as much as I love regular expressions and as long as I have been using them, I still didn't realize that this is how ColdFusion's REFind() method works. Thanks to Adam Cameron for bringing this to my attention. I still think the Java solution is more elegant and useful, but it's important to know how ColdFusion actually works.