RELoop ColdFusion Custom Tag Case Study
Not many people were convinced that my RELoop custom tag was a good idea, but today on the CF-Talk list I saw what could be a really convenient use case for it. Here is the problem that Josh Nathanson asked about:
Got a regex challenge... I was able to solve it using an REFind and then REReplace, but I'm wondering if anyone can come with a "one-shot" way to replace without looping. I need to remove any carriage returns within a quoted string, but not touch them if they are outside quotes. So:
"the quick brown fox \r\n jumps over the \r\n lazy dog" <-- remove the \r\n's
My name is mud \r\n <-- leave this one alone
I'm sure this is probably easy for the regex gurus...
As it turns out, this is not the easiest kind of regular expression to code when you want to take care of it all in one shot, but it is a task that becomes quite easy when you use my RELoop ColdFusion custom tag. Take a look:
<!---
	Build up our test data. This data will have line breaks
	inside and outside of quoted values.
--->
<cfsavecontent variable="strText">
Hey there, here some text that is not quoted
that has line breaks in it. Then, here is some
"quoted text that also has
some line breaks" in it. Of course, not
all "quoted text" needs to
have "line
breaks" in it; that is only going to happend some
of the time and we want to be sure not to replace
out the line breaks that are NOT "within
quoted values".
</cfsavecontent>

<!---
	Replacing the line breaks directly in the regular expression
	is gonna be a huge pain in the butt, so we are gonna do the
	next best thing - we are gonna find all the quoted values and
	then act on them individually.
--->
<cf_reloop
	index="strValue"
	text="#strText#"
	pattern="(""[^""]*"")"
	variable="strText">

	<!---
		Now that we have a quoted value, just replace the line
		breaks and carriage returns. This might be overkill some
		of the time, but it is the easy solution.
	--->
	<cfset strValue = strValue.ReplaceAll(
		JavaCast( "string", "[\r\n]+" ),
		JavaCast( "string", " " )
		) />

</cf_reloop>

<!---
	When we output the new text, we are going to replace the
	newlines / carriage returns with <br /> tags so that we can
	see where the line breaks exist in an HTML context.
--->
<p>
	#strText.ReplaceAll(
		JavaCast( "string", "\r\n" ),
		JavaCast( "string", "<br />" )
		)#
</p>
First, I am building up a chunk of text that has line breaks in both the quoted and the non-quoted parts. Then, I am using RELoop to iterate over all the quoted values; within the loop, it takes just one simple ReplaceAll() call to clear out the line breaks. There is some overkill here, as you are going to run replaces on quoted values that don't have any line breaks; however, I think the time and effort you save by not having to write an insane regular expression is worth the overhead of some extraneous replace calls.

Running the above code, we get the following output:
Hey there, here some text that is not quoted
that has line breaks in it. Then, here is some
"quoted text that also has some line breaks" in it. Of course, not
all "quoted text" needs to
have "line breaks" in it; that is only going to happend some
of the time and we want to be sure not to replace
out the line breaks that are NOT "within quoted values".
It's almost too easy. In my gut, I really feel like this kind of a custom tag is useful, but maybe it's just for a few cases.
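For readers curious what a find-and-rewrite loop like this boils down to, here is a rough sketch in plain Java, the regex engine that the tag leans on. It is not the RELoop source: the class and method names are made up for illustration, and it only assumes the standard java.util.regex find / appendReplacement / appendTail idiom with the same pattern and replacement used above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; an illustration only, not the RELoop source.
final class QuotedLineBreaks {

    // Same idea as the tag's pattern attribute: a quote, any run of
    // non-quote characters (line breaks included), then a closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Collapses each run of CR / LF characters to a single space, but only
    // inside quoted values. Text outside the quotes is copied through as-is.
    static String stripInsideQuotes(String text) {
        Matcher matcher = QUOTED.matcher(text);
        StringBuffer result = new StringBuffer();

        while (matcher.find()) {
            // Act on the one quoted value that was just matched.
            String cleaned = matcher.group().replaceAll("[\\r\\n]+", " ");

            // quoteReplacement() keeps any $ or \ in the value from being
            // interpreted as a group reference by appendReplacement().
            matcher.appendReplacement(result, Matcher.quoteReplacement(cleaned));
        }

        // Copy over whatever follows the last match.
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String text = "not quoted\r\nstill not \"quoted\r\ntext\" here\r\n";
        System.out.println(stripInsideQuotes(text));
    }
}
```

The custom tag's appeal is that the body of the loop can be any ColdFusion code, whereas a sketch like this hard-codes the one replacement.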
Haven't played with your solution to test it, but it appears that you're not taking into account escaped quotes within a quoted string, so for something like the following string:
She said: "If you want to quote something add a "" symbol to start of \r\n
and a "" to the end\r\n
of the text you're\r\n
Just made that up so there is probably a better example out there, but it does seem that the RegExp you're using only matches pairs of quotes, not matching pairs of quotes.
Again, was just thinking about the escaped quote issue and haven't tested it to see if your solution accommodates them already.
You can write regular expressions that take into account escaped quotes, and this can be done using what Steve Levithan showed me was called "unrolling the loop":
But again, you can only do this when you know the way escaping is done and the context of the problem.
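As a rough illustration of the unrolled-loop idea, the sketch below assumes quotes are escaped by doubling them (""), as in the example string from the earlier comment. The pattern has the shape opening quote, run of plain characters, then zero or more repetitions of (escaped quote, run of plain characters), closing quote. It is not necessarily the pattern referred to above, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; assumes "" is the escape for a literal quote.
final class EscapedQuotes {

    // Unrolled loop:  "  [^"]*  (?: ""  [^"]* )*  "
    // The plain-character runs never compete with the escaped-quote
    // alternative, so the engine does not have to backtrack between them.
    private static final Pattern QUOTED =
        Pattern.compile("\"[^\"]*(?:\"\"[^\"]*)*\"");

    // Replaces runs of CR / LF with a space inside quoted values only,
    // treating a doubled quote as part of the value rather than its end.
    static String stripInsideQuotes(String text) {
        Matcher matcher = QUOTED.matcher(text);
        StringBuffer result = new StringBuffer();

        while (matcher.find()) {
            String cleaned = matcher.group().replaceAll("[\\r\\n]+", " ");
            matcher.appendReplacement(result, Matcher.quoteReplacement(cleaned));
        }

        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String text = "She said: \"add a \"\" symbol\r\nand a \"\" here\"\r\nbye";
        System.out.println(stripInsideQuotes(text));
    }
}
```

If the data used backslash escaping (\") instead, the escaped-quote part of the pattern would have to change accordingly, which is the point about needing to know how escaping is done.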
I think this is very useful. In fact, I have run across several problems for which this would have been very useful. Next time I run into one, I will definitely have to download this and try it out.
I really like that it is a custom tag. This allows me to take any type or number of actions within the loop.
What version of CF does it require?
I have tested it in ColdFusion MX7. It uses the Java Pattern object behind the scenes (for more powerful and easier iteration), so it needs to be MX of some sort. I haven't tested on MX6, but as long as it can use CreateObject( "java" ) then it should be fine.
Also, you have the option to return a struct of data rather than a simple string; the struct contains the indexed-group matches, the group count, and the character offset of the match. I tried to make it as flexible as possible. Glad you might find it useful.
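To give a feel for the kind of per-match data described here, this is a small sketch of what Java's Matcher exposes for each match: the captured groups, the group count, and the start offset. The holder class and its field names are hypothetical and do not reflect the actual keys of the struct that RELoop returns. Note also that Java offsets are zero-based, while ColdFusion string positions are one-based.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names throughout; an illustration, not RELoop's struct.
final class MatchDetails {

    static final class Match {
        final String[] groups;  // groups[0] is the whole match, groups[1..] the captures
        final int groupCount;   // number of capturing groups in the pattern
        final int offset;       // zero-based start of the match in the text

        Match(String[] groups, int groupCount, int offset) {
            this.groups = groups;
            this.groupCount = groupCount;
            this.offset = offset;
        }
    }

    // Collects the details of every match of regex in text, in order.
    static List<Match> findAll(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        List<Match> matches = new ArrayList<>();

        while (matcher.find()) {
            int count = matcher.groupCount();
            String[] groups = new String[count + 1];
            for (int i = 0; i <= count; i++) {
                groups[i] = matcher.group(i);
            }
            matches.add(new Match(groups, count, matcher.start()));
        }

        return matches;
    }

    public static void main(String[] args) {
        for (Match m : findAll("(\\w+)=(\\d+)", "a=1, bb=22")) {
            System.out.println(m.offset + ": " + m.groups[0]);
        }
    }
}
```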
Thanks for the link, I'll take a look at it if/when I need such RegExp operations.