I love regular expressions and all the time I am finding new ways to leverage their insane black magic voodoo powers within ColdFusion. I just came across the regular expression "quote" character, \Q. \Q allows us to create a pattern of characters that are to be evaluated as character literals, not as possible regular expression constructs. \Q will perform a literal match of everything to the right of it until it hits the end of the expression or the end construct, \E. Therefore, this pattern:
\Qben\E
... will search for the literal string "ben".
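To see the difference quoting makes, here is a minimal sketch in plain Java (the same java.util.regex engine the rest of this post uses from ColdFusion). The class name QuoteBasics and the helper found() are hypothetical names made up for this illustration. It swaps the "e" in "ben" for a "." so that there is a special character to quote.

```java
import java.util.regex.Pattern;

class QuoteBasics {

    // Returns true when the regex matches anywhere inside the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unquoted, "." matches any character, so "b.n" finds "ben".
        System.out.println(found("b.n", "ben"));        // true

        // Quoted, "." is just a period, so "ben" no longer matches...
        System.out.println(found("\\Qb.n\\E", "ben"));  // false

        // ...but the literal text "b.n" does.
        System.out.println(found("\\Qb.n\\E", "b.n"));  // true
    }
}
```

Note that the doubled backslashes are only Java string-literal escaping; the pattern the regex engine sees is still \Qb.n\E.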
While this might not seem very important, it can make searching text much easier. One of the things that ColdFusion sorely lacks is a way to easily iterate over matches in a string (literal or regular expression). Luckily, Java provides us with the Pattern and Matcher objects to make iterating over pattern matches easy. But what about literal strings? That's where this \Q-\E construct comes into play.
Using the regular expression quote constructs, we can leverage the power of the Java pattern matcher to iterate over literal string matches in a chunk of target text. I have thought about using the Java pattern matcher before, but because there are so many special characters in regular expressions, I have always been very hesitant to do this - most of the time, I cannot be sure that the user-entered search phrases don't have special regular expression characters (like "." or "\"). Now, this is no longer a concern; take a look at this demo:
<!---
	Store some text that we want to search. We are going to
	make sure that this text has characters that would be
	considered special characters within a regular expression.
--->
<cfsavecontent variable="strText">
	Hey Maria, you better stop. I don't think it's a good idea
	for you to change while I'm still in the room?!?!? I mean,
	sure you're looking hella fine [sic]! But, what would your
	parents think?!?!?
</cfsavecontent>

<!---
	We are going to store the search phrases in variables. This
	is just to demonstrate that the search phrase could come
	from anywhere, including a search form with user-entered
	criteria. In our case, we are going to use one phrase that
	has the ? which is a zero-or-one quantifier and one that
	has the [ ] which creates a character set.
--->
<cfset strPhrase1 = "?!?!?" />
<cfset strPhrase2 = "[sic]" />

<!---
	Now, let's create a Java pattern to find our search phrases.
	Notice that we are putting the above search phrases into our
	pattern and using the \Q ... \E escape pattern. Using \Q
	and \E will match literal values in between even if they
	contain special regular expression characters.
--->
<cfset objPattern = CreateObject(
	"java",
	"java.util.regex.Pattern"
	).Compile(
		"(?i)(\Q#strPhrase1#\E|\Q#strPhrase2#\E)"
		)
	/>

<!---
	Create a matcher for our pattern that will be able to
	search the target string for our literal pattern.
--->
<cfset objMatcher = objPattern.Matcher( strText ) />

<!---
	Keep looping over the matcher until we have run out of
	matching patterns.
--->
<cfloop condition="objMatcher.Find()">

	<p>
		Found: #objMatcher.Group()#<br />
		Found At: #objMatcher.Start()#
	</p>

</cfloop>
If you look at the two search phrases we are searching on:
<cfset strPhrase1 = "?!?!?" />
<cfset strPhrase2 = "[sic]" />
... you will see that both of these phrases contain special regex characters: the ?, the [, and the ]. Normally, if we took these strings and just dynamically included them in a regular expression search, we would get very unexpected results. However, since we wrapped both of these phrases in \Q and \E within our pattern, running the above code, we get the following output:
Found: ?!?!?
Found At: 109
Found: [sic]
Found At: 156
Found: ?!?!?
Found At: 199
Notice that our phrases were matched as literals, not as "patterns" (they're still patterns, but you know what I mean).
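For anyone who wants to try the same idea outside of ColdFusion, here is a rough sketch of the demo in plain Java. The class LiteralSearch, the method findAll(), and the shortened sample text are all hypothetical, invented for this illustration. It builds the same (?i)(\Q...\E|\Q...\E) alternation and walks the matches with Matcher.find(). Like the demo above, it assumes that none of the phrases contains \E.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralSearch {

    // Returns one "match@offset" entry per case-insensitive literal hit.
    static List<String> findAll(String text, String... phrases) {
        // Build (?i)(\Qphrase1\E|\Qphrase2\E|...).
        StringBuilder regex = new StringBuilder("(?i)(");
        for (int i = 0; i < phrases.length; i++) {
            if (i > 0) {
                regex.append('|');
            }
            // Assumption: the phrase does not itself contain \E.
            regex.append("\\Q").append(phrases[i]).append("\\E");
        }
        regex.append(')');

        Matcher matcher = Pattern.compile(regex.toString()).matcher(text);
        List<String> hits = new ArrayList<>();

        // Keep looping until the matcher runs out of matches.
        while (matcher.find()) {
            hits.add(matcher.group() + "@" + matcher.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Stop?!?!? Looking fine [sic]! Think?!?!?";
        // Prints: [?!?!?@4, [sic]@23, ?!?!?@35]
        System.out.println(findAll(text, "?!?!?", "[sic]"));
    }
}
```

The offsets are zero-based, just like the values that Start() reports in the ColdFusion version.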
Now, this doesn't put us 100% in the clear. We don't have to worry about 99% of the regular expression characters being in our string, since they are being matched as literals, but we still need to be careful of one: \E. The regular expression will match the "quote" starting at the \Q and ending with the first \E. If someone entered a search phrase that has \E in it, then our regular expression will be malformed, having two \E instances and only one \Q instance. I have tried to find a way to escape the \E, but nothing I did seemed to work. Therefore, the one step we might have to take is to make sure the user doesn't enter \E in their search criteria. This is a little irritating, but heck, it's 100% better than having to worry about the entire set of special regular expression characters.
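One possible way around the \E problem, assuming the JVM is Java 5 or later, is the static java.util.regex.Pattern.quote() method. It wraps a string in \Q and \E and also handles any \E already inside the string. Here is a minimal sketch in plain Java; the class SafeQuote and the helper countLiteral() are hypothetical names for this illustration, not something taken from the demo above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SafeQuote {

    // Counts literal occurrences of phrase in text. Pattern.quote()
    // builds the quoted pattern, including for phrases that contain \E.
    static int countLiteral(String text, String phrase) {
        Matcher matcher = Pattern.compile(Pattern.quote(phrase)).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The search phrase is the six characters: a \ E b . c
        String phrase = "a\\Eb.c";

        // Two literal copies of the phrase, plus one near miss
        // that has an "x" where the "." should be.
        String text = "a\\Eb.c and a\\Eb.c but not a\\Ebxc";

        System.out.println(countLiteral(text, phrase)); // 2
    }
}
```

From ColdFusion, the same static method should be reachable on the Pattern object returned by CreateObject( "java", "java.util.regex.Pattern" ), though I have not shown that call here, so treat it as something to test on your own server.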