Case Insensitive Java Regular Expressions - How Did I Miss That!?!

Posted September 29, 2006 at 12:59 PM by Ben Nadel

Tags: ColdFusion

As you all know, in ColdFusion, when you want to do a find or replace with no case sensitivity, you just append "NoCase" to the method call. We have all used:

  • FindNoCase()
  • REFindNoCase()
  • ReplaceNoCase()
  • REReplaceNoCase()

As you well know (if you follow my blog), I am a huge fan of using the Java String methods for regular expression finding and manipulation. The problem with that, though, is that there are no "NoCase" type methods built into the Java String (except for String::equalsIgnoreCase(), but that doesn't use regular expressions). I thought the only way to work around this was to spell out both cases in the regular expression (i.e. [a-zA-Z]), but it turns out there is a flag for performing a case-insensitive search:

(?i)

How did I miss that? I have been doing regular expressions for a long time now and this one has escaped me. Crazy! But, it's so awesome. You just put the flag (?i) in your regular expression and everything to the right of it will be case-insensitive:

  <!--- Set string value. --->
  <cfset strText = "Libby is HOT" />

  <!--- Check case-sensitive match. --->
  #strText.Matches( "[a-z ]+" )#

  <!--- Check case-INsensitive match. --->
  #strText.Matches( "(?i)[a-z ]+" )#

The first call outputs NO; the second call outputs YES. How freakin' cool is that? You can even limit the case insensitivity to just part of the pattern:

  <!--- Only allow case-INsensitivity on LIBBY and HOT. --->
  #strText.Matches( "(?i:libby)[a-z ]+(?i:hot)" )#

This also outputs YES.
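
For anyone calling these methods from plain Java rather than through ColdFusion, the same three checks can be sketched roughly as below. The class and method names (CaseFlagDemo, caseSensitive, wholePattern, scoped) are made up for illustration, and Java itself is case-sensitive about method names, so it is matches() there.

```java
class CaseFlagDemo {

    // No flag: the uppercase letters are not in [a-z ], so the match fails.
    static boolean caseSensitive(String text) {
        return text.matches("[a-z ]+");
    }

    // (?i) at the start makes the whole pattern case-insensitive.
    static boolean wholePattern(String text) {
        return text.matches("(?i)[a-z ]+");
    }

    // (?i:...) limits case insensitivity to the group it wraps.
    static boolean scoped(String text) {
        return text.matches("(?i:libby)[a-z ]+(?i:hot)");
    }

    public static void main(String[] args) {
        String text = "Libby is HOT";
        System.out.println(caseSensitive(text)); // false
        System.out.println(wholePattern(text));  // true
        System.out.println(scoped(text));        // true
    }
}
```

Note that the scoped version is stricter than it looks: the middle [a-z ]+ stays case-sensitive, so "Libby IS HOT" would not match.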

So anyway, that's pretty freakin' cool! I don't know how I missed that when learning Regular Expressions, but now that I know, it is sooo on.

Oh, and just be careful. This is only for JAVA regular expressions. If you try to use this type of notation in ColdFusion method calls, it will throw an error:

Malformed regular expression "(?i:libby)[a-z ]+(?i:hot)". Reason: Sequence (?:...) not recognized.

... Just another reason I am a huge fan of using the underlying Java methods.


Reader Comments

Sep 30, 2006 at 10:50 AM
20 Comments

So what exactly are the benefits of using the Java string classes? Is it just a performance issue?


Sep 30, 2006 at 11:16 AM
11,314 Comments

It can be a performance issue, but it's also a utility issue. As I describe above, the regular expression methods used on the Java string class:

String::Matches()
String::ReplaceFirst()
String::ReplaceAll()

... allow for much more powerful AND faster regular expressions. Those methods can handle all the look-behinds (negative and positive), which I don't think ColdFusion can handle at all. That means that they can do much more than things like:

REFind()
REReplace()

So, it's speed, but it's also flexibility.
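
To make the look-behind point concrete, here is a small sketch in plain Java. The class name, method names, and sample text are all invented for illustration; only replaceAll() and the look-behind syntax come from Java itself.

```java
class LookBehindDemo {

    // Positive look-behind: only digits that directly follow a "$".
    static String maskDollarAmounts(String text) {
        return text.replaceAll("(?<=\\$)\\d+", "N");
    }

    // Negative look-behind: whole numbers that do NOT follow a "$".
    static String maskPlainNumbers(String text) {
        return text.replaceAll("(?<!\\$)\\b\\d+", "N");
    }

    public static void main(String[] args) {
        String text = "price: $100, qty: 100";
        System.out.println(maskDollarAmounts(text)); // price: $N, qty: 100
        System.out.println(maskPlainNumbers(text));  // price: $100, qty: N
    }
}
```

The "$" itself is never consumed in either case; the look-behind only checks what sits to the left of the match.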


Feb 2, 2007 at 4:09 PM
172 Comments

If (?i) and (?i:regex) work, (?-i) might also work. With some regex libraries it turns off case insensitivity for the remainder of the regular expression, e.g.: (?i)libby(?-i)[a-z]

Other things to test, which work with some libraries:

(?s) / (?-s) : Turn on/off dot matches newline.
(?m) / (?-m) : Turn on/off caret and dollar match after and before newlines.

If those work, you should also be able to do, e.g., (?im-s) to turn on "i" and "m", but turn off "s" for the remainder of the regex.
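
For what it's worth, java.util.regex does accept these flag groups, and a quick way to check them is a sketch like the one below. The class name and the two helper methods (whole, anywhere) are hypothetical; the sample strings are made up for the test.

```java
import java.util.regex.Pattern;

class FlagToggleDemo {

    // matches() must consume the whole input.
    static boolean whole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find() looks for the pattern anywhere in the input.
    static boolean anywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // (?-i) switches case insensitivity back off for the rest of the pattern.
        System.out.println(whole("(?i)libby(?-i)[a-z]", "LIBBYx")); // true
        System.out.println(whole("(?i)libby(?-i)[a-z]", "LIBBYX")); // false

        // (?s) lets the dot match a newline.
        System.out.println(whole("a.b", "a\nb"));     // false
        System.out.println(whole("(?s)a.b", "a\nb")); // true

        // (?m) lets ^ and $ match at line breaks.
        System.out.println(anywhere("^two$", "one\ntwo"));     // false
        System.out.println(anywhere("(?m)^two$", "one\ntwo")); // true

        // One group can turn "i" and "m" on and "s" off at the same time.
        System.out.println(anywhere("(?im-s)^two$", "ONE\nTWO"));   // true
        System.out.println(anywhere("(?im-s)one.two", "ONE\nTWO")); // false
    }
}
```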


Feb 2, 2007 at 4:15 PM
172 Comments

BTW, is there anything that needs to be done beyond what you show in your example code to make the Java methods available? Having extended regex functionality available to me in ColdFusion (particularly lookbehinds) has been a dream.


Feb 2, 2007 at 4:23 PM
11,314 Comments

Steve,

You know more about regular expressions than I do. I have not tried using (nor did I know about) flags that would turn off previous flags. As far as I know though, CFMX6/7 is running on top of Java 1.4.2 or something, so theoretically, anything that works in that edition of Java will run in ColdFusion (when using the underlying Java methods).

Try searching my site for ReplaceAll() and ReplaceFirst(), which are the two main Java RegEx string methods that I use:

http://www.bennadel.com/search/replaceall%20replacefirst

You also might want to search for Split(), another great Java String method:

http://www.bennadel.com/search/split(

Other than that, you seem to know the nuts and bolts of the regex stuff better than I do, so I am sure that once you figure out the method calls, you will do great. You will also find that Java regular expressions are NICE AND FAST.

Also, you might want to search for pattern / matcher:

http://www.bennadel.com/search/util.regex

Let me know if there is anything I can help with.


Feb 3, 2007 at 12:40 AM
172 Comments

Thanks for the leads, Ben!


Feb 3, 2007 at 9:17 AM
11,314 Comments

Trust me, once you go underlying Java string methods... you never go back.


Feb 15, 2009 at 8:00 PM
1 Comments

I tried to implement this in the following way:

if (s1.indexOf((?i)s2) >= 0)

Where s1 and s2 are string variables. I was trying to get an if statement that is true when s1 contains s2 anywhere in it, and then perform a loop.

I fought with it for a while and gave up.

Any thoughts on why this did not work (what syntax did I get wrong)?
Thanks,
bg


Feb 15, 2009 at 8:37 PM
11,314 Comments

@Brian,

I am not sure what you are trying to do. Can you explain a bit more?


Apr 17, 2009 at 8:25 AM
1 Comments

This is great!!! Thanks for the tip!

Slightly off-topic: using a regular expression and the replaceAll function, can you replace only whole words? E.g., if I wanted to replace all "and" with "xyz", but only if "and" is not part of another word, as in "hand"?


Apr 17, 2009 at 8:27 AM
11,314 Comments

@Firoz,

Yeah, you just have to use word boundaries: \b

\band\b

... will get "and" only when it's a whole word.
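
As a rough sketch of how that looks with replaceAll() in plain Java (the class name, method names, and sample sentence are invented for illustration):

```java
class WordBoundaryDemo {

    // The backslashes are doubled because the pattern is a Java string literal.
    static String replaceWholeWord(String text) {
        return text.replaceAll("\\band\\b", "xyz");
    }

    // Same pattern with the (?i) flag, so "AND" and "And" are replaced too.
    static String replaceWholeWordNoCase(String text) {
        return text.replaceAll("(?i)\\band\\b", "xyz");
    }

    public static void main(String[] args) {
        String text = "hand and sand AND";
        System.out.println(replaceWholeWord(text));       // hand xyz sand AND
        System.out.println(replaceWholeWordNoCase(text)); // hand xyz sand xyz
    }
}
```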


Jul 13, 2009 at 10:51 AM
1 Comments

Thanks. Very useful information.


Feb 18, 2010 at 2:45 PM
3 Comments

Hi! It doesn't work with accents? :/
Try: Águia
I consider this a bug.

Do you know of a workaround?


Feb 22, 2010 at 8:57 PM
11,314 Comments

@Leandro,

What doesn't work with accents? I am not sure what you are trying to do.


Mar 6, 2010 at 11:17 AM
3 Comments

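class DontWorkWithAccent {

    public static void main(String[] args) {
        String str = "Águia";
        // Lowercase á in the class: (?i) alone does not match the uppercase Á.
        str = str.replaceAll("(?i)[á]", "a");
        System.out.println(str);
        // Uppercase Á in the class: this one matches directly.
        str = str.replaceAll("(?i)[Á]", "a");
        System.out.println(str);
    }
}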


Mar 8, 2010 at 6:52 PM
11,314 Comments

@Leandro,

Hmmm, interesting. I know you can use hex-based values in the regular expression. Example:

[^\x7F]

... so I know it can handle any character at some level. I wonder why this is not working as a standard character class.


Jul 21, 2010 at 10:58 PM
1 Comments

Is there a way to do a replaceAll in Java equivalent to the one done in PHP here (http://bit.ly/bZilOh)? It should turn:

word >a href="word"<word>/word<word word

to

repl >a href="word"<repl>/word<repl repl

Note that only the "word"s outside a tag are replaced.


Jul 22, 2010 at 9:58 PM
11,314 Comments

@Celso,

In the link you gave me, it looks like they are using a negative look-ahead to confirm that they are not inside a tag. Java definitely supports the negative look-ahead, and I am pretty sure that ColdFusion also supports them (I believe it is the look-behinds that ColdFusion doesn't support).
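
A minimal Java sketch of that idea follows. The pattern from the linked PHP answer is not reproduced in this thread, so the look-ahead below is only an assumption about its general shape: a match is skipped when the next angle bracket after it is a closing ">", which suggests the match sits inside a tag. The class name, method name, and sample HTML are all made up for illustration, and the approach assumes well-formed markup with no ">" inside attribute values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OutsideTagDemo {

    // Replace whole-word occurrences of `word` that are not inside a tag.
    static String replaceOutsideTags(String html, String word, String replacement) {
        // (?![^<>]*>) fails when a ">" comes before any other angle bracket.
        String regex = "\\b" + Pattern.quote(word) + "\\b(?![^<>]*>)";
        return html.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String html = "word <a href=\"word\">word</a> word";
        // The "word" inside the href attribute is left alone.
        System.out.println(replaceOutsideTags(html, "word", "repl"));
        // repl <a href="word">repl</a> repl
    }
}
```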


Jan 17, 2011 at 9:23 AM
1 Comments

Awesome..!! Thanks for sharing. You made my day.


Jun 10, 2011 at 9:15 AM
8 Comments

Very very very awesome !!!

I like it!
Seriously, I'm frustrated that I didn't know about this sooner!
I always wanted a simpler way to use regex in CF.
The traditional RE[..]() functions are too limited compared to the Java methods, so I generally use your component, PatternMatcher.cfc (it's very useful!)!

Thanks for your tip Ben!


Jun 10, 2011 at 9:17 AM
8 Comments

Have you written an entry that gathers together all the Java methods hidden in native CF objects, like the trick above?


Nov 4, 2011 at 1:13 PM
2 Comments

Thanks a bunch! This is great info even though I've never used ColdFusion. :)


Nov 4, 2011 at 1:18 PM
2 Comments

@Brian,

I know this is a really (I mean really) late response to Brian, but... This page was high up in the Google results for "java matcher case insensitive", so I figure others will get here too and maybe wonder about the same thing.

String.indexOf does not use regular expressions, so you can't do the magic of (?i) in a parameter to indexOf. So even if you had correct Java syntax, it wouldn't work.
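
One way to get a case-insensitive "contains" check in Java is to hand the search text to a regular expression instead, as in the sketch below. The class and method names are hypothetical. Pattern.quote() is used so that any regex metacharacters in s2 are treated as literal text, and the Pattern.CASE_INSENSITIVE flag plays the same role as (?i).

```java
import java.util.regex.Pattern;

class ContainsIgnoreCaseDemo {

    // True when s1 contains s2 anywhere, ignoring (ASCII) case.
    static boolean containsIgnoreCase(String s1, String s2) {
        return Pattern.compile(Pattern.quote(s2), Pattern.CASE_INSENSITIVE)
                      .matcher(s1)
                      .find();
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Libby is HOT", "hot"));  // true
        System.out.println(containsIgnoreCase("Libby is HOT", "cold")); // false
    }
}
```

For simple cases, comparing lowercased copies with s1.toLowerCase().contains(s2.toLowerCase()) is another option that avoids regular expressions entirely.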


Nov 22, 2011 at 5:36 AM
1 Comments

Freakin' cool!! Thanks


Jun 26, 2012 at 7:20 AM
1 Comments

@Leandro,

The (?i) flag is documented as matching only US-ASCII characters case-insensitively by default. If you want Unicode-aware case-insensitive matching, you need to use (?iu):

String str = "Águia";
str = str.replaceAll("(?iu)[á]", "a");

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#CASE_INSENSITIVE
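
A small side-by-side sketch of the two flags, using an accented word such as Águia. The class and method names are invented for illustration, and the accented letters are written as Unicode escapes (\u00C1 for the uppercase letter, \u00E1 for the lowercase one) so that the source file's encoding cannot interfere with the test.

```java
class UnicodeCaseDemo {

    // (?i) alone: case folding is limited to US-ASCII letters.
    static String asciiOnly(String text) {
        return text.replaceAll("(?i)[\u00E1]", "a");
    }

    // (?iu): adds Unicode-aware case folding.
    static String unicodeAware(String text) {
        return text.replaceAll("(?iu)[\u00E1]", "a");
    }

    public static void main(String[] args) {
        String text = "\u00C1guia";
        System.out.println(asciiOnly(text).equals(text)); // true (nothing replaced)
        System.out.println(unicodeAware(text));           // aguia
    }
}
```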


Jun 26, 2012 at 5:50 PM
3 Comments

@Keith Starsmeare

Thank you.


