Using POSIX Character Classes In Java Regular Expressions With ColdFusion
Posted February 6, 2009 at 3:31 PM by Ben Nadel
When I first started to learn regular expressions in ColdFusion, I used things called POSIX Character Classes. These were pre-defined groups of characters that looked like:
- [:digit:]
- [:alnum:]
- [:punct:]
When I started working with Java regular expressions, I could no longer use those character classes. Or rather, I couldn't use them with the same notation. In Java regular expressions, POSIX character classes look like this:
- \p{Digit}
- \p{Alnum}
- \p{Punct}
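To make the notation concrete, here is a minimal sketch in plain Java (rather than ColdFusion) that checks a few strings against these three classes. The class name PosixClassBasics, the helper allMatch, and the sample strings are made up for illustration; only Pattern.matches comes from the standard library.

```java
import java.util.regex.Pattern;

class PosixClassBasics {

    // Returns true when every character of the input belongs to the
    // given character class (the class is repeated with "+").
    static boolean allMatch(String characterClass, String input) {
        return Pattern.matches(characterClass + "+", input);
    }

    public static void main(String[] args) {
        // Note the doubled backslash: Java string literals need "\\p"
        // to hand the regex engine a single "\p".
        System.out.println(allMatch("\\p{Digit}", "2009")); // true
        System.out.println(allMatch("\\p{Alnum}", "Feb6")); // true
        System.out.println(allMatch("\\p{Punct}", "!?.,")); // true
        System.out.println(allMatch("\\p{Digit}", "3:31")); // false (the colon is punctuation)
    }
}
```

In ColdFusion string literals the backslash does not need to be doubled, so the same class would be written as "\p{Digit}".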
I was just looking up some regular expression stuff when I saw these POSIX character classes again. I've never actually used them in Java, so I figured I would take a moment to try them out:
<!--- Save some sample text. --->
<cfsavecontent variable="strSample">

    "You can't just pick and choose which laws to follow. Sure
    I'd like to tape a baseball game without the express written
    consent of major league baseball, but that's just not the
    way it works." - Hank Hill

</cfsavecontent>


<!--- The #...# expressions are only evaluated inside of CFOutput. --->
<cfoutput>

    <!--- Replace graphical characters. --->
    #strSample.ReplaceAll(
        JavaCast( "string", "\p{Graph}+" ),
        JavaCast( "string", "X" )
    )#

    <br />
    <br />

    <!--- Replace the punctuation. --->
    #strSample.ReplaceAll(
        JavaCast( "string", "\p{Punct}+" ),
        JavaCast( "string", "_" )
    )#

    <br />
    <br />

    <!--- Replace all punctuation except apostrophes. --->
    #strSample.ReplaceAll(
        JavaCast( "string", "[\p{Punct}&&[^']]+" ),
        JavaCast( "string", "_" )
    )#

</cfoutput>
When we run this code, we get the following output:
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
_You can_t just pick and choose which laws to follow_ Sure I_d like to tape a baseball game without the express written consent of major league baseball_ but that_s just not the way it works_ _ Hank Hill
_You can't just pick and choose which laws to follow_ Sure I'd like to tape a baseball game without the express written consent of major league baseball_ but that's just not the way it works_ _ Hank Hill
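For anyone who wants to try the same three replacements outside of ColdFusion, here is a rough plain-Java sketch. The class and method names are hypothetical, and the sample string is just the tail end of the quote. The third pattern uses character class intersection (&&): it matches anything that is in \p{Punct} and also in [^'], which is to say, all punctuation except the apostrophe.

```java
class PunctReplaceSketch {

    // Collapse each run of visible (non-whitespace) characters into an "X".
    static String maskGraph(String text) {
        return text.replaceAll("\\p{Graph}+", "X");
    }

    // Collapse each run of punctuation into an underscore.
    static String maskPunct(String text) {
        return text.replaceAll("\\p{Punct}+", "_");
    }

    // Same as maskPunct(), but the intersection with [^'] leaves
    // apostrophes alone.
    static String maskPunctKeepApostrophes(String text) {
        return text.replaceAll("[\\p{Punct}&&[^']]+", "_");
    }

    public static void main(String[] args) {
        String sample = "that's just not the way it works.\" - Hank Hill";

        System.out.println(maskGraph(sample));
        // X X X X X X X X X X
        System.out.println(maskPunct(sample));
        // that_s just not the way it works_ _ Hank Hill
        System.out.println(maskPunctKeepApostrophes(sample));
        // that's just not the way it works_ _ Hank Hill
    }
}
```

Since ColdFusion strings are Java strings under the hood, the ReplaceAll() calls in the ColdFusion version end up in the same java.lang.String method used here.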
The casing of a POSIX character class name is important: \p{Digit} works fine, but \p{DIGIT} will throw an error. All of the POSIX classes can be replaced with shorter, more standard character classes (\d in place of \p{Digit}, for example), so I don't really see much of a need for these. The one I do see as having some value is the Punctuation character class; there are just too many of those darn characters to type out!
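Both points can be checked with a small plain-Java sketch. The class name PosixCasingSketch and the helper compiles are made up for illustration, and the sketch assumes the pattern is compiled with default flags (no UNICODE_CHARACTER_CLASS), which is what String.replaceAll() uses.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class PosixCasingSketch {

    // Returns true when the regex compiles, false when Java rejects it
    // with a PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class name is case-sensitive.
        System.out.println(compiles("\\p{Digit}")); // true
        System.out.println(compiles("\\p{DIGIT}")); // false

        // With default flags, \p{Digit} and the shorter \d both match 0-9.
        System.out.println("a1b22".replaceAll("\\p{Digit}", "#")); // a#b##
        System.out.println("a1b22".replaceAll("\\d", "#"));        // a#b##
    }
}
```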
Reader Comments
Shoot, since nobody else has commented...
How do you suggest I remove '&nbsp;' from a string without removing actual spaces, ' '?
This does not seem to work...
#replace(qSelect.location1, "&nbsp;", "", "ALL")#
Thank you!!!



