Using POSIX Character Classes In Java Regular Expressions With ColdFusion
Posted February 6, 2009 at 3:31 PM
When I first started to learn regular expressions in ColdFusion, I used things called POSIX Character Classes. These were pre-defined groups of characters that looked like:
- [:digit:]
- [:alnum:]
- [:punct:]
When I started working with Java regular expressions, I could no longer use those character classes. Or rather, I couldn't use them with the same notation; POSIX character classes in Java regular expressions are written differently:
- \p{Digit}
- \p{Alnum}
- \p{Punct}
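To make the Java notation a bit more concrete, here is a minimal Java sketch that tests single characters against the three classes listed above. The class name `PosixClassCheck` and the helper `matchesClass` are made up for illustration. Note that inside a Java string literal the backslash has to be doubled, which is not the case in a ColdFusion string.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
final class PosixClassCheck {

    // Returns true when the whole input matches the given POSIX class,
    // e.g. matchesClass("Digit", "7") tests "7" against \p{Digit}.
    static boolean matchesClass(String className, String input) {
        return Pattern.matches("\\p{" + className + "}", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("Digit", "7")); // true
        System.out.println(matchesClass("Alnum", "g")); // true
        System.out.println(matchesClass("Punct", "!")); // true
        System.out.println(matchesClass("Punct", "g")); // false
    }
}
```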
I was just looking up some regular expression stuff when I saw these POSIX character classes again. I've never actually used them in Java, so I figured I would take a moment to try them out:
<!--- Save some sample text. --->
<cfsavecontent variable="strSample">

	"You can't just pick and choose which laws to follow. Sure
	I'd like to tape a baseball game without the express written
	consent of major league baseball, but that's just not the
	way it works." - Hank Hill

</cfsavecontent>


<!--- The ## expressions below need to be evaluated inside CFOutput. --->
<cfoutput>

	<!--- Replace graphical characters. --->
	#strSample.ReplaceAll(
		JavaCast( "string", "\p{Graph}+" ),
		JavaCast( "string", "X" )
		)#

	<br />
	<br />

	<!--- Replace the punctuation. --->
	#strSample.ReplaceAll(
		JavaCast( "string", "\p{Punct}+" ),
		JavaCast( "string", "_" )
		)#

	<br />
	<br />

	<!--- Replace all punctuation except apostrophes. --->
	#strSample.ReplaceAll(
		JavaCast( "string", "[\p{Punct}&&[^']]+" ),
		JavaCast( "string", "_" )
		)#

</cfoutput>
When we run this code, we get the following output:
X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X
_You can_t just pick and choose which laws to follow_ Sure I_d like to tape a baseball game without the express written consent of major league baseball_ but that_s just not the way it works_ _ Hank Hill
_You can't just pick and choose which laws to follow_ Sure I'd like to tape a baseball game without the express written consent of major league baseball_ but that's just not the way it works_ _ Hank Hill
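Since the ColdFusion code is really just calling `java.lang.String.replaceAll()`, the same three replacements can be sketched in plain Java. This is only an illustration: the class name `PosixReplaceDemo`, the method names, and the shortened sample string are my own assumptions, not part of the original demo.

```java
// Hypothetical demo class; names and the shortened sample are illustrative.
final class PosixReplaceDemo {

    // Shortened version of the quote, kept on one line for readability.
    static final String SAMPLE =
        "\"that's just not the way it works.\" - Hank Hill";

    // Replace every run of visible (graphical) characters with an X.
    static String maskGraph(String text) {
        return text.replaceAll("\\p{Graph}+", "X");
    }

    // Replace every run of punctuation with an underscore.
    static String maskPunct(String text) {
        return text.replaceAll("\\p{Punct}+", "_");
    }

    // Intersect the punctuation class with "anything but an apostrophe",
    // so apostrophes survive the replacement.
    static String maskPunctExceptApostrophe(String text) {
        return text.replaceAll("[\\p{Punct}&&[^']]+", "_");
    }

    public static void main(String[] args) {
        // X X X X X X X X X X
        System.out.println(maskGraph(SAMPLE));
        // _that_s just not the way it works_ _ Hank Hill
        System.out.println(maskPunct(SAMPLE));
        // _that's just not the way it works_ _ Hank Hill
        System.out.println(maskPunctExceptApostrophe(SAMPLE));
    }
}
```

The `&&` inside the brackets is Java's character class intersection: a character has to belong to both `\p{Punct}` and `[^']` in order to match.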
The casing of POSIX character class names is important: \p{Digit} works fine, but \p{DIGIT} will throw an error. All of the POSIX classes can be replaced with shorter, more standard character classes (\p{Digit}, for example, is just \d), so I don't really see much of a need for these. The one I do see as having some value is the Punctuation character class; there are just too many of those darn characters to type out!
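Both points can be checked with a small Java sketch. It assumes the default regex flags, and the class name `PosixCaseDemo` and its two helpers are hypothetical names used only for this example. The first helper reports whether a pattern compiles at all; the second counts how many of the 128 ASCII characters a single-character pattern matches.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names here are illustrative only.
final class PosixCaseDemo {

    // Returns false when the regex engine rejects the pattern.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    // Counts the ASCII characters (0-127) matched by a one-character pattern.
    static int countAsciiMatches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        int count = 0;
        for (char c = 0; c < 128; c++) {
            if (pattern.matcher(String.valueOf(c)).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(compiles("\\p{Digit}"));          // true
        System.out.println(compiles("\\p{DIGIT}"));          // false
        System.out.println(countAsciiMatches("\\p{Digit}")); // 10
        System.out.println(countAsciiMatches("\\d"));        // 10
        System.out.println(countAsciiMatches("\\p{Punct}")); // 32
    }
}
```

Thirty-two characters is a lot to escape and type out by hand, which is where `\p{Punct}` earns its keep.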



