Using POSIX Character Classes In Java Regular Expressions With ColdFusion

Posted February 6, 2009 at 3:31 PM

Tags: ColdFusion

When I first started to learn regular expressions in ColdFusion, I used things called POSIX Character Classes. These were pre-defined groups of characters that looked like:

  • [:digit:]
  • [:alnum:]
  • [:punct:]

When I started working in Java Regular Expressions, I could no longer use those characters classes. Or rather, I couldn't use them with the same notation - POSIX character classes in Java regular expressions have a different notation:

  • \p{Digit}
  • \p{Alnum}
  • \p{Punct}

I was just looking up some regular expression stuff when I saw these POSIX character classes again. I've never actually used them in Java, so I figured I would take a moment to try them out:

 Launch code in new window » Download code as text file »

  • <!--- Save some sample text. --->
  • <cfsavecontent variable="strSample">
  •  
  • "You can't just pick and choose which laws to follow. Sure
  • I'd like to tape a baseball game without the express written
  • consent of major league baseball, but that's just not the
  • way it works." - Hank Hill
  •  
  • </cfsavecontent>
  •  
  •  
  • <!--- Replace graphical characters. --->
  • #strSample.ReplaceAll(
  • JavaCast( "string", "\p{Graph}+" ),
  • JavaCast( "string", "X" )
  • )#
  •  
  • <br />
  • <br />
  •  
  • <!--- Replace the punctuation. --->
  • #strSample.ReplaceAll(
  • JavaCast( "string", "\p{Punct}+" ),
  • JavaCast( "string", "_" )
  • )#
  •  
  • <br />
  • <br />
  •  
  • <!--- Replace all punctuation except apostrophes. --->
  • #strSample.ReplaceAll(
  • JavaCast( "string", "[\p{Punct}&&[^']]+" ),
  • JavaCast( "string", "_" )
  • )#

When we run this code, we get the following output:

X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X

_You can_t just pick and choose which laws to follow_ Sure I_d like to tape a baseball game without the express written consent of major league baseball_ but that_s just not the way it works_ _ Hank Hill

_You can't just pick and choose which laws to follow_ Sure I'd like to tape a baseball game without the express written consent of major league baseball_ but that's just not the way it works_ _ Hank Hill

The casing of the character class in POSIX is important. \p{Digit} works fine, but \p{DIGIT} will throw an error. All of the POSIX classes can be replaced with shorter, more standard character classes so I don't really see much of a need for these; but, the one I see as having some value is the Puntuation character class - there's just too many of those darn characters to type out!

Download Code Snippet ZIP File

Post Comment  |  Ask Ben  |  Other Searches  |  Print Page




Reader Comments

There are no comments posted for this web log entry.


Post Comment  |  Ask Ben

Recent Blog Comments
Mar 21, 2010 at 8:57 PM
The Bourne Ultimatum Starring Matt Damon And Julia Stiles
late to the party, but my observation is this: rewatch carefully for the platonic nature of the relationship between nicki and jason. she never flirts with him. he never comes on to her. they alway ... read »
Mar 21, 2010 at 7:40 PM
Is Simulating User-Input Events With jQuery Ever A Good Idea?
A couple of things. One you embed the initial state of of more-info in the CSS. IMHO, that behavior should be in jQuery: moreInfo.hide(); It shows that the behavior your toggling and closing is mor ... read »
Mar 21, 2010 at 3:59 PM
Exploring ColdFusion Component Runtime Class Properties And Serialization
@Elliott, according to Ben's experiment, serializeJSON() doesn't access the private data by default - it doesn't even access the getHair() method - so trying to clone a Girl.cfc via serializeJSON/des ... read »
Mar 21, 2010 at 3:49 PM
Ask Ben: Javascript String Replace Method
I'm confused a bit by what you are asking, but if had this sentence: The color, red, is in the style statement; style: red;. and wanted to remove all or change all of the commas, colons, and semi-c ... read »
Mar 21, 2010 at 3:13 PM
Ask Ben: Javascript String Replace Method
I am trying to make a java program to count the number of times that these punctuation marks occur in a body of text: , : ; . ! - ' " ? / \ I am using this piece to ferret out the commas: numcommas ... read »
Mar 21, 2010 at 11:13 AM
A New Wrist Pain
@chiropractor suwanee, Spoken like someone trying to sell something. Other than for minor, temporary relief from some back pain, chiropractic treatment is nothing but placebo effect and quackery. ... read »
Mar 21, 2010 at 6:32 AM
ColdFusion CFPOP - My First Look
Apologies... The field name in the db for C. is "BounceCode" It stores the code / message which is returned in the email. Sorry for the confusion. ... read »
Mar 21, 2010 at 6:29 AM
ColdFusion CFPOP - My First Look
@Jose Galdamez, Hi Ben and Jose 1st of all.. big thanks to Jose for his Skype chat a few weeks back. Your time was much appreciated. I have come up with a rather unelegant solution to my problem a ... read »