Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
Ben Nadel at cf.Objective() 2014 (Bloomington, MN) with: Jeff McDowell and Joel Hill and Jonathan Rowny and Shawn Grigson and Jonathan Dowdle and Matt Vickers and Christian Ready and Asher Snyder and Clark Valberg and Oscar Arevalo and David Bainbridge and Josh Siok
Ben Nadel at cf.Objective() 2014 (Bloomington, MN) with: Jeff McDowell@jeff_s_mcdowell ) , Joel Hill@Jiggidyuo ) , Jonathan Rowny@jrowny ) , Shawn Grigson@shawngrig ) , Jonathan Dowdle@jdowdle ) , Matt Vickers@envex ) , Christian Ready@christianready ) , Asher Snyder@ashyboy ) , Clark Valberg@clarkvalberg ) , Oscar Arevalo@oarevalo ) , David Bainbridge ( @redwhitepine ) , and Josh Siok@siok )

Using POSIX Character Classes In Java Regular Expressions With ColdFusion

By Ben Nadel on
Tags: ColdFusion

When I first started to learn regular expressions in ColdFusion, I used things called POSIX Character Classes. These were pre-defined groups of characters that looked like:

  • [:digit:]
  • [:alnum:]
  • [:punct:]

When I started working in Java Regular Expressions, I could no longer use those characters classes. Or rather, I couldn't use them with the same notation - POSIX character classes in Java regular expressions have a different notation:

  • \p{Digit}
  • \p{Alnum}
  • \p{Punct}

I was just looking up some regular expression stuff when I saw these POSIX character classes again. I've never actually used them in Java, so I figured I would take a moment to try them out:

  • <!--- Save some sample text. --->
  • <cfsavecontent variable="strSample">
  •  
  • "You can't just pick and choose which laws to follow. Sure
  • I'd like to tape a baseball game without the express written
  • consent of major league baseball, but that's just not the
  • way it works." - Hank Hill
  •  
  • </cfsavecontent>
  •  
  •  
  • <!--- Replace graphical characters. --->
  • #strSample.ReplaceAll(
  • JavaCast( "string", "\p{Graph}+" ),
  • JavaCast( "string", "X" )
  • )#
  •  
  • <br />
  • <br />
  •  
  • <!--- Replace the punctuation. --->
  • #strSample.ReplaceAll(
  • JavaCast( "string", "\p{Punct}+" ),
  • JavaCast( "string", "_" )
  • )#
  •  
  • <br />
  • <br />
  •  
  • <!--- Replace all punctuation except apostrophes. --->
  • #strSample.ReplaceAll(
  • JavaCast( "string", "[\p{Punct}&&[^']]+" ),
  • JavaCast( "string", "_" )
  • )#

When we run this code, we get the following output:

X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X

_You can_t just pick and choose which laws to follow_ Sure I_d like to tape a baseball game without the express written consent of major league baseball_ but that_s just not the way it works_ _ Hank Hill

_You can't just pick and choose which laws to follow_ Sure I'd like to tape a baseball game without the express written consent of major league baseball_ but that's just not the way it works_ _ Hank Hill

The casing of the character class in POSIX is important. \p{Digit} works fine, but \p{DIGIT} will throw an error. All of the POSIX classes can be replaced with shorter, more standard character classes so I don't really see much of a need for these; but, the one I see as having some value is the Puntuation character class - there's just too many of those darn characters to type out!


Looking For A New Job?

Ooops, there are no jobs. Post one now for only $29 and own this real estate!

100% of job board revenue is donated to Kiva. Loans that change livesFind out more »

Reader Comments

Shoot, since nobody else has commented...

How do you suggest I remove '&nbsp;' from a string without removing actual spaces, ' '?

This does not seem to work...
#replace(qSelect.location1, "&nbsp;", "", "ALL")#

Thank you!!!