Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.

Using POSIX Character Classes In Java Regular Expressions With ColdFusion

By Ben Nadel on
Tags: ColdFusion

When I first started to learn regular expressions in ColdFusion, I used things called POSIX Character Classes. These were pre-defined groups of characters that looked like:

  • [:digit:]
  • [:alnum:]
  • [:punct:]

When I started working with Java regular expressions, I could no longer use those character classes. Or rather, I couldn't use them with the same notation. In Java regular expressions, POSIX character classes are written like this:

  • \p{Digit}
  • \p{Alnum}
  • \p{Punct}
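To make the notation concrete, here is a minimal Java sketch of the three classes above. The class name PosixNotationDemo and the helper matchesAll() are hypothetical names used only for illustration. Note that in a Java source string the backslash has to be doubled, which is not the case when the pattern is written in a ColdFusion string.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class PosixNotationDemo {

    // Returns true when the entire input consists of one or more
    // characters from the given character class.
    static boolean matchesAll(String charClass, String input) {
        return Pattern.matches(charClass + "+", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("\\p{Digit}", "2024"));   // true
        System.out.println(matchesAll("\\p{Alnum}", "abc123")); // true
        System.out.println(matchesAll("\\p{Punct}", "!?.,"));   // true
        System.out.println(matchesAll("\\p{Digit}", "12a"));    // false
    }
}
```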

I was just looking up some regular expression stuff when I saw these POSIX character classes again. I've never actually used them in Java, so I figured I would take a moment to try them out:

<!--- Save some sample text. --->
<cfsavecontent variable="strSample">

	"You can't just pick and choose which laws to follow. Sure
	I'd like to tape a baseball game without the express written
	consent of major league baseball, but that's just not the
	way it works." - Hank Hill

</cfsavecontent>


<cfoutput>

<!--- Replace graphical characters. --->
#strSample.ReplaceAll(
	JavaCast( "string", "\p{Graph}+" ),
	JavaCast( "string", "X" )
	)#

<br />
<br />

<!--- Replace the punctuation. --->
#strSample.ReplaceAll(
	JavaCast( "string", "\p{Punct}+" ),
	JavaCast( "string", "_" )
	)#

<br />
<br />

<!--- Replace all punctuation except apostrophes. --->
#strSample.ReplaceAll(
	JavaCast( "string", "[\p{Punct}&&[^']]+" ),
	JavaCast( "string", "_" )
	)#

</cfoutput>

When we run this code, we get the following output:

X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X

_You can_t just pick and choose which laws to follow_ Sure I_d like to tape a baseball game without the express written consent of major league baseball_ but that_s just not the way it works_ _ Hank Hill

_You can't just pick and choose which laws to follow_ Sure I'd like to tape a baseball game without the express written consent of major league baseball_ but that's just not the way it works_ _ Hank Hill
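For comparison, the same three replacements can be sketched directly in Java, since the ColdFusion code is calling java.lang.String.replaceAll() under the hood. This is only a sketch on a shortened sample string. The class name PosixReplaceDemo and its method names are hypothetical and are not part of the code above.

```java
// Hypothetical demo class; the method names are illustrative only.
class PosixReplaceDemo {

    // Replace each run of visible (graphical) characters with a single X.
    static String maskGraph(String text) {
        return text.replaceAll("\\p{Graph}+", "X");
    }

    // Replace each run of punctuation with a single underscore.
    static String maskPunct(String text) {
        return text.replaceAll("\\p{Punct}+", "_");
    }

    // Same as maskPunct(), but the intersection (&&) with [^']
    // removes the apostrophe from the set being matched.
    static String maskPunctExceptApostrophe(String text) {
        return text.replaceAll("[\\p{Punct}&&[^']]+", "_");
    }

    public static void main(String[] args) {
        String sample = "\"I'd like to tape a game.\" - Hank Hill";

        System.out.println(maskGraph(sample));
        // X X X X X X X X X

        System.out.println(maskPunct(sample));
        // _I_d like to tape a game_ _ Hank Hill

        System.out.println(maskPunctExceptApostrophe(sample));
        // _I'd like to tape a game_ _ Hank Hill
    }
}
```

The whitespace survives the first replacement because \p{Graph} only covers the alphanumeric and punctuation characters.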

The casing of the POSIX character class names is important. \p{Digit} works fine, but \p{DIGIT} will throw an error. All of the POSIX classes can be replaced with shorter, more standard character classes (\d in place of \p{Digit}, for example), so I don't really see much of a need for these; but, the one I see as having some value is the Punctuation character class, since there are just too many of those darn characters to type out!
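Both points can be checked with a small Java sketch. It assumes the default pattern flags (no UNICODE_CHARACTER_CLASS), and the class name PosixCaseDemo and the helper compiles() are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; the names here are illustrative only.
class PosixCaseDemo {

    // Returns true if the regular expression compiles, false if the
    // regex engine rejects it with a PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("\\p{Digit}")); // true
        System.out.println(compiles("\\p{DIGIT}")); // false

        // With default flags, \p{Digit} and the shorter \d both
        // match the ASCII digits 0-9.
        System.out.println("a1b22".replaceAll("\\p{Digit}", "#")); // a#b##
        System.out.println("a1b22".replaceAll("\\d", "#"));        // a#b##
    }
}
```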



Reader Comments

Shoot, since nobody else has commented...

How do you suggest I remove '&nbsp;' from a string without removing actual spaces, ' '?

This does not seem to work...
#replace(qSelect.location1, "&nbsp;", "", "ALL")#

Thank you!!!