Ben Nadel at Scotch On The Rock (SOTR) 2010 (London) with: Rob Dudley

Using POSIX Character Classes In Java Regular Expressions With ColdFusion


When I first started to learn regular expressions in ColdFusion, I used things called POSIX Character Classes. These were pre-defined groups of characters that looked like:

  • [:digit:]
  • [:alnum:]
  • [:punct:]

When I started working with Java regular expressions, I could no longer use those character classes. Or rather, I couldn't use them with the same notation - Java writes POSIX character classes differently:

  • \p{Digit}
  • \p{Alnum}
  • \p{Punct}
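To make the notation difference concrete, here is a minimal sketch in plain Java. The class name `PosixNotationDemo` and the helper `matchesAll` are hypothetical names made up for this illustration. It also shows what happens to the bracketed notation in Java: `[:digit:]` is read as an ordinary character class containing the literal characters `:`, `d`, `i`, `g` and `t`.

```java
import java.util.regex.Pattern;

final class PosixNotationDemo {

    // Hypothetical helper: true when the whole input is made up of
    // one or more characters from the given character class.
    static boolean matchesAll(String charClass, String input) {
        return Pattern.matches(charClass + "+", input);
    }

    public static void main(String[] args) {
        // Java notation for the POSIX classes.
        System.out.println(matchesAll("\\p{Digit}", "2010"));     // true
        System.out.println(matchesAll("\\p{Alnum}", "SOTR2010")); // true
        System.out.println(matchesAll("\\p{Punct}", "!?.,"));     // true
        System.out.println(matchesAll("\\p{Digit}", "20a10"));    // false

        // The bracketed notation compiles in Java, but as a plain
        // character class of the literal characters : d i g t.
        System.out.println(matchesAll("[:digit:]", "5"));         // false
        System.out.println(matchesAll("[:digit:]", "dig:"));      // true
    }
}
```

In a Java string literal the backslash has to be doubled (`"\\p{Digit}"`), which is not needed in the ColdFusion strings used in this post.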

I was just looking up some regular expression stuff when I saw these POSIX character classes again. I've never actually used them in Java, so I figured I would take a moment to try them out:

<!--- Save some sample text. --->
<cfsavecontent variable="strSample">

	"You can't just pick and choose which laws to follow. Sure
	I'd like to tape a baseball game without the express written
	consent of major league baseball, but that's just not the
	way it works." - Hank Hill

</cfsavecontent>


<!--- The pound-sign expressions only render inside of CFOutput. --->
<cfoutput>

<!--- Replace graphical characters. --->
#strSample.ReplaceAll(
	JavaCast( "string", "\p{Graph}+" ),
	JavaCast( "string", "X" )
	)#

<br />
<br />

<!--- Replace the punctuation. --->
#strSample.ReplaceAll(
	JavaCast( "string", "\p{Punct}+" ),
	JavaCast( "string", "_" )
	)#

<br />
<br />

<!--- Replace all punctuation except apostrophes. --->
#strSample.ReplaceAll(
	JavaCast( "string", "[\p{Punct}&&[^']]+" ),
	JavaCast( "string", "_" )
	)#

</cfoutput>

When we run this code, we get the following output:

X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X

_You can_t just pick and choose which laws to follow_ Sure I_d like to tape a baseball game without the express written consent of major league baseball_ but that_s just not the way it works_ _ Hank Hill

_You can't just pick and choose which laws to follow_ Sure I'd like to tape a baseball game without the express written consent of major league baseball_ but that's just not the way it works_ _ Hank Hill
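Since the ColdFusion code is calling the underlying Java String method, the same three replacements can be sketched in plain Java. This is only an illustration, not part of the demo above: the class name `PosixReplaceDemo` and its method names are hypothetical, and the sample is shortened to the first sentence of the quote to keep the output easy to check.

```java
final class PosixReplaceDemo {

    // First sentence of the sample text, opening and closing quotes included.
    static final String SAMPLE =
        "\"You can't just pick and choose which laws to follow.\"";

    // Replace each run of visible (graphical) characters with a single X.
    static String replaceGraph(String text) {
        return text.replaceAll("\\p{Graph}+", "X");
    }

    // Replace each run of punctuation with a single underscore.
    static String replacePunct(String text) {
        return text.replaceAll("\\p{Punct}+", "_");
    }

    // Same as above, but intersect with "not an apostrophe" so that
    // apostrophes are left alone.
    static String replacePunctExceptApostrophe(String text) {
        return text.replaceAll("[\\p{Punct}&&[^']]+", "_");
    }

    public static void main(String[] args) {
        System.out.println(replaceGraph(SAMPLE));
        // X X X X X X X X X X
        System.out.println(replacePunct(SAMPLE));
        // _You can_t just pick and choose which laws to follow_
        System.out.println(replacePunctExceptApostrophe(SAMPLE));
        // _You can't just pick and choose which laws to follow_
    }
}
```

The `&&` inside the square brackets is Java's character class intersection: the class matches a character only if it is punctuation and also not an apostrophe.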

The casing of the POSIX character class names is important. \p{Digit} works fine, but \p{DIGIT} will throw an error. All of the POSIX classes can be replaced with shorter, more standard character classes, so I don't really see much of a need for these; but the one I see as having some value is the Punctuation character class - there are just too many of those darn characters to type out!
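To show what "replaced with shorter, more standard character classes" looks like, and why the punctuation class is the exception, here is a rough sketch that compares a few POSIX classes against hand-written equivalents. It assumes Java's default matching mode (no flags), where these classes cover US-ASCII only, and it only checks the ASCII range. The class name `PosixEquivalenceDemo` and the helper `asciiMatches` are hypothetical.

```java
import java.util.regex.Pattern;

final class PosixEquivalenceDemo {

    // Hand-written equivalent of \p{Punct}; note how much escaping it needs.
    static final String PUNCT_BY_HAND =
        "[!\"#$%&'()*+,\\-./:;<=>?@\\[\\\\\\]^_`{|}~]";

    // Hypothetical helper: collects every ASCII character (0-127)
    // that the given single-character pattern matches.
    static String asciiMatches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        StringBuilder matched = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (pattern.matcher(String.valueOf(c)).matches()) {
                matched.append(c);
            }
        }
        return matched.toString();
    }

    // True when both patterns accept exactly the same ASCII characters.
    static boolean sameOverAscii(String left, String right) {
        return asciiMatches(left).equals(asciiMatches(right));
    }

    public static void main(String[] args) {
        System.out.println(sameOverAscii("\\p{Digit}", "[0-9]"));       // true
        System.out.println(sameOverAscii("\\p{Digit}", "\\d"));         // true
        System.out.println(sameOverAscii("\\p{Alnum}", "[a-zA-Z0-9]")); // true
        System.out.println(sameOverAscii("\\p{Punct}", PUNCT_BY_HAND)); // true
        System.out.println(asciiMatches("\\p{Punct}").length());        // 32
    }
}
```

The digit and alphanumeric classes have short equivalents, while the punctuation class expands to 32 characters, several of which need escaping when written by hand.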


Reader Comments

3 Comments

Shoot, since nobody else has commented...

How do you suggest I remove '&nbsp;' from a string without removing actual spaces, ' '?

This does not seem to work...
#replace(qSelect.location1, "&nbsp;", "", "ALL")#

Thank you!!!