Ben Nadel at Scotch On The Rock (SOTR) 2010 (London) with: RichardCooper ( @seopo )

Using POSIX Character Classes In Java Regular Expressions With ColdFusion


When I first started to learn regular expressions in ColdFusion, I used things called POSIX Character Classes. These were pre-defined groups of characters that looked like:

  • [:digit:]
  • [:alnum:]
  • [:punct:]

When I started working with Java regular expressions, I could no longer use those character classes. Or rather, I couldn't use them with the same notation - in Java regular expressions, the POSIX character classes are written like this:

  • \p{Digit}
  • \p{Alnum}
  • \p{Punct}
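For readers who want to try the Java notation outside of ColdFusion, here is a minimal plain-Java sketch. The class name, the `mask` helper, and the sample string are all hypothetical and exist only for illustration; the only real API used is `String.replaceAll()`.

```java
// Hypothetical demo class; names and sample input are illustrative only.
class PosixNotationDemo {

    // Replace each run of characters matching the given class with a marker.
    static String mask(String input, String characterClass, String marker) {
        return input.replaceAll(characterClass + "+", marker);
    }

    public static void main(String[] args) {
        String sample = "Call 555-1234, ext. 42!";

        // In Java source, the backslash must be doubled inside a string literal.
        System.out.println(mask(sample, "\\p{Digit}", "#")); // Call #-#, ext. #!
        System.out.println(mask(sample, "\\p{Alnum}", "A")); // A A-A, A. A!
        System.out.println(mask(sample, "\\p{Punct}", "_")); // Call 555_1234_ ext_ 42_
    }
}
```

Note that ColdFusion strings do not need the doubled backslash; that is only a Java string-literal requirement.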

I was just looking up some regular expression stuff when I saw these POSIX character classes again. I've never actually used them in Java, so I figured I would take a moment to try them out:

<!--- Save some sample text. --->
<cfsavecontent variable="strSample">

	"You can't just pick and choose which laws to follow. Sure
	I'd like to tape a baseball game without the express written
	consent of major league baseball, but that's just not the
	way it works." - Hank Hill

</cfsavecontent>


<!--- Replace graphical characters. --->
<cfoutput>

#strSample.ReplaceAll(
	JavaCast( "string", "\p{Graph}+" ),
	JavaCast( "string", "X" )
	)#

<br />
<br />

<!--- Replace the punctuation. --->
#strSample.ReplaceAll(
	JavaCast( "string", "\p{Punct}+" ),
	JavaCast( "string", "_" )
	)#

<br />
<br />

<!--- Replace all punctuation except apostrophes. --->
#strSample.ReplaceAll(
	JavaCast( "string", "[\p{Punct}&&[^']]+" ),
	JavaCast( "string", "_" )
	)#

</cfoutput>

When we run this code, we get the following output:

X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X

_You can_t just pick and choose which laws to follow_ Sure I_d like to tape a baseball game without the express written consent of major league baseball_ but that_s just not the way it works_ _ Hank Hill

_You can't just pick and choose which laws to follow_ Sure I'd like to tape a baseball game without the express written consent of major league baseball_ but that's just not the way it works_ _ Hank Hill
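The third result relies on Java's character class intersection (`&&`): the pattern matches anything that is in `\p{Punct}` and also in `[^']`, which is to say all punctuation except the apostrophe. As a rough sketch of the same idea in plain Java (the class and method names here are made up for the example):

```java
// Hypothetical demo class; the names are assumptions made for illustration.
class PunctIntersectionDemo {

    // Replace runs of punctuation with an underscore, leaving apostrophes alone.
    // [\p{Punct}&&[^']] is the intersection of "punctuation" and "not an apostrophe".
    static String replacePunctExceptApostrophe(String input) {
        return input.replaceAll("[\\p{Punct}&&[^']]+", "_");
    }

    public static void main(String[] args) {
        String sample = "\"Sure I'd like that.\" - Hank Hill";

        // Prints: _Sure I'd like that_ _ Hank Hill
        System.out.println(replacePunctExceptApostrophe(sample));
    }
}
```

The period and closing quote sit next to each other, so the `+` collapses them into a single underscore, just as in the output above.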

The casing of the POSIX class name is important: \p{Digit} works fine, but \p{DIGIT} will throw an error. All of the POSIX classes can be replaced with shorter, more standard character classes (\d in place of \p{Digit}, for example), so I don't really see much of a need for these; but, the one I see as having some value is the Punctuation character class - there are just too many of those darn characters to type out!
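To see the casing rule in action, here is a small plain-Java sketch. It assumes the default regex flags (no `UNICODE_CHARACTER_CLASS`), and the class name and `compiles` helper are hypothetical. It also checks that `\d` and `\p{Digit}` give the same result on a simple ASCII string.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical demo class; assumes default Pattern flags.
class PosixCasingDemo {

    // Returns true if the pattern compiles, false if Java rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("\\p{Digit}")); // true
        System.out.println(compiles("\\p{DIGIT}")); // false (unknown property name)

        // With default flags, \d and \p{Digit} both match the ASCII digits 0-9.
        String sample = "a1b22c";
        System.out.println(
            sample.replaceAll("\\d", "#").equals(sample.replaceAll("\\p{Digit}", "#"))
        ); // true
    }
}
```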


Reader Comments

3 Comments

Shoot, since nobody else has commented...

How do you suggest I remove '&nbsp;' from a string without removing actual spaces, ' '?

This does not seem to work...
#replace(qSelect.location1, "&nbsp;", "", "ALL")#

Thank you!!!

Ben Nadel