Regular Expressions, Extraordinary Power

  1. Ben Nadel

    Ben Nadel in the mirror.

    A little bit about me...

    • Chief Software Engineer, Epicenter
    • Author of The Blog of Ben Nadel
      www.bennadel.com
    • Adobe Community Professional
    • Adobe Certified ColdFusion Developer
    • Co-Manager New York CFUG
    • ColdFusion, XHTML, CSS, jQuery


  2. First Things First...

    Regular Expressions Are

    Awesome!

  3. What Are Regular Expressions?

    • A way of describing patterns in text

  4. What Can We Do With Regular Expressions?

    • Gather text
    • Replace / Transform text
    • Search / Validate text
  5. Where Can We Use Regular Expressions?

    Everywhere!

  6. ColdFusion Gives Us Three Options

    • Native ColdFusion RegEx Engine
      • reFind(), reReplace(), reMatch(), CFParam, CFProperty
    • Java RegEx Engine
      • createObject( "java" )
    • .NET RegEx Engine
      • createObject( "dotnet" )
  7. Not All Engines Are Created Equal

    • Each engine is a "flavor" of Regular Expressions
    • Javascript < ColdFusion < Java (fewest to most features)
  8. Before We Get Technical

    Regular Expressions Are

    NOT

    Meant To Be Read

  9. Basic Regular Expression Components

    • Character Literals
    • Special Characters / Metacharacters
    • Character Classes
    • Short-Hand Classes
    • Non-printable Characters and Anchors
    • Quantifiers
    • Alternation And Grouping
  10. Character Literals

    • Most characters match themselves
    • A matches "A"
    • B matches "B"
    • ColdFusion matches "ColdFusion"
    • Ooops, I did it again! matches "Ooops, I did it again!"
  11. Character Literal Examples

    ben

    matches

    I like watching benevolent Ben benchpress.

  12. Character Literal Examples

    word.

    matches

    I like cool words. Zoftig is a cool word.
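
The two literal examples above can be tried out with a minimal sketch like the one below. It assumes the Java RegEx engine (java.util.regex); the class name and the findAll() helper are made up for illustration.

```java
import java.util.*;
import java.util.regex.*;

class LiteralDemo {
    // Collects every non-overlapping match of regex in input, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Matching is case-sensitive by default, so "Ben" is skipped.
        System.out.println(findAll("ben", "I like watching benevolent Ben benchpress."));
        // prints [ben, ben]

        // The unescaped dot matches any character: the "s" in "words" and the final ".".
        System.out.println(findAll("word.", "I like cool words. Zoftig is a cool word."));
        // prints [words, word.]
    }
}
```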

  13. Special Characters / Metacharacters

    • Most characters are literals
    • About 13 are special
    • [   \   ^   $   .   |   ?   *   +   {   }   (   )
    • These can be escaped with \
  14. Special Character Examples

    \$9\.95

    matches

    That burrito costs $9.95. Delicious!

  15. Special Character Examples

    c:\\app\\log\.txt

    matches

    The log file is located at c:\app\log.txt.

  16. Important Note On Escaping "\"

    • Some languages see "\" as a special string character
    • ColdFusion does NOT
      • "\$12\.95"
      • "C:\\ColdFusion9\\"
    • Javascript DOES
      • "\\$12\\.95"
      • "C:\\\\ColdFusion9\\\\"
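
Java string literals behave like the Javascript case above: "\" is a string escape, so every backslash in a pattern has to be doubled in source code. A small sketch, with a hypothetical class name and findAll() helper:

```java
import java.util.*;
import java.util.regex.*;

class EscapeDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // The pattern \$9\.95 written as a Java string: each "\" is doubled.
        System.out.println(findAll("\\$9\\.95", "That burrito costs $9.95. Delicious!"));
        // prints [$9.95]

        // The pattern c:\\app\\log\.txt: one literal backslash needs four "\" in source.
        System.out.println(findAll("c:\\\\app\\\\log\\.txt",
                "The log file is located at c:\\app\\log.txt."));
        // prints [c:\app\log.txt]

        // Pattern.quote() escapes a whole literal, so no manual escaping is needed.
        System.out.println(findAll(Pattern.quote("$9.95"), "That burrito costs $9.95."));
        // prints [$9.95]
    }
}
```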
  17. Character Classes / Sets

    • [ ... ] defines a set of characters
      • [aeiou] - Matches any vowel
    • [^ ... ] defines a negated set of characters
      • [^aeiou] - Matches anything but a vowel
    • Can define character ranges using dash
      • [a-zA-Z] - Matches any letter
      • [^0-9] - Matches anything but a digit
  18. Character Class Examples

    [0-9\-]

    matches

    Give me a call - my number is 917-555-1234.

  19. Character Class Examples

    [^0-9.]

    matches

    $1,234,567.89

  20. Character Class Joins

    • Set Union
      • [a-zA-Z]
    • Set Intersection
      • [a-z&&[d-f]] - Matches d through f
    • Set Subtraction
      • [a-z&&[^d-f]] - Matches a through c, g through z
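
The && join syntax is part of the Java flavor; other engines may not accept it. A quick sketch of the three joins, where the class name and the mask() helper are illustrative only:

```java
class ClassJoinDemo {
    // Replaces every character matched by charClass with "_".
    static String mask(String charClass, String input) {
        return input.replaceAll(charClass, "_");
    }

    public static void main(String[] args) {
        // Union: any letter, upper or lower case.
        System.out.println(mask("[a-zA-Z]", "Ab1"));                // prints __1

        // Intersection: only d through f.
        System.out.println(mask("[a-z&&[d-f]]", "abcdefgh"));       // prints abc___gh

        // Subtraction: a through z, minus d through f.
        System.out.println(mask("[a-z&&[^d-f]]", "abcdefgh"));      // prints ___def__
    }
}
```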
  21. Short-Hand Classes

    • Match one of several characters
      •  . - Any character except new-line*
      • \w - Word character, [A-Za-z0-9_]
      • \d - Digit character, [0-9]
      • \s - Space character, [ \t\r\n]
    • Do NOT match one of several characters
      • \W - Anything but a word character, [^A-Za-z0-9_]
      • \D - Anything but a digit character, [^0-9]
      • \S - Anything but a space character, [^ \t\r\n]

    * Unless in single-line mode.

  22. Short-Hand Class Examples

    \d\d-\d\d-\d\d\d\d

    matches

    I was born on 09-21-1980 - go Virgos!

  23. Short-Hand Class Examples

    [^\d.]

    matches

    $1,234,567.89

  24. Short-Hand Class Examples

    [\w\W]

    matches

    Johnny 5 is Alive!
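
A rough sketch of the three short-hand examples above, using the Java engine. The class name and the findAll() helper are assumptions for illustration.

```java
import java.util.*;
import java.util.regex.*;

class ShorthandDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // \d\d-\d\d-\d\d\d\d picks out the date.
        System.out.println(findAll("\\d\\d-\\d\\d-\\d\\d\\d\\d",
                "I was born on 09-21-1980 - go Virgos!"));
        // prints [09-21-1980]

        // [^\d.] matches the "$" and the commas; removing them leaves a plain number.
        System.out.println("$1,234,567.89".replaceAll("[^\\d.]", ""));
        // prints 1234567.89

        // [\w\W] matches every single character, one match per character.
        String text = "Johnny 5 is Alive!";
        System.out.println(findAll("[\\w\\W]", text).size() == text.length());
        // prints true
    }
}
```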

  25. POSIX Character Classes

    • Native ColdFusion support
      • [:alpha:]
      • [:upper:]
      • [:lower:]
      • [:digit:]
      • [:alnum:]
      • [:xdigit:]
      • [:blank:]
      • [:space:]
      • [:print:]
      • [:punct:]
      • [:graph:]
      • [:cntrl:]
      • [:word:]
      • [:ascii:]
    • Java support
      • \p{Lower}
      • \p{Upper}
      • \p{ASCII}
      • \p{Alpha}
      • \p{Digit}
      • \p{Alnum}
      • \p{Punct} - !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
      • \p{Graph}
      • \p{Print}
      • \p{Blank}
      • \p{Cntrl}
      • \p{XDigit}
      • \p{Space}
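
A short sketch of the Java-style \p{...} classes in use. The sample sentence and the class name are made up for illustration.

```java
class PosixDemo {
    // Hypothetical sample text used for all three replacements.
    static final String TEXT = "Hello, World! 123";

    public static void main(String[] args) {
        // \p{Punct} strips the comma and the exclamation mark.
        System.out.println(TEXT.replaceAll("\\p{Punct}", ""));    // prints Hello World 123

        // \p{Upper} matches only the upper-case letters.
        System.out.println(TEXT.replaceAll("\\p{Upper}", "*"));   // prints *ello, *orld! 123

        // Classes can be quantified like any other token.
        System.out.println(TEXT.replaceAll("\\p{Digit}+", "#"));  // prints Hello, World! #
    }
}
```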
  26. Non-printable Characters And Anchors

    • ^ - Matches beginning of string (or line*)
      • \A - Always matches beginning of string.
    • $ - Matches end of string (or line*)
      • \Z - Always matches end of string (in Java, also just before a final line-break; \z is the absolute end).
    • \b - Matches a word-boundary

    * When in multi-line mode.

  27. Anchor Examples

    ^[a-z]

    matches

    are you feeling alright, Lucy?

  28. Anchor Examples

    cfc$

    matches

    /demo/contacts/cfcs/api.cfc

  29. Anchor Examples

    \bCan

    matches

    Can you dance the CanCan?
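
The three anchor examples can be checked with a sketch like this one (Java engine assumed; class and helper names are illustrative). Note how each anchor cuts down the number of hits without consuming any characters itself.

```java
import java.util.*;
import java.util.regex.*;

class AnchorDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // ^ pins the match to the start, so only the first letter qualifies.
        System.out.println(findAll("^[a-z]", "are you feeling alright, Lucy?"));
        // prints [a]

        // $ pins the match to the end, so the "cfc" inside "cfcs" is ignored.
        System.out.println(findAll("cfc$", "/demo/contacts/cfcs/api.cfc"));
        // prints [cfc]

        // \b requires a word boundary: the second "Can" in "CanCan" is mid-word.
        System.out.println(findAll("\\bCan", "Can you dance the CanCan?"));
        // prints [Can, Can]
    }
}
```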

  30. Quantifiers - How Much To Match?

    • A* - Zero or more.
    • A+ - One or more.
    • A? - Zero or one (i.e. optional).
    • A{N} - N matches.
    • A{N,} - N or more matches.
      • A+ == A{1,}
    • A{N,M} - Between N and M matches.
      • A? == A{0,1}
  31. Quantifier Examples

    <\w+>

    matches

    Say it with <em>style</em>!

  32. Quantifier Examples

    [Dd]ogs?

    matches

    Is that a dog? Dogs are cool!

  33. Quantifier Examples

    \d{2,4}

    matches

    If I were born on 03/01/2009, I'd be 2.
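
Here is a small sketch of the three quantifier examples, run against the Java engine. The class name and the findAll() helper are assumptions, not part of the original deck.

```java
import java.util.*;
import java.util.regex.*;

class QuantifierDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // \w+ needs one or more word characters; "/" is not one, so </em> is skipped.
        System.out.println(findAll("<\\w+>", "Say it with <em>style</em>!"));
        // prints [<em>]

        // s? makes the trailing "s" optional.
        System.out.println(findAll("[Dd]ogs?", "Is that a dog? Dogs are cool!"));
        // prints [dog, Dogs]

        // {2,4} needs two to four digits, so the lone "2" does not match.
        System.out.println(findAll("\\d{2,4}", "If I were born on 03/01/2009, I'd be 2."));
        // prints [03, 01, 2009]
    }
}
```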

  34. Quantifiers Are Greedy!

    • They try to match as much as possible.
    • Sometimes, being lazy (reluctant) is better - add ? after the quantifier:
      • A*?
      • A+?
      • A??
      • A{N,M}?
      • ... etc.
  35. Lazy Quantifier Examples

    <.+>

    matches

    Hey there <em>baby cakes</em>!

  36. Lazy Quantifier Examples

    <.+?>

    matches

    Hey there <em>baby cakes</em>!
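
The difference between the greedy and lazy versions is easiest to see side by side. A minimal sketch on the Java engine, with an illustrative class name and helper:

```java
import java.util.*;
import java.util.regex.*;

class LazyDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Hey there <em>baby cakes</em>!";

        // Greedy: .+ runs to the last ">" it can reach, swallowing both tags.
        System.out.println(findAll("<.+>", text));
        // prints [<em>baby cakes</em>]

        // Lazy: .+? stops at the first ">" it can reach, giving one match per tag.
        System.out.println(findAll("<.+?>", text));
        // prints [<em>, </em>]
    }
}
```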

  37. Alternation And Grouping

    • | - Alternation, i.e. this "or" that
    • ( ... ) - Grouping
    • Quantifiers can be applied to groups!!
  38. Alternation And Grouping Examples

    color|colour

    matches

    That color really brings out your eyes.

  39. Alternation And Grouping Examples

    (like|love) you

    matches

    Joanna, it hurts how much I love you.

  40. Alternation And Grouping Examples

    (na){2}

    matches

    Anna is bonkers for bananas!
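
A sketch of the alternation and grouping examples on the Java engine; the class name and the findAll() helper are made up for illustration.

```java
import java.util.*;
import java.util.regex.*;

class AlternationDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Without a group, | splits the whole pattern in two.
        System.out.println(findAll("color|colour", "That color really brings out your eyes."));
        // prints [color]

        // The group limits the alternation to "like" or "love".
        System.out.println(findAll("(like|love) you", "Joanna, it hurts how much I love you."));
        // prints [love you]

        // A quantifier on a group repeats the whole group: "na" twice in a row.
        System.out.println(findAll("(na){2}", "Anna is bonkers for bananas!"));
        // prints [nana]
    }
}
```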

  41. Grouping And Back References

    • Capture groups create back references
    • Can be used in patterns - \N *
    • Can be used in replace - $N *

    * Depending on the regular expression engine!

  42. Back Reference Examples

    b(an)\1as

    matches

    I like bananas!

  43. Back Reference Examples

    n(an)a|b(an)\2as

    matches

    My nana loves bananas!

  44. Back Reference Examples

    b(an)\1+as

    matches

    That's just banananananananananas!
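
Back references work both inside the pattern (\N) and in the replacement text ($N in Java). The sketch below assumes the Java engine; the class name, the helper, the shortened "bananananas" and the "Nadel, Ben" swap are illustrative additions.

```java
import java.util.*;
import java.util.regex.*;

class BackRefDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // \1 must repeat exactly what group 1 captured ("an").
        System.out.println(findAll("b(an)\\1as", "I like bananas!"));
        // prints [bananas]

        // Groups are numbered by opening parenthesis, so the second (an) is \2.
        System.out.println(findAll("n(an)a|b(an)\\2as", "My nana loves bananas!"));
        // prints [nana, bananas]

        // A back reference can be quantified like any other token.
        System.out.println(findAll("b(an)\\1+as", "That's just bananananas!"));
        // prints [bananananas]

        // In a replacement string, $N refers to captured group N.
        System.out.println("Nadel, Ben".replaceAll("(\\w+), (\\w+)", "$2 $1"));
        // prints Ben Nadel
    }
}
```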

  45. Rock On With Your Bad Self!

    You Just Learned

    80%

    Of RegEx Functionality

  46. Time For A Little Practice

    • Imagine that we need to validate an employee ID
    • Ex: HR-20080118-M-1234
    • HR - Department
      • HR - Human Resources
      • SM - Sales & Marketing
      • D - Development
    • 20080118 - Date employee joined company
      • 20080118 == 2008/01/18
    • M - Gender (M or F)
    • 1234 - Auto-incrementing value
  47. Validate: HR-20080118-M-1234

    HR-20080118-M-1234

  48. Validate: HR-20080118-M-1234

    ^HR-20080118-M-1234$

    Make sure we're validating entire input string

  49. Validate: HR-20080118-M-1234

    ^(HR|SM|D)-20080118-M-1234$

    Allow for each known department abbreviation

  50. Validate: HR-20080118-M-1234

    ^(HR|SM|D)-\d{8}-M-1234$

    Allow for 8 digits for the date YYYYMMDD

  51. Validate: HR-20080118-M-1234

    ^(HR|SM|D)-\d{8}-[MF]-1234$

    Allow for M (Male) or F (Female)

  52. Validate: HR-20080118-M-1234

    ^(HR|SM|D)-\d{8}-[MF]-\d+$

    Allow for an auto-incremented value

  53. Validate: HR-20080118-M-1234

    ^(HR|SM|D)-\d{8}-[MF]-\d+$

    That wasn't so bad :)
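
The finished employee ID pattern could be wrapped in a helper along these lines. This is only a sketch on the Java engine: the class name, the isValid() method and the extra sample IDs are assumptions. Note that \d{8} only checks the shape of the date, not whether it is a real calendar date.

```java
import java.util.regex.Pattern;

class EmployeeIdDemo {
    // Department, 8-digit date (YYYYMMDD), gender, auto-incremented value.
    static final Pattern EMPLOYEE_ID =
            Pattern.compile("^(HR|SM|D)-\\d{8}-[MF]-\\d+$");

    // matches() requires the whole input to fit the pattern.
    static boolean isValid(String id) {
        return EMPLOYEE_ID.matcher(id).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("HR-20080118-M-1234"));  // prints true
        System.out.println(isValid("D-20091231-F-7"));      // prints true
        System.out.println(isValid("IT-20080118-M-1234"));  // prints false (unknown department)
        System.out.println(isValid("HR-2008118-M-1234"));   // prints false (only 7 date digits)
        System.out.println(isValid("HR-20080118-X-1234"));  // prints false (gender not M or F)
    }
}
```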

  54. Time For A Little More Practice

    • Imagine that we need to validate a 10-digit phone number
    • Ex: (212) 555-1234
    • But, we want to be pretty flexible
      • (212) 555.1234
      • (212) 555 1234
      • 212-555-1234
      • 212.555.1234
      • 212 555 1234
      • 2125551234
  55. Validate: (212) 555-1234

    (212) 555-1234

  56. Validate: (212) 555-1234

    ^(212) 555-1234$

    Make sure we're validating entire input string

  57. Validate: (212) 555-1234

    ^([1-9]12) 555-1234$

    Make sure the number can't start with zero (operator)

  58. Validate: (212) 555-1234

    ^([1-9]\d{2}) \d{3}-\d{4}$

    Allow for the remaining 9 digits

  59. Validate: (212) 555-1234

    ^([1-9]\d{2})[ .\-]?\d{3}[ .\-]?\d{4}$

    Allow for optional separators

  60. Validate: (212) 555-1234

    ^\(?[1-9]\d{2}\)?[ .\-]?\d{3}[ .\-]?\d{4}$

    Allow for optional parentheses

  61. Validate: (212) 555-1234

    ^\(?[1-9]\d{2}\)?[ .\-]?\d{3}[ .\-]?\d{4}$

    Like I said - these are NOT meant to be read!
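
To see that the final pattern accepts every format from the wish list, here is a rough sketch on the Java engine. The class name, the isValid() helper and the two rejected samples are hypothetical.

```java
import java.util.regex.Pattern;

class PhoneDemo {
    // The final pattern from the slides, with "\" doubled for a Java string.
    static final Pattern PHONE =
            Pattern.compile("^\\(?[1-9]\\d{2}\\)?[ .\\-]?\\d{3}[ .\\-]?\\d{4}$");

    static boolean isValid(String number) {
        return PHONE.matcher(number).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "(212) 555-1234", "(212) 555.1234", "(212) 555 1234",
            "212-555-1234", "212.555.1234", "212 555 1234", "2125551234"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));  // each line ends in true
        }
        System.out.println(isValid("(012) 555-1234"));  // prints false (starts with zero)
        System.out.println(isValid("212-555-123"));     // prints false (too few digits)
    }
}
```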

  62. Verbose Mode - Making Complex Patterns More Awesome

    • Patterns are not fun to read
    • Verbose mode allows white-space and documentation
    • (?x) - Verbose flag

  63. Phone Number Validation In Verbose Mode

    (?x) ## Start pattern with verbose flag.

    ## Match start of string.
    ^

    ## First set of digits.
    \(?
    [1-9]\d{2}
    \)?

    ## Optional separator.
    [ .\-]?

    ## Second set of digits.
    \d{3}

    ## Optional separator.
    [ .\-]?

    ## Third set of digits.
    \d{4}

    ## Match end of string.
    $
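
The same verbose pattern can be sketched for the Java engine as below. One assumption to flag: as far as I know, Java's verbose mode also ignores white-space inside a character class, so the space in the separator set is written as \x20 here. The class name and the isValid() helper are illustrative.

```java
import java.util.regex.Pattern;

class VerbosePhoneDemo {
    // Each line carries one piece of the pattern plus a comment.
    static final Pattern PHONE = Pattern.compile(String.join("\n",
        "(?x)           ## Verbose flag: white-space and comments are ignored.",
        "^              ## Match start of string.",
        "\\(?           ## First set of digits, optionally wrapped.",
        "[1-9]\\d{2}",
        "\\)?",
        "[\\x20.\\-]?   ## Optional separator, space written as a hex escape.",
        "\\d{3}         ## Second set of digits.",
        "[\\x20.\\-]?   ## Optional separator.",
        "\\d{4}         ## Third set of digits.",
        "$              ## Match end of string."
    ));

    static boolean isValid(String number) {
        return PHONE.matcher(number).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("(212) 555-1234"));  // prints true
        System.out.println(isValid("212 555 1234"));    // prints true
        System.out.println(isValid("2125551234"));      // prints true
        System.out.println(isValid("012-555-1234"));    // prints false
    }
}
```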

  64. Other Flags To Know About

    • (?xims)
    • (?i) - Ignore case
      • reFindNoCase( "abc" ) == reFind( "(?i)abc" )
    • (?m) - Multi-line
      • ^ and $ work per-line
    • (?s) - Single-line
      • Dot (.) matches new-line
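
A quick sketch of what each flag changes, using the Java engine. The sample strings, the class name and the findAll() helper are made up for illustration.

```java
import java.util.*;
import java.util.regex.*;

class FlagDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // (?i): case is ignored.
        System.out.println(findAll("(?i)abc", "xABCx"));              // prints [ABC]

        // Without (?m), ^ only matches at the very start of the string.
        System.out.println(findAll("^\\w+", "one\ntwo\nthree"));      // prints [one]

        // With (?m), ^ also matches at the start of each line.
        System.out.println(findAll("(?m)^\\w+", "one\ntwo\nthree"));  // prints [one, two, three]

        // Without (?s), the dot refuses to match the new-line.
        System.out.println("a\nb".matches("a.b"));                    // prints false

        // With (?s), the dot matches the new-line too.
        System.out.println("a\nb".matches("(?s)a.b"));                // prints true
    }
}
```
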
  65. Matching vs. Capturing

    Look, But Don't Touch

  66. Look Ahead

    • (?= ... ) - Positive look ahead
    • (?! ... ) - Negative look ahead
    • Zero-length matches
  67. Look Ahead Examples

    Cold(?=Fusion)

    matches

    I love ColdFusion so much!

  68. Look Ahead Examples

    <a(?=[^>]+?href).+?>

    matches

    <a href="#">click here</a> now!
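
Because a look ahead is zero-length, the text it checks is not part of the match. A minimal sketch on the Java engine; the class name, the helper and the "Coldplay" sample are illustrative.

```java
import java.util.*;
import java.util.regex.*;

class LookAheadDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // "Fusion" must follow, but only "Cold" is returned.
        System.out.println(findAll("Cold(?=Fusion)", "I love ColdFusion so much!"));
        // prints [Cold]

        // The look ahead checks for href inside the tag before the tag is consumed.
        System.out.println(findAll("<a(?=[^>]+?href).+?>", "<a href=\"#\">click here</a> now!"));
        // prints [<a href="#">]

        // Negative look ahead: "Cold" only when "Fusion" does NOT follow.
        System.out.println(findAll("Cold(?!Fusion)\\w*", "Coldplay and ColdFusion"));
        // prints [Coldplay]
    }
}
```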

  69. Look Behind

    • (?<= ... ) - Positive look behind
    • (?<! ... ) - Negative look behind
    • Zero-length matches
  70. Look Behind Examples

    (?<=Cold)Fusion

    matches

    I love ColdFusion so much!
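
Look behind works the same way in the other direction. The sketch below assumes the Java engine; the class name, the helper and the replacement strings are hypothetical.

```java
import java.util.*;
import java.util.regex.*;

class LookBehindDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "I love ColdFusion so much!";

        // "Cold" must come first, but only "Fusion" is returned.
        System.out.println(findAll("(?<=Cold)Fusion", text));
        // prints [Fusion]

        // Since "Cold" is not part of the match, a replace leaves it in place.
        System.out.println(text.replaceAll("(?<=Cold)Fusion", "Play"));
        // prints I love ColdPlay so much!

        // Negative look behind: "Fusion" only when NOT preceded by "Cold".
        System.out.println("Fusion food, ColdFusion code".replaceAll("(?<!Cold)Fusion", "Fancy"));
        // prints Fancy food, ColdFusion code
    }
}
```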

  71. Misc. Cool Tips

    • \xNN, \uNNNN - Hexadecimal characters
  72. Thank You For Listening

    • Ben Nadel
    • Blog: http://www.bennadel.com
    • Twitter: @bennadel
    • Email: ben@bennadel.com
    • Ask Ben: http://www.bennadel.com/ask-ben
    • Consulting: http://www.epicenterconsulting.com