Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
Ben Nadel at cf.Objective() 2010 (Minneapolis, MN) with: Simon Free and Dan Wilson and Jason Dean

Which ASCII Characters Does urlEncodedFormat() Escape In ColdFusion

By Ben Nadel on
Tags: ColdFusion

urlEncodedFormat() is one of those functions that I've been using forever; but, when I stop and think about it, I'm not 100% sure what it actually does. I mean, I know that it prepares a value to be used in a URL; but I don't think I've ever actually read the documentation on it. And, I've definitely never experimented with it. As such, I thought I would do a little "note to self" blog post and see what actually happens when I apply urlEncodedFormat() to individual characters.

This experiment is simple - loop over each character, apply urlEncodedFormat(), and see if the resultant value is different. If so, it means that urlEncodedFormat() encoded the value.

```
<cfscript>

	// NOTE: Only looping from 32 to 126 because urlEncodedFormat() appears to
	// encode all of the control characters as well as everything from 127 up.
	for ( i = 32 ; i <= 126 ; i++ ) {

		charValue = chr( i );
		escapedValue = urlEncodedFormat( charValue, "utf-8" );

		// compare() returns 0 when the two strings match; any non-zero result
		// means that urlEncodedFormat() escaped the character.
		if ( compare( charValue, escapedValue ) ) {

			writeOutput( "#i# ... #charValue# ... #escapedValue#<br />" );

		}

	}

</cfscript>
```

I'm only looping from 32 to 126 because urlEncodedFormat() seems to encode all of the ASCII control characters (0-31 and 127) as well as every character above 127. So, for the sake of the demo, I've limited it to the printable ASCII range, which is where things get interesting.

When we run the above code, we get the following output:

32 ... ... %20
33 ... ! ... %21
34 ... " ... %22
35 ... # ... %23
36 ... $ ... %24
37 ... % ... %25
38 ... & ... %26
39 ... ' ... %27
40 ... ( ... %28
41 ... ) ... %29
42 ... * ... %2A
43 ... + ... %2B
44 ... , ... %2C
45 ... - ... %2D
46 ... . ... %2E
47 ... / ... %2F
58 ... : ... %3A
59 ... ; ... %3B
60 ... < ... %3C
61 ... = ... %3D
62 ... > ... %3E
63 ... ? ... %3F
64 ... @ ... %40
91 ... [ ... %5B
92 ... \ ... %5C
93 ... ] ... %5D
94 ... ^ ... %5E
95 ... _ ... %5F
96 ... ` ... %60
123 ... { ... %7B
124 ... | ... %7C
125 ... } ... %7D
126 ... ~ ... %7E

As you can see, urlEncodedFormat() escaped every non-alphanumeric character in the printable range, which is, ironically, exactly what the documentation says:

Generates a URL-encoded string. For example, it replaces spaces with %20, and non-alphanumeric characters with equivalent hexadecimal escape sequences. Passes arbitrary strings within a URL (ColdFusion automatically decodes URL parameters that are passed to a page).
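To double-check my reading of that description, here's a small Python sketch that models the documented rule. To be clear, this is not ColdFusion and not Adobe's implementation. It assumes that ASCII letters and digits pass through untouched and that every other character becomes an uppercase %XX escape for each byte of its UTF-8 encoding. The function name url_encoded_format_model is hypothetical, and the multi-byte handling of characters above 127 is my assumption about what the "utf-8" charset argument does. Looping it over 32-126 prints the same 33 rows as the output above.

```python
import string

# ASCII letters and digits only. str.isalnum() would also accept
# non-ASCII letters such as "é", which would break the model.
UNESCAPED = set(string.ascii_letters + string.digits)


def url_encoded_format_model(value: str, charset: str = "utf-8") -> str:
    """Hypothetical model of the documented rule (not the real CFML function):
    keep ASCII alphanumerics, and replace every other character with one
    uppercase %XX escape per byte of its encoding in `charset`."""
    parts = []
    for ch in value:
        if ch in UNESCAPED:
            parts.append(ch)
        else:
            parts.append("".join(f"%{b:02X}" for b in ch.encode(charset)))
    return "".join(parts)


# Re-run the experiment from the top of the post against the model.
for i in range(32, 127):
    ch = chr(i)
    escaped = url_encoded_format_model(ch)
    if escaped != ch:
        print(f"{i} ... {ch} ... {escaped}")
```

I didn't reach for Python's urllib.parse.quote() here because it always leaves "-", ".", "_" and "~" unescaped, which would not match the table above.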

Ok - this all makes sense now. My mental model has been updated.




Reader Comments

@Sean,

Oooh, most excellent suggestion. I actually haven't played around with any of the new encoding methods. I think those are all based on OWASP standards; but, not sure. I'll take a look, thanks!

Out of curiosity, I used the above code and swapped out urlEncodedFormat( charValue, "utf-8" ) for encodeForUrl( charValue ), and the results were the same except for chr( 32 ).
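The one difference, character 32, is the space. My assumption (not confirmed in this thread) is that encodeForUrl() follows the HTML form-encoding convention and writes a space as "+" rather than "%20". Python's standard library happens to expose both conventions, so this sketch shows why the distinction matters: a decoder that doesn't treat "+" as a space hands back a literal plus sign.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

value = "a b"

percent_style = quote(value, safe="")    # "a%20b": space as %20, like the table above
form_style = quote_plus(value, safe="")  # "a+b": space as "+", form-encoding style

# Each style round-trips when decoded with its matching function...
assert unquote(percent_style) == value
assert unquote_plus(form_style) == value

# ...but the plain percent-decoder leaves "+" untouched.
print(unquote(form_style))       # a+b
print(unquote_plus(form_style))  # a b
```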

There's an extensive character comparison of old vs. new HTML, XML, URL and JS encoders here:

http://damonmiller.github.io/esapi4cf/tutorials/Encoding.html