Ask Ben: Adding A Query String Pair Value To Existing HTML Using ColdFusion (Flagrant Badassery Version)

Posted April 17, 2007 at 8:02 AM

Tags: ColdFusion, Ask Ben

I have a challenge for you. I would like to find every link in a block of HTML and add a name/value pair to the end of the query string. For example, it might find a link <a href="www.google.com">google</a>, and I want it to change the link to <a href="www.google.com?SID=123">google</a>. We need to keep in mind that some links might already have a '?' (in which case we need to use '&'), while others will not have the '?'.

Steve of Flagrant Badassery accepted my challenge to modify the regular expression from Version One to handle all the use cases described (and discovered) in the problem domain. As I expected, he did a fantastic job. After picking apart his regular expression, I can see how it works:


```
(<a\s[^>]*?href\s*=\s*"[^?#"]*)\??(?![^#"]*?\bsource=)
```

At first, I couldn't understand how it handled hash signs in the URL, since it never seemed to put them back in. But then I realized that he handled them by not handling them at all. He quite rightly stopped matching the URL at the first hash sign. We simply don't need to know what the anchor value is, so why bother even gathering it (especially since it might not even be present)? Anything after the match, including the anchor, is left in the string untouched. Quite clever!

He also doesn't match any of the query string that is already in the URL. He simply checks, via a negative look ahead, that the URL parameter "source" is not already present. He then adds our name-value pair to the beginning of the query string, replacing the optional "?" that he consumed. And, just as with the hash sign above, since we are not altering the existing query string, there is no need to match it. Again, very clever! I never think to NOT match things :)
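The ColdFusion ReplaceAll() call in the demo is invoked on a CF string, which is a java.lang.String, so the pattern can be exercised directly in Java. The following is a minimal sketch that assumes the CFML escaping has been undone (`""` becomes `"`, `##` becomes `#`). The class name `AddSourceParam`, the `addSource` method, and the sample links are hypothetical and exist only for illustration.

```java
// Sketch only: class and method names are hypothetical.
public class AddSourceParam {

    // Steve's pattern with ColdFusion escaping removed ("" -> " and ## -> #).
    // Group 1: "<a", whitespace, anything up to href, then the URL up to
    // (but not including) the first "?", "#", or closing quote.
    // "\\??" consumes an existing "?" if there is one.
    // The negative lookahead skips links whose query already has "source=".
    static final String PATTERN =
        "(?i)(<a\\s[^>]*?href\\s*=\\s*\"[^?#\"]*)\\??(?![^#\"]*?\\bsource=)";

    // Put the new pair at the front of the query string; the "&" joins it
    // to whatever query string followed (or dangles if there was none).
    static final String REPLACEMENT = "$1?source=bennadel.com&";

    static String addSource(String html) {
        return html.replaceAll(PATTERN, REPLACEMENT);
    }

    public static void main(String[] args) {
        String[] samples = {
            "<a href=\"http://example.com/\">no query</a>",
            "<a href=\"http://example.com/?q=x#top\">query and hash</a>",
            "<a href=\"http://example.com/?source=other\">already tagged</a>",
            "<link href=\"style.css\" rel=\"stylesheet\">"
        };
        for (String s : samples) {
            System.out.println(addSource(s));
        }
    }
}
```

Run as written, this should insert "source=bennadel.com&" at the front of the query string for the first two links, with the "#top" anchor staying where it was. The already-tagged link and the `<link>` tag are left unchanged, because `<a\s` requires an "a" followed by whitespace right after the "<".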

Here is Steve's bad-ass regular expression solution applied to the example:


```
<html>
<head>
<title>Alter URL Demo</title>
</head>
<body>

<p>
Hey man, if you are looking for some good images, you should probably try out the search page on <a href="http://www.searchgalleries.com?source=bennadel.com" target="_blank">Search Galleries</a>. It's pretty darn comprehensive and seems to keep track of all the free galleries that you will ever need. If you want to mess with the URL, its easy; just add a "q" query string value to the search url. The general site search URL is <a href="http://www.searchgalleries.com/search/" target="_blank">http://www.searchgalleries.com/search/</a>. So, then, to add a query value to it, such as "mature", you would simply add the query string "q=mature" to the url: <a href="http://www.searchgalleries.com/search/?q=mature#links" target="_blank">http://www.searchgalleries.com/search/?q=mature</a>. You can even search for more than one value at a given time. So, for instance, if you want to search for mature brunette women, you would put go to the URL:
<a href="http://www.searchgalleries.com/search/?q=mature+brunette#links" target="_blank">http://www.searchgalleries.com/search/?q=mature+brunette</a>. Notice that "mature" and "brunette" are separated by a "+" sign. This is the URL encoded form of a space.
</p>

</body>
</html>

<!--- Get the page context. --->
<cfset objPageContext = GetPageContext() />

<!--- Get the page buffer. --->
<cfset objBuffer = objPageContext.GetOut().GetBuffer() />

<!---
	Get the content buffer string. This will give us everything
	that has NOT yet been flushed to the browser. This is just
	how I am doing it for this demo and is NOT the only way to
	perform this task. Since this page is small (and is being
	tested), we can safely assume that the content has not yet
	been flushed to the client.
--->
<cfset strContent = objBuffer.ToString() />

<!---
	Steve of Flagrant Badassery has taken my challenge to modify
	the regular expression in order to handle this replace in
	one swoop rather than using the Java Pattern / Matcher. Here
	is my attempt to break down his regular expression:

	(?i)
		-- Case insensitive (I added this to use the Java regex
		-- replace rather than REReplaceNoCase()).

	-- First Group:
	(
		<a\s[^>]*?
			-- Only match the anchor tag followed by a whitespace
			-- character, followed by a lazy match of non-">"
			-- characters. The lazy nature of this pattern will
			-- try to match the next token (href) when possible.

		href\s*=\s*""[^?##""]*
			-- The href attribute with possible spaces around
			-- the equals sign (nice call! I always forget
			-- that). Then, a quote followed by any characters
			-- other than ?, #, or ".
	)

	\??
		-- Matches the literal "?" zero or one times (an
		-- optional character in the URL).

	-- Negative look ahead:
	(?!
		[^##""]*?\b
			-- A lazy match for any characters other than
			-- # and ", followed by a word boundary.

		source=
			-- The URL param that we DON'T want to add if it
			-- already exists (hence the negative look ahead
			-- that we are currently in).
	)
--->
<cfset strContent = strContent.ReplaceAll(
	"(?i)(<a\s[^>]*?href\s*=\s*""[^?##""]*)\??(?![^##""]*?\bsource=)",
	"$1?source=bennadel.com&"
	) />

<!--- Clear the existing content buffer. --->
<cfset objPageContext.GetOut().ClearBuffer() />

<!--- Output the updated HTML. --->
<cfset WriteOutput( strContent ) />
```

Running that indeed gives us the desired output. In the following output, you will notice that for URLs that had no query string, it leaves an extra "&" at the end of the URL. This might not be considered the cleanest, but it will in no way cause any harm, and I am absolutely content with this solution:


```
<html>
<head>
<title>Alter URL Demo</title>
</head>
<body>

<p>
Hey man, if you are looking for some good images, you should probably try out the search page on <a href="http://www.searchgalleries.com?source=bennadel.com" target="_blank">Search Galleries</a>. It's pretty darn comprehensive and seems to keep track of all the free galleries that you will ever need. If you want to mess with the URL, its easy; just add a "q" query string value to the search url. The general site search URL is <a href="http://www.searchgalleries.com/search/?source=bennadel.com&" target="_blank">http://www.searchgalleries.com/search/</a>. So, then, to add a query value to it, such as "mature", you would simply add the query string "q=mature" to the url: <a href="http://www.searchgalleries.com/search/?source=bennadel.com&q=mature#links" target="_blank">http://www.searchgalleries.com/search/?q=mature</a>. You can even search for more than one value at a given time. So, for instance, if you want to search for mature brunette women, you would put go to the URL:
<a href="http://www.searchgalleries.com/search/?source=bennadel.com&q=mature+brunette#links" target="_blank">http://www.searchgalleries.com/search/?q=mature+brunette</a>. Notice that "mature" and "brunette" are separated by a "+" sign. This is the URL encoded form of a space.
</p>

</body>
</html>
```

Nicely done. Also, as one final note, his solution is WAY smaller than mine and will not match the LINK tag (which mine shamefully would), since it requires an "a" followed by whitespace right after the "<". This just goes to demonstrate how amazingly powerful regular expressions are once you fully understand them and can see where they can be applied. Looking at the regular expression above, I can see that where I really fell short was not in understanding the regular expression; I see how it works. Where I fell short was that I simply didn't see how simple it could be if I didn't bother to match the extraneous parts of the URL. I hope that sort of skill just comes with time and experience.
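If the trailing "&" ever does matter, one option is to capture the optional "?" and build each replacement in code rather than with a fixed replacement string. The JavaScript version further down in the comments uses the same idea. Here is a minimal Java sketch under that assumption. The class name `AddSourceNoTrailingAmp` and the `pair` parameter are hypothetical, and the sketch swaps the single `ReplaceAll()` call for a `java.util.regex.Matcher` find/`appendReplacement` loop, so it is more code than Steve's one-liner.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: class name and the "pair" parameter are hypothetical.
public class AddSourceNoTrailingAmp {

    // Steve's pattern (ColdFusion escaping removed), except the optional "?"
    // is captured in group 2 so we can tell whether the URL had a query string.
    private static final Pattern LINK = Pattern.compile(
        "(<a\\s[^>]*?href\\s*=\\s*\"[^?#\"]*)(\\??)(?![^#\"]*?\\bsource=)",
        Pattern.CASE_INSENSITIVE);

    // pair is an already-encoded name=value string, e.g. "source=bennadel.com".
    static String addSource(String html, String pair) {
        Matcher m = LINK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int end = m.end();
            // A query string follows only if a "?" was consumed and the next
            // character is neither the anchor "#" nor the closing quote.
            boolean hasQuery = !m.group(2).isEmpty()
                && end < html.length()
                && html.charAt(end) != '#'
                && html.charAt(end) != '"';
            String replacement = m.group(1) + "?" + pair + (hasQuery ? "&" : "");
            // quoteReplacement stops "$" or "\" in the URL from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/search/\">a</a> "
            + "<a href=\"http://example.com/search/?q=mature#links\">b</a>";
        System.out.println(addSource(html, "source=bennadel.com"));
    }
}
```

With the sample HTML in `main`, the first link becomes `http://example.com/search/?source=bennadel.com` with no trailing "&". The second becomes `http://example.com/search/?source=bennadel.com&q=mature#links`.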






Reader Comments

Apr 17, 2007 at 10:14 AM
164 Comments

Regex skillz0rz is one of the lesser-known side-effects of following the great hawk (along with acute eyesight and fire breath). Unfortunately, none of these work very well with the ladies.


Apr 17, 2007 at 10:41 AM
6,516 Comments

Ha ha ha ha :)


Apr 17, 2007 at 1:18 PM
3 Comments

I feel this is highly appropriate:

http://xkcd.com/c208.html

:)


Apr 17, 2007 at 1:25 PM
6,516 Comments

Ha ha, a classic. Steve can definitely save the day.


Apr 17, 2007 at 3:31 PM
43 Comments

Quick. Somebody send Steve the t-shirt: http://xkcd.com/store/


Apr 17, 2007 at 3:35 PM
6,516 Comments

Awesome. I didn't know that site had a store. Steve, email me your address, you're getting a t-shirt ;)


Apr 17, 2007 at 3:37 PM
43 Comments

Nah. Steve, just post your address here. You'll get lots of cool stuff. Really... :-)


Apr 17, 2007 at 5:49 PM
40 Comments

Couldn't this also be done client-side? jquery-yo?


Apr 17, 2007 at 8:30 PM
164 Comments

@Jim Curran:

Heheh! That strip is a classic. :)

@Ben Nadel:

Thanks dude, but don't worry about getting me any free shiznit. (I just might have to buy that No Velociraptors shirt for myself though.)

@Rob Wilkerson:

123 Sesame Street NW


Apr 17, 2007 at 8:41 PM
164 Comments

One more case where it would make sense to avoid modifying anything is URLs that simply point to a page anchor (e.g., href="#top"). That's easy to handle by adding "(?!#)" immediately after the opening quote character for the href attribute in the regex. So, with all the ColdFusion-style escaping, etc., the search pattern would become "(?i)(<a\s[^>]*?href\s*=\s*""(?!##)[^?##""]*)\??(?![^##""]*?\bsource=)".
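A minimal Java sketch of that adjusted pattern follows. It assumes the ColdFusion escaping is undone (`""` becomes `"`, `##` becomes `#`), and the class name `SkipPageAnchors` and the sample links are hypothetical.

```java
// Sketch only: class name and sample links are hypothetical.
public class SkipPageAnchors {

    // Steve's pattern with "(?!#)" added right after the opening quote,
    // ColdFusion escaping removed. An href whose value starts with "#"
    // can no longer match, so same-page anchors are left alone.
    static final String PATTERN =
        "(?i)(<a\\s[^>]*?href\\s*=\\s*\"(?!#)[^?#\"]*)\\??(?![^#\"]*?\\bsource=)";

    static String addSource(String html) {
        return html.replaceAll(PATTERN, "$1?source=bennadel.com&");
    }

    public static void main(String[] args) {
        System.out.println(addSource("<a href=\"#top\">Top</a> <a href=\"/page\">Page</a>"));
    }
}
```

This should print the "#top" link unchanged and the "/page" link with "?source=bennadel.com&" appended.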


Apr 17, 2007 at 8:53 PM
164 Comments

@Glen Lipka:

It wouldn't make a lot of sense to do it the same way client-side, unless you were updating hrefs within a block of source code that you subsequently insert into the page using document.write or innerHTML. To pull this off on the client side, I'd imagine you'd do something like the following (which doesn't use any particular JavaScript library):

-----------------------------
(function(){
	var source = encodeURIComponent("bennadel.com");
	var links = document.getElementsByTagName("a");
	for (var i = 0; i < links.length; i++) {
		// URLs which simply point to an anchor within the page shouldn't be modified.
		// However, element.href returns an absolute URL regardless of the actual source code,
		// so we check if the href contains "#", and that the href and the current page are
		// the same after removing any anchors from each. If both conditions are true, the
		// browser won't request anything from the server when following the link, so we don't
		// want to mess with that by adding a new query key.
		if (!(
			links[i].href.indexOf("#") > -1 &&
			links[i].href.replace(/#.*/, "").toLowerCase() == location.href.replace(/#.*/, "").toLowerCase()
		)) {
			links[i].href = links[i].href.replace(/^([^?#]*)(\??)(?![^#]*?\bsource=)/, function($0, $1, $2) {
				// Only include "&" at the end of the replacement string if the URL contained a query
				return $1 + "?source=" + source + ($2 == "" ? "" : "&");
			});
		}
	}
})();
-----------------------------

That does not modify any URLs which point to anchors on the current page, but it handles it differently than the ColdFusion/regex-only version because in JavaScript anchorElement.href always returns an absolute URL. Additionally, the above code avoids adding any unnecessary ampersands within URL queries by using a function to determine the replacement string.

Feel free to tighten that up using jQuery, although I'm not sure how useful this code really is.


Apr 17, 2007 at 8:59 PM
164 Comments

Crap, the code didn't come out very well. Note that in addition to the indentation problem, the two instances of "<em style="color:green">" were not meant to show (their closing tags were stripped out, however).

