Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.

Ask Ben: Adding A Query String Pair Value To Existing HTML Using ColdFusion (Flagrant Badassery Version)

By Ben Nadel

I have a challenge for you. I would like to find every link in a block of HTML and add a name/value pair to the end of the query string. For example, it might find the link <a href="www.google.com">google</a>, and I want it to change that link to <a href="www.google.com?SID=123">google</a>. We need to keep in mind that some links will already have a '?', in which case we need to use '&', while others will not have a '?' at all.

Steve of Flagrant Badassery accepted my challenge to modify the regular expression from Version One to handle all the use cases described (and discovered) in the problem domain. As I expected, he did a fantastic job. After picking apart his regular expression I can understand how it works:

  • (<a\s[^>]*?href\s*=\s*""[^?#""]*)\??(?![^#""]*?\bsource=)

At first, I couldn't understand how it handled Hash signs in the URL as it never seemed to replace them back in. But then, I realized that he handled them by not handling them at all. He quite rightly stopped matching the URL at the point it contained a hash sign. The reason for this is that we simply don't need to know what the anchor value is, so why bother even gathering it (especially since it might not even be present). Quite clever!

He also doesn't match any of the query string that is already in the URL. He simply checks to make sure that the URL value "source" is not already present (via a negative look ahead). He then adds our name-value pair to the beginning of the query string. And, just as he did with the Hash sign above, since we are not altering the existing query string, no need to match it. Again, very clever! I never think to NOT match things :)
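To make those behaviors concrete, here is a minimal sketch of the same pattern in JavaScript rather than ColdFusion, so it can be run on its own. It assumes the ColdFusion string escapes are un-doubled ("" becomes " and ## becomes #) and that the "g" and "i" flags stand in for ReplaceAll() and the inline (?i). The helper name addSource and the example.com URLs are made up for illustration.

```javascript
// Steve's pattern, un-escaped from its ColdFusion string form.
const pattern = /(<a\s[^>]*?href\s*=\s*"[^?#"]*)\??(?![^#"]*?\bsource=)/gi;

// Hypothetical helper; the replacement string mirrors the one used in the post.
function addSource(html) {
  return html.replace(pattern, "$1?source=bennadel.com&");
}

const samples = [
  // No query string: the pair is appended, with a trailing "&".
  '<a href="http://example.com/search/">plain</a>',
  // Query string and hash: the pair goes in front of the existing query,
  // and the hash is left alone because it is never matched.
  '<a href="http://example.com/search/?q=1#links">query and hash</a>',
  // "source" already present: the negative look ahead blocks the match.
  '<a href="http://example.com/?source=other">already tagged</a>',
  // LINK tag: "<a\s" requires whitespace right after "<a", so no match.
  '<link href="http://example.com/site.css">',
];

for (const sample of samples) {
  console.log(addSource(sample));
}
```

The first two samples come back with the pair added; the last two come back unchanged.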

Here is Steve's bad-ass regular expression solution applied to the example:

  • <html>
  • <head>
  • <title>Alter URL Demo</title>
  • </head>
  • <body>
  •  
  • <p>
  • Hey man, if you are looking for some good images, you should probably try out the search page on <a href="http://www.searchgalleries.com?source=bennadel.com" target="_blank">Search Galleries</a>. It's pretty darn comprehensive and seems to keep track of all the free galleries that you will ever need. If you want to mess with the URL, its easy; just add a "q" query string value to the search url. The general site search URL is <a href="http://www.searchgalleries.com/search/" target="_blank">http://www.searchgalleries.com/search/</a>. So, then, to add a query value to it, such as "mature", you would simply add the query string "q=mature" to the url: <a href="http://www.searchgalleries.com/search/?q=mature#links" target="_blank">http://www.searchgalleries.com/search/?q=mature</a>. You can even search for more than one value at a given time. So, for instance, if you want to search for mature brunette women, you would put go to the URL:
  • <a href="http://www.searchgalleries.com/search/?q=mature+brunette#links" target="_blank">http://www.searchgalleries.com/search/?q=mature+brunette</a>. Notice that "mature" and "brunette" are separated by a "+" sign. This is the URL encoded form of a space.
  • </p>
  •  
  • </body>
  • </html>
  •  
  • <!--- Get the page context. --->
  • <cfset objPageContext = GetPageContext() />
  •  
  • <!--- Get the page buffer. --->
  • <cfset objBuffer = objPageContext.GetOut().GetBuffer() />
  •  
  • <!---
  • Get the content buffer string. This will give us everything
  • that has NOT yet been flushed to the browser. This is just
  • how I am doing it for this demo and is NOT the only way to
  • perform this task. Since this page is small, (and is being
  • tested), we can safely assume that the content has not yet
  • been flushed to the client.
  • --->
  • <cfset strContent = objBuffer.ToString() />
  •  
  • <!---
  • Steve of Flagrant Badassery has taken my challenge to modify
  • the regular expression in order to handle this replace in
  • one swoop rather than using the Java Pattern / Matcher. Here
  • is my attempt to break down his regular expression:
  •  
  • (?i)
  • -- Case insensitive (I added this to use the Java regex
  • -- replace rather than REReplaceNoCase()).
  •  
  • -- First Group:
  • (
  • <a\s[^>]*?
  • -- Only match the Anchor tag followed by a space
  • -- followed by a lazy match of non-">" characters.
  • -- The lazy nature of this regular expression will
  • -- try to match the next token (href) when possible.
  •  
  • href\s*=\s*""[^?##""]*
  • -- The href attribute with possible spaces around
  • -- the equals sign (nice call! I always forget
  • -- that). Then, quotes followed by any characters
  • -- not including ?, #, or ".
  • )
  •  
  • \??
  • -- Matches the literal "?" zero or one times (an
  • -- optional character in the URL).
  •  
  • -- Negative look ahead:
  • (?!
  • [^##""]*?\b
  • -- A lazy match for any character not including
  • -- # and " followed by a word boundary.
  •  
  • source=
  • -- The URL param that we DON'T want to add if it
  • -- already exists (hence the negative look ahead
  • -- that we are currently in).
  • )
  • --->
  • <cfset strContent = strContent.ReplaceAll(
  • "(?i)(<a\s[^>]*?href\s*=\s*""[^?##""]*)\??(?![^##""]*?\bsource=)",
  • "$1?source=bennadel.com&"
  • ) />
  •  
  • <!--- Clear the existing content buffer. --->
  • <cfset objPageContext.GetOut().ClearBuffer() />
  •  
  • <!--- Output the updated HTML. --->
  • <cfset WriteOutput( strContent ) />

Running that indeed gives us the desired output. In the following, you will notice that it does add an additional "&" to the URL when there is no existing query string. This might not be considered the cleanest, but it will in no way cause any harm and I am absolutely content with this solution:

  • <html>
  • <head>
  • <title>Alter URL Demo</title>
  • </head>
  • <body>
  •  
  • <p>
  • Hey man, if you are looking for some good images, you should probably try out the search page on <a href="http://www.searchgalleries.com?source=bennadel.com" target="_blank">Search Galleries</a>. It's pretty darn comprehensive and seems to keep track of all the free galleries that you will ever need. If you want to mess with the URL, its easy; just add a "q" query string value to the search url. The general site search URL is <a href="http://www.searchgalleries.com/search/?source=bennadel.com&" target="_blank">http://www.searchgalleries.com/search/</a>. So, then, to add a query value to it, such as "mature", you would simply add the query string "q=mature" to the url: <a href="http://www.searchgalleries.com/search/?source=bennadel.com&q=mature#links" target="_blank">http://www.searchgalleries.com/search/?q=mature</a>. You can even search for more than one value at a given time. So, for instance, if you want to search for mature brunette women, you would put go to the URL:
  • <a href="http://www.searchgalleries.com/search/?source=bennadel.com&q=mature+brunette#links" target="_blank">http://www.searchgalleries.com/search/?q=mature+brunette</a>. Notice that "mature" and "brunette" are separated by a "+" sign. This is the URL encoded form of a space.
  • </p>
  •  
  • </body>
  • </html>

Nicely done. Also, as one final note, his solution is WAY smaller than mine and will not match the LINK tag (which mine shamefully would). This just goes to demonstrate how amazingly powerful regular expressions are once you fully understand how they can be applied and you can see where they can be applied. Looking at the regular expression above, I see that where I really fell short was not in understanding the regular expression - I see how it works. Where I fell short was that I simply didn't see how simple it could be if I didn't bother to match the extraneous parts of the URL. I hope that that sort of skill just comes with time and experience.
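On the extra "&": if it ever does bother you, one possible tweak is to capture the optional "?" and let a replacement function decide whether the ampersand is needed. The following is only a sketch in JavaScript, not part of Steve's solution; it assumes the pattern is un-escaped from its ColdFusion string form, and the name addSourceTidy and the example.com URLs are hypothetical.

```javascript
// Same pattern as in the post, except the optional "?" is captured as group 2.
const tidyPattern = /(<a\s[^>]*?href\s*=\s*"[^?#"]*)(\??)(?![^#"]*?\bsource=)/gi;

// Hypothetical helper: only emit "&" when the URL already had a "?".
function addSourceTidy(html) {
  return html.replace(tidyPattern, (match, head, questionMark) =>
    head + "?source=bennadel.com" + (questionMark ? "&" : "")
  );
}

console.log(addSourceTidy('<a href="http://example.com/search/">no query</a>'));
// <a href="http://example.com/search/?source=bennadel.com">no query</a>

console.log(addSourceTidy('<a href="http://example.com/search/?q=mature#links">query</a>'));
// <a href="http://example.com/search/?source=bennadel.com&q=mature#links">query</a>
```

A URL that ends in a bare "?" with nothing after it would still pick up a trailing "&" here; the sketch does not try to handle that case.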




Reader Comments

Regex skillz0rz is one of the lesser-known side-effects of following the great hawk (along with acute eyesight and fire breath). Unfortunately, none of these work very well with the ladies.

Reply to this Comment

Awesome. I didn't know that site had a store. Steve, email me your address, you're getting a t-Shirt ;)

Reply to this Comment

@Jim Curran:

Heheh! That strip is a classic. :)

@Ben Nadel:

Thanks dude, but don't worry about getting me any free shiznit. (I just might have to buy that No Velociraptors shirt for myself though.)

@Rob Wilkerson:

123 Sesame Street NW

Reply to this Comment

One more case where it would make sense to avoid modifying anything is URLs which simply point to a page anchor (e.g., href="#top"). That's easy to do by adding "(?!#)" immediately after the opening quote character for the href attribute in the regex. So, with all the ColdFusion-style escaping, etc., the search pattern would become "(?i)(<a\s[^>]*?href\s*=\s*""(?!##)[^?##""]*)\??(?![^##""]*?\bsource=)".
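As a rough illustration of the difference, here is a small JavaScript sketch comparing the pattern from the post with the "(?!#)" variant. Both are un-escaped from their ColdFusion string form, and the variable names and sample links are invented for the example.

```javascript
// Pattern from the post, and the variant with "(?!#)" after the opening quote.
const original = /(<a\s[^>]*?href\s*=\s*"[^?#"]*)\??(?![^#"]*?\bsource=)/gi;
const skipAnchors = /(<a\s[^>]*?href\s*=\s*"(?!#)[^?#"]*)\??(?![^#"]*?\bsource=)/gi;

const replacement = "$1?source=bennadel.com&";
const anchorOnly = '<a href="#top">Back to top</a>';
const pageWithHash = '<a href="page.html#top">Page</a>';

// Without the extra look ahead, the in-page anchor link gets a query string.
console.log(anchorOnly.replace(original, replacement));
// <a href="?source=bennadel.com&#top">Back to top</a>

// With it, the in-page anchor link is left untouched.
console.log(anchorOnly.replace(skipAnchors, replacement));
// <a href="#top">Back to top</a>

// Links to another page that also carry a hash are still rewritten.
console.log(pageWithHash.replace(skipAnchors, replacement));
// <a href="page.html?source=bennadel.com&#top">Page</a>
```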

Reply to this Comment

@Glen Lipka:

It wouldn't make a lot of sense to do it the same way client-side, unless you were updating hrefs within a block of source code that you subsequently insert into the page using document.write or innerHTML. To pull this off on the client side, I'd imagine you'd do something like the following (which doesn't use any particular JavaScript library):

-----------------------------
(function(){
    var source = encodeURIComponent("bennadel.com");
    var links = document.getElementsByTagName("a");
    for (var i = 0; i < links.length; i++) {
        // URLs which simply point to an anchor within the page shouldn't be modified.
        // However, element.href returns an absolute URL regardless of the actual source code,
        // so we check if the href contains "#", and that the href and the current page are
        // the same after removing any anchors from each. If both conditions are true, the
        // browser won't request anything from the server when following the link, so we don't
        // want to mess with that by adding a new query key.
        if (!(
            links[i].href.indexOf("#") > -1 &&
            links[i].href.replace(/#.*/, "").toLowerCase() == location.href.replace(/#.*/, "").toLowerCase()
        )) {
            links[i].href = links[i].href.replace(/^([^?#]*)(\??)(?![^#]*?\bsource=)/, function($0, $1, $2) {
                // Only include "&" at the end of the replacement string if the URL contained a query
                return $1 + "?source=" + source + ($2 == "" ? "" : "&");
            });
        }
    }
})();
-----------------------------

That does not modify any URLs which point to anchors on the current page, but it handles it differently than the ColdFusion/regex-only version because in JavaScript anchorElement.href always returns an absolute URL. Additionally, the above code avoids adding any unnecessary ampersands within URL queries by using a function to determine the replacement string.

Feel free to tighten that up using jQuery, although I'm not sure how useful this code really is.

Reply to this Comment
