I have a challenge for you. I would like to find every link in a block of html and add a name/value pair to the end of the query string. For example: It might find a link <a href="www.google.com">google</a> I want it to change the link to <a href="www.google.com?SID=123">google</a> We need to keep in mind that some links might already have a '?' and we need to use '&' while others will not have the '?'.
Steve of Flagrant Badassery accepted my challenge to modify the regular expression from Version One to handle all the use cases described (and discovered) in the problem domain. As I expected, he did a fantastic job. After picking apart his regular expression I can understand how it works:
At first, I couldn't understand how it handled hash signs in the URL, as it never seemed to put them back in. But then I realized that he handled them by not handling them at all. He quite rightly stopped matching the URL at the point it contained a hash sign. The reason for this is that we simply don't need to know what the anchor value is, so why bother even gathering it (especially since it might not even be present)? Quite clever!
He also doesn't match any of the query string that is already in the URL. He simply checks (via a negative look ahead) that the URL key "source" is not already present, and then adds our name-value pair to the beginning of the query string. And, just as with the hash sign above, since we are not altering the existing query string, there is no need to match it. Again, very clever! I never think to NOT match things :)
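As a rough illustration of those two ideas (stop matching before any "?" or "#", and use a negative look ahead to skip links that already carry the key), here is a minimal JavaScript sketch rather than the ColdFusion listing. The pattern is adapted from the one Steve posts in the comments below, minus his later "(?!#)" refinement; the function name addSourceToLinks and the sample HTML are made up for this example.

```javascript
// Hypothetical helper: prepend "source=<value>" to the query string of every
// <a href="..."> in a block of HTML, unless the link already has a "source" key.
function addSourceToLinks(html, value) {
  // Group 1 captures everything up to (but not including) the first "?" or "#".
  // The optional "?" is consumed but not kept; the look ahead rejects any link
  // whose remaining URL (up to "#" or the closing quote) already has "source=".
  var pattern = /(<a\s[^>]*?href\s*=\s*"[^?#"]*)\??(?![^#"]*?\bsource=)/gi;
  var pair = "source=" + encodeURIComponent(value);
  return html.replace(pattern, function (match, prefix) {
    // Always append "&": a trailing or doubled "&" is untidy but harmless.
    return prefix + "?" + pair + "&";
  });
}

var sample =
  '<a href="www.google.com">google</a> ' +
  '<a href="page.cfm?id=1#top">page</a> ' +
  '<a href="page.cfm?source=x">tagged</a> ' +
  '<link href="style.css">';

console.log(addSourceToLinks(sample, "bennadel.com"));
// The first link becomes <a href="www.google.com?source=bennadel.com&">google</a>
// The second becomes <a href="page.cfm?source=bennadel.com&id=1#top">page</a>
// The already-tagged link and the <link> tag are left untouched.
```

Because the existing query string and the anchor are never matched, they simply stay where they are after the inserted pair.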
Here is Steve's bad-ass regular expression solution applied to the example:
Running that indeed gives us the desired output. In the following, you will notice that it does add an additional "&" to the URL. This might not be considered the cleanest, but it will in no way cause any harm, and I am absolutely content with this solution:
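One quick way to see why the stray "&" is harmless, at least for parsers that follow the WHATWG URL rules, is to parse such a URL and look at the resulting pairs: empty segments after or between "&" characters are skipped. This sketch uses the standard URL and URLSearchParams classes found in modern browsers and Node.js; the helper name queryPairs and the URLs are made up for illustration, and a given server-side framework may parse query strings slightly differently.

```javascript
// Hypothetical helper: return the query string of a URL as [key, value] pairs.
function queryPairs(url) {
  return Array.from(new URL(url).searchParams.entries());
}

// A trailing "&" does not produce an extra, empty parameter.
console.log(queryPairs("http://www.google.com/?source=bennadel.com&"));

// An "&" sitting right before the anchor leaves both the pair and the hash intact.
var withHash = new URL("http://example.com/page.cfm?source=bennadel.com&#top");
console.log(withHash.searchParams.get("source"), withHash.hash);
```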
Nicely done. Also, as one final note, his solution is WAY smaller than mine and will not match the LINK tag (which mine shamefully would). This just goes to demonstrate how amazingly powerful regular expressions are once you fully understand how they work and can see where they can be applied. Looking at the regular expression above, I see that where I really fell short was not in understanding the regular expression - I see how it works. Where I fell short was that I simply didn't see how simple it could be if I didn't bother to match the extraneous parts of the URL. I hope that that sort of skill just comes with time and experience.
Regex skillz0rz is one of the lesser-known side-effects of following the great hawk (along with acute eyesight and fire breath). Unfortunately, none of these work very well with the ladies.
Posted by Steve on Apr 17, 2007 at 10:14 AM
Ha ha ha ha :)
Posted by Ben Nadel on Apr 17, 2007 at 10:41 AM
I feel this is highly appropriate:
:)
Posted by Jim Curran on Apr 17, 2007 at 1:18 PM
Ha ha, a classic. Steve can definitely save the day.
Posted by Ben Nadel on Apr 17, 2007 at 1:25 PM
Quick. Somebody send Steve the t-shirt: http://xkcd.com/store/
Posted by Rob Wilkerson on Apr 17, 2007 at 3:31 PM
Awesome. I didn't know that site had a store. Steve, email me your address, you're getting a t-Shirt ;)
Posted by Ben Nadel on Apr 17, 2007 at 3:35 PM
Nah. Steve, just post your address here. You'll get lots of cool stuff. Really... :-)
Posted by Rob Wilkerson on Apr 17, 2007 at 3:37 PM
Couldn't this also be done client-side? jquery-yo?
Posted by Glen Lipka on Apr 17, 2007 at 5:49 PM
@Jim Curran:
Heheh! That strip is a classic. :)
@Ben Nadel:
Thanks dude, but don't worry about getting me any free shiznit. (I just might have to buy that No Velociraptors shirt for myself though.)
@Rob Wilkerson:
123 Sesame Street NW
Posted by Steve on Apr 17, 2007 at 8:30 PM
One more case where it would make sense to avoid modifying anything is URLs which simply point to a page anchor (e.g., href="#top"). That's easy to do by adding "(?!#)" immediately after the opening quote character for the href attribute in the regex. So, with all the ColdFusion-style escapings, etc., the search pattern would become "(?i)(< a\s[^>]*?href\s*=\s*""(?!##)[^?##""]*)\??(?![^##""]*?\bsource=)" (remove the space between "<" and "a", which was added to avoid anti-spam measures).
Posted by Steve on Apr 17, 2007 at 8:41 PM
@Glen Lipka:
It wouldn't make a lot of sense to do it the same way client-side, unless you were updating hrefs within a block of source code that you subsequently insert into the page using document.write or innerHTML. To pull this off on the client side, I'd imagine you'd do something like the following (which doesn't use any particular JavaScript library):
-----------------------------
(function(){
    var source = encodeURIComponent("bennadel.com");
    var links = document.getElementsByTagName("a");
    for (var i = 0; i < links.length; i++) {
        // URLs which simply point to an anchor within the page shouldn't be modified.
        // However, element.href returns an absolute URL regardless of the actual source code,
        // so we check if the href contains "#", and that the href and the current page are
        // the same after removing any anchors from each. If both conditions are true, the
        // browser won't request anything from the server when following the link, so we don't
        // want to mess with that by adding a new query key.
        if (!(
            links[i].href.indexOf("#") > -1 &&
            links[i].href.replace(/#.*/, "").toLowerCase() == location.href.replace(/#.*/, "").toLowerCase()
        )) {
            links[i].href = links[i].href.replace(/^([^?#]*)(\??)(?![^#]*?\bsource=)/, function($0, $1, $2) {
                // Only include "&" at the end of the replacement string if the URL contained a query
                return $1 + "?source=" + source + ($2 == "" ? "" : "&");
            });
        }
    }
})();
-----------------------------
That does not modify any URLs which point to anchors on the current page, but it handles it differently than the ColdFusion/regex-only version because in JavaScript anchorElement.href always returns an absolute URL. Additionally, the above code avoids adding any unnecessary ampersands within URL queries by using a function to determine the replacement string.
Feel free to tighten that up using jQuery, although I'm not sure how useful this code really is.
Posted by Steve on Apr 17, 2007 at 8:53 PM
Crap, the code didn't come out very well. Note that in addition to the indentation problem, the two instances of "<em style="color:green">" were not meant to show (their closing tags were stripped out, however).
Posted by Steve on Apr 17, 2007 at 8:59 PM