Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.

Ask Ben: Adding A Query String Pair Value To Existing HTML Using ColdFusion (Alternate Version 2)

By Ben Nadel on

I have a challenge for you. I would like to find every link in a block of HTML and add a name/value pair to the end of the query string. For example, it might find a link <a href="www.google.com">google</a>, and I want it to change the link to <a href="www.google.com?SID=123">google</a>. We need to keep in mind that some links will already have a '?', in which case we need to use '&', while others will not have a '?'.

In my first attempt to answer this question, I used some fairly simple Regular Expression replaces. It was, however, brought to my attention that certain use cases were not considered. One was that the name-value pair we were adding to the URL might already be part of the URL. Another was that some URLs might have hash signs (page anchors), and the previous method would inappropriately add the name-value pair after the hash sign.

In this attempt, I have switched over from simple regular expression replaces to using a Java Pattern / Matcher. This will allow us to match each link and examine it on an individual basis. This code is certainly more robust, but I think it is straightforward. In my experience, more robust, easy to understand code is going to be more maintainable than a fairly complicated regular expression that accomplishes the same effect. Of course, that's just me, and it is definitely limited by my understanding of regular expressions (they are still hard for me - Steve would probably know how to do this).
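To make the Pattern / Matcher idea concrete, here is a minimal plain-Java sketch of the match-and-examine loop. The class name `LinkLoopSketch`, the method `rewriteLinks`, and the callback parameter are made up for illustration; only the `java.util.regex` calls are the real API.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkLoopSketch {

    // Matches the value of a double-quoted href attribute.
    private static final Pattern HREF =
        Pattern.compile("(?<=href=\")[^\"]+(?=\")");

    // Visits each matched URL and lets the caller decide what to put back.
    static String rewriteLinks(String html, UnaryOperator<String> rewrite) {
        Matcher matcher = HREF.matcher(html);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String updated = rewrite.apply(matcher.group());
            // quoteReplacement() keeps any "$" or "\" in the URL literal.
            matcher.appendReplacement(result, Matcher.quoteReplacement(updated));
        }
        // Copy whatever follows the last match.
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"a.cfm\">A</a> and <a href=\"b.cfm\">B</a>";
        System.out.println(rewriteLinks(html, url -> "/go/" + url));
    }
}
```

The point of the callback is that every URL gets its own decision, which is exactly what a single blanket replace cannot do.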

In the following code, notice that one of the links already has the name-value pair "source=bennadel.com". Also notice that two of the links have a hash sign. We can easily handle this by splitting the URL based on the hash sign and then treating the base URL as if it never had a hash sign at all.

  • <html>
  • <head>
  • <title>Alter URL Demo</title>
  • </head>
  • <body>
  •  
  • <p>
  • Hey man, if you are looking for some good images, you should probably try out the search page on <a href="http://www.searchgalleries.com?source=bennadel.com" target="_blank">Search Galleries</a>. It's pretty darn comprehensive and seems to keep track of all the free galleries that you will ever need. If you want to mess with the URL, its easy; just add a "q" query string value to the search url. The general site search URL is <a href="http://www.searchgalleries.com/search/" target="_blank">http://www.searchgalleries.com/search/</a>. So, then, to add a query value to it, such as "mature", you would simply add the query string "q=mature" to the url: <a href="http://www.searchgalleries.com/search/?q=mature#links" target="_blank">http://www.searchgalleries.com/search/?q=mature</a>. You can even search for more than one value at a given time. So, for instance, if you want to search for mature brunette women, you would put go to the URL:
  • <a href="http://www.searchgalleries.com/search/?q=mature+brunette#links" target="_blank">http://www.searchgalleries.com/search/?q=mature+brunette</a>. Notice that "mature" and "brunette" are separated by a "+" sign. This is the URL encoded form of a space.
  • </p>
  •  
  • </body>
  • </html>
  •  
  • <!--- Get the page context. --->
  • <cfset objPageContext = GetPageContext() />
  •  
  • <!--- Get the page buffer. --->
  • <cfset objBuffer = objPageContext.GetOut().GetBuffer() />
  •  
  • <!---
  • Get the content buffer string. This will give us everything
  • that has NOT yet been flushed to the browser. This is just
  • how I am doing it for this demo and is NOT the only way to
  • perform this task. Since this page is small, (and is being
  • tested), we can safely assume that the content has not yet
  • been flushed to the client.
  • --->
  • <cfset strContent = objBuffer.ToString() />
  •  
  • <!---
  • When examining the links, there are a couple of scenarios
  • that we have to consider. Some URLs might have an existing
  • query string. Others might not. Both might have a HASH value
  • (page anchor) and URLs with an existing query string may
  • already have the name-value pair that we are trying to
  • insert.
  •  
  • My original attempt used some fairly small regular
  • expressions. I fear that to try and handle this entirely in
  • regular expressions would become unreadable. Instead, I am
  • going to go the Pattern / Matcher route. This way, we can
  • examine each URL as it comes in.
  •  
  • Let's create a pattern that matches any URL within an HREF.
  • This pattern will require at least one non-quote character
  • in its URL. It will also require quoted URLs.
  • --->
  • <cfset objPattern = CreateObject(
  • "java",
  • "java.util.regex.Pattern"
  • ).Compile(
  • "(?<=href="")([^""]+)(?="")"
  • ) />
  •  
  • <!---
  • Get a pattern matcher based on the content that we have
  • pulled out of our page buffer.
  • --->
  • <cfset objMatcher = objPattern.Matcher( strContent ) />
  •  
  • <!---
  • Create a string buffer into which we will store our updated
  • HTML content with the rewritten URLs.
  • --->
  • <cfset objBuffer = CreateObject(
  • "java",
  • "java.lang.StringBuffer"
  • ).Init() />
  •  
  •  
  • <!--- Loop over all the matched links. --->
  • <cfloop condition="objMatcher.Find()">
  •  
  • <!--- Get the matched URL. --->
  • <cfset strURL = objMatcher.Group() />
  •  
  • <!---
  • First, we want to make sure that we are not duplicating
  • our efforts. Check to see if the URL already contains
  • our name/value pair.
  • --->
  • <cfif NOT FindNoCase( "source=bennadel.com", strURL )>
  •  
  • <!---
  • Split the URL on the hash sign. Even if there is no
  • hash sign, this should result in an array with at
  • least ONE index (the pre-hash value).
  • --->
  • <cfset arrUrlParts = strUrl.Split( "##" ) />
  •  
  • <!---
  • Save the first part (possibly the only part) back
  • into the URL value. Then we can deal with that on
  • its own and add any hash value back in later.
  • --->
  • <cfset strURL = arrUrlParts[ 1 ] />
  •  
  • <!---
  • Check to see if the URL contains an existing
  • query string.
  • --->
  • <cfif Find( "?", strURL )>
  •  
  • <!---
  • Since there is already a query string, we can
  • append ours to the query string values.
  • --->
  • <cfset strURL = (strURL & "&source=bennadel.com") />
  •  
  • <cfelse>
  •  
  • <!---
  • Since there is no query string yet, we can
  • create one with our name-value pair as its
  • only value.
  • --->
  • <cfset strURL = (strURL & "?source=bennadel.com") />
  •  
  • </cfif>
  •  
  •  
  • <!---
  • Now that we have altered our base URL in the most
  • appropriate way, let's see if we had a hash value
  • to add back in. This will only be the case if we
  • had a second parts index that has a value.
  • --->
  • <cfif (ArrayLen( arrUrlParts ) GT 1)>
  •  
  • <!---
  • Append the hash value to our new URL. When
  • doing this, be sure to add the Hash sign back
  • in. This was stripped out during our Split()
  • method call.
  • --->
  • <cfset strURL = (
  • strURL &
  • "##" &
  • arrUrlParts[ 2 ]
  • ) />
  •  
  • </cfif>
  •  
  • </cfif>
  •  
  •  
  • <!---
  • ASSERT: At this point, we have updated the strURL value
  • or we have left it alone. Either way, we are ready to
  • add it back into the string buffer. When doing this, be
  • sure to escape any group references and character
  • escapes that might exist in the string.
  • --->
  • <cfset objMatcher.AppendReplacement(
  • objBuffer,
  • strURL.ReplaceAll(
  • "([\\\$]{1})",
  • "\\$1"
  • )
  • ) />
  •  
  • </cfloop>
  •  
  •  
  • <!---
  • Now that we have matched all the URLs, add whatever
  • content remains back into the results buffer.
  • --->
  • <cfset objMatcher.AppendTail( objBuffer ) />
  •  
  • <!--- Clear the existing content buffer. --->
  • <cfset objPageContext.GetOut().ClearBuffer() />
  •  
  • <!---
  • Output the updated HTML. When doing this, we have to
  • take our buffer and compile it down to a string.
  • --->
  • <cfset WriteOutput( objBuffer.ToString() ) />
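The escaping step before AppendReplacement() is easy to skim past, so here is a small plain-Java sketch of why it matters. It assumes a URL that happens to contain a dollar sign and a backslash; the class and method names are hypothetical. `manualEscape` mirrors the ReplaceAll() call in the listing, and `Matcher.quoteReplacement()` is the built-in that, as far as I can tell, does the same job.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EscapeSketch {

    // Mirrors the listing: prefix every backslash or dollar sign with a
    // backslash so appendReplacement() treats them as literal characters.
    static String manualEscape(String value) {
        return value.replaceAll("([\\\\$])", "\\\\$1");
    }

    // Replaces every quoted href value with newUrl, escaped by hand.
    static String swapHref(String html, String newUrl) {
        Matcher matcher = Pattern.compile("(?<=href=\")[^\"]+(?=\")").matcher(html);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            matcher.appendReplacement(result, manualEscape(newUrl));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String url = "buy.cfm?price=$5&dir=c:\\temp";
        // Both escapes should produce the same replacement string.
        System.out.println(manualEscape(url).equals(Matcher.quoteReplacement(url)));
        System.out.println(swapHref("<a href=\"x\">x</a>", url));
    }
}
```

Without the escape, a "$5" in the replacement would be read as a reference to capture group 5, and a lone backslash would swallow the character after it.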

Now, as Shuns pointed out on the previous example, this will also alter the URL of the HREF in LINK tags used to link external style sheets. This can be worked around by grabbing the tag as part of the regular expression (or forcing the tag to be an anchor). Unfortunately, Shuns didn't raise this until I was practically done with this second attempt, and I am far too lazy to go modify my code. As a trade-off, however, modifying that URL is really not a big deal. After all, it will still return the appropriate style sheet, as we are just appending a query string value, not changing the base URL in any way.
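For what it's worth, forcing the tag to be an anchor might look something like the following plain-Java sketch. The pattern is my guess at one way to do it, not a tested replacement for the pattern in the listing, and `AnchorOnlySketch` / `anchorUrls` are made-up names. It only collects the URLs it would touch, so the difference from a bare href match is easy to see.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchorOnlySketch {

    // Requires "<a" plus whitespace before the href, so LINK tags are
    // skipped. Group 1 is the quoted URL.
    private static final Pattern ANCHOR_HREF = Pattern.compile(
        "<a\\s[^>]*?href=\"([^\"]+)\"",
        Pattern.CASE_INSENSITIVE
    );

    // Returns the href values found in anchor tags only.
    static List<String> anchorUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = ANCHOR_HREF.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<link rel=\"stylesheet\" href=\"site.css\" />"
            + "<a href=\"page.cfm\">Page</a>";
        System.out.println(anchorUrls(html));
    }
}
```

Because the URL now sits in a capture group rather than between lookarounds, a rewrite based on this pattern would have to put the rest of the matched tag back as part of the replacement.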

This gives us the following output:

  • <html>
  • <head>
  • <title>Alter URL Demo</title>
  • </head>
  • <body>
  •  
  • <p>
  • Hey man, if you are looking for some good images, you should probably try out the search page on <a href="http://www.searchgalleries.com?source=bennadel.com" target="_blank">Search Galleries</a>. It's pretty darn comprehensive and seems to keep track of all the free galleries that you will ever need. If you want to mess with the URL, its easy; just add a "q" query string value to the search url. The general site search URL is <a href="http://www.searchgalleries.com/search/?source=bennadel.com" target="_blank">http://www.searchgalleries.com/search/</a>. So, then, to add a query value to it, such as "mature", you would simply add the query string "q=mature" to the url: <a href="http://www.searchgalleries.com/search/?q=mature&source=bennadel.com#links" target="_blank">http://www.searchgalleries.com/search/?q=mature</a>. You can even search for more than one value at a given time. So, for instance, if you want to search for mature brunette women, you would put go to the URL:
  •  
  • <a href="http://www.searchgalleries.com/search/?q=mature+brunette&source=bennadel.com#links" target="_blank">http://www.searchgalleries.com/search/?q=mature+brunette</a>. Notice that "mature" and "brunette" are separated by a "+" sign. This is the URL encoded form of a space.
  • </p>
  •  
  • </body>
  • </html>

Notice that everything went quite swimmingly. We did not duplicate our name-value pair in the first URL. Nor did we add any name-value pairs in an inappropriate place.
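The per-URL decision can be boiled down to one small function. This is a plain-Java sketch of the same rules rather than a port of the ColdFusion above: it uses indexOf() where the listing uses Split(), and the names `AddPairSketch` and `addPair` are invented for the example.

```java
import java.util.Locale;

final class AddPairSketch {

    // Adds "name=value" to a URL unless it is already there, keeping any
    // hash (page anchor) at the very end.
    static String addPair(String url, String pair) {
        // Case-insensitive check, like FindNoCase() in the listing.
        if (url.toLowerCase(Locale.ROOT).contains(pair.toLowerCase(Locale.ROOT))) {
            return url;
        }
        // Separate the base URL from the hash, if there is one.
        int hashAt = url.indexOf('#');
        String base = (hashAt >= 0) ? url.substring(0, hashAt) : url;
        String hash = (hashAt >= 0) ? url.substring(hashAt) : "";
        // Extend an existing query string or start a new one.
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + pair + hash;
    }

    public static void main(String[] args) {
        String pair = "source=bennadel.com";
        System.out.println(addPair("http://www.searchgalleries.com/search/", pair));
        System.out.println(addPair("http://www.searchgalleries.com/search/?q=mature#links", pair));
    }
}
```

Fed the demo URLs from this post, it leaves the first link alone, starts a query string on the second, and slots the pair in before "#links" on the third.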




Reader Comments

"In my experience, more robust, easy to understand code is going to be more maintainable than a fairly complicated regular expression that accomplishes the same effect."

Agreed.

However, you also said this in your earlier post:

"Steve, if you are game and perhaps you can alter the first attempt (this one) and alter the regular expression to handle the other use cases - I demand satisfaction (throwing down the gauntlet)."

:-) Since I'm always down for a regex challenge, here's how you can do this with a single regex (let me know if I'm forgetting any of the cases that need to be accounted for or am otherwise messing something up):

<cfset content = reReplaceNoCase(content, '(< a\s[^>]*?href\s*=\s*"[^?##"]*)\??(?![^##"]*?\bsource=)', "\1?source=bennadel.com&", "all") />

(Remove the space between "<" and "a"... I added it to get around this blog's restricted HTML elements rule.)

That handles the following cases:

- Works with relative and absolute URLs, containing or not containing URL queries and/or fragments (page anchors).
- Does not modify URLs which already contain a "source" key in their query.
- Does not modify URLs contained within the href attributes of HTML elements other than anchors.

One issue is that it adds an unnecessary ampersand at the end of the URL query for URLs which did not already include a query. However, since this was more for the fun of solving the problem with a single regex than keeping URLs perfectly clean (as long as they still work identically), I can live with it. You could always run something like:

<cfset content = reReplace(content, '&(?="|##(?!x?[a-f\d]+;))', "", "all") />

...afterwards to pretty safely remove only the superfluous ampersands which were added within the HTML.
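For anyone working with java.util.regex directly, a rough Java translation of the two replaces above might look like this. It is a sketch under the assumption that the ColdFusion and Java regex engines treat these particular patterns the same way; the class and method names are made up, and the "< a" space is already removed.

```java
final class SingleRegexSketch {

    // First pass: insert the pair right after the path of every anchor
    // href that has no "source=" key before its hash or closing quote.
    // Second pass: drop the ampersand left dangling before a quote or a
    // page anchor (but not before a numeric entity such as &#169;).
    static String tag(String html) {
        String tagged = html.replaceAll(
            "(?i)(<a\\s[^>]*?href\\s*=\\s*\"[^?#\"]*)\\??(?![^#\"]*?\\bsource=)",
            "$1?source=bennadel.com&"
        );
        return tagged.replaceAll("&(?=\"|#(?!x?[a-f\\d]+;))", "");
    }

    public static void main(String[] args) {
        System.out.println(tag("<a href=\"b.cfm?id=1#top\">B</a>"));
    }
}
```

Note that this version puts the new pair first in the query string, ahead of any existing values, rather than at the end.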
