Ask Ben: Adding A Query String Pair Value To Existing HTML Using ColdFusion (Alternate Version 2)

Posted April 16, 2007 at 6:35 PM

Tags: ColdFusion, Ask Ben

I have a challenge for you. I would like to find every link in a block of HTML and add a name/value pair to the end of the query string. For example, it might find the link <a href="www.google.com">google</a>, which I want to change to <a href="www.google.com?SID=123">google</a>. We need to keep in mind that some links might already have a '?', in which case we need to use '&', while others will not have the '?'.

In my first attempt to answer this question, I used some fairly simple regular expression replaces. It was, however, brought to my attention that certain use cases were not considered. One was that the name-value pair we were adding to the URL might already be part of the URL. Another was that some URLs might have hash signs (page anchors), and the previous method would inappropriately add the name-value pair after the hash sign.

In this attempt, I have switched from simple regular expression replaces to a Java Pattern / Matcher. This allows us to match each link and examine it on an individual basis. This code is certainly more robust, but I think it is still straightforward. In my experience, robust, easy-to-understand code is going to be more maintainable than a fairly complicated regular expression that accomplishes the same effect. Of course, that's just me, and it is definitely limited by my understanding of regular expressions (they are still hard for me; Steve would probably know how to do this).
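For readers more at home in Java than in ColdFusion, here is a minimal sketch of the same find / AppendReplacement / AppendTail loop written directly against java.util.regex. It reuses the demo's lookaround pattern. The class name `MatcherRewrite` and the `rewrite()` helper are hypothetical, and the callback stands in for whatever per-link logic you need.

```java
import java.util.function.UnaryOperator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherRewrite {
    // Same lookaround pattern as the ColdFusion demo: the text between href=" and the next quote.
    static final Pattern HREF = Pattern.compile("(?<=href=\")([^\"]+)(?=\")");

    // Visits every match, lets the callback compute the new URL, and splices it back in.
    static String rewrite(String html, UnaryOperator<String> fn) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = fn.apply(m.group());
            // quoteReplacement keeps '$' and '\' in the URL from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        // Copy everything after the last match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"a.htm\">A</a> <a href=\"b.htm\">B</a>";
        System.out.println(rewrite(html, url -> url.toUpperCase()));
        // prints: <a href="A.HTM">A</a> <a href="B.HTM">B</a>
    }
}
```

Because each URL reaches the callback as an ordinary string, the duplicate check and the hash handling can be written as plain string code rather than as regular expression logic.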

In the following code, notice that one of the links already has the name-value pair "source=bennadel.com". Also notice that two of the links have a hash sign. We can easily handle this by splitting the URL based on the hash sign and then treating the base URL as if it never had a hash sign at all.
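Here is a minimal Java sketch of that split-then-append step. The `UrlParam` class and its `append()` method are hypothetical names used for illustration. There is one deliberate variation from the ColdFusion code: it splits on the first '#' with `indexOf()` instead of `String.split()`. This also preserves an empty trailing fragment such as "page.htm#", which `split()` would drop.

```java
import java.util.Locale;

class UrlParam {
    // Hypothetical helper: append name=value to the query string, keeping any #fragment last.
    static String append(String url, String name, String value) {
        String pair = name + "=" + value;
        // Like the demo's FindNoCase() check: leave URLs that already carry the pair alone.
        if (url.toLowerCase(Locale.ROOT).contains(pair.toLowerCase(Locale.ROOT))) {
            return url;
        }
        // Split once, on the first '#'; the fragment keeps its leading '#'.
        int hash = url.indexOf('#');
        String base = (hash >= 0) ? url.substring(0, hash) : url;
        String fragment = (hash >= 0) ? url.substring(hash) : "";
        // Use '&' if a query string already exists, otherwise start one with '?'.
        String separator = base.contains("?") ? "&" : "?";
        return base + separator + pair + fragment;
    }

    public static void main(String[] args) {
        System.out.println(append("http://www.searchgalleries.com/search/?q=mature#links", "source", "bennadel.com"));
        // prints: http://www.searchgalleries.com/search/?q=mature&source=bennadel.com#links
    }
}
```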


    <html>
    <head>
        <title>Alter URL Demo</title>
    </head>
    <body>

        <p>
            Hey man, if you are looking for some good images, you should probably try out the search page on <a href="http://www.searchgalleries.com?source=bennadel.com" target="_blank">Search Galleries</a>. It's pretty darn comprehensive and seems to keep track of all the free galleries that you will ever need. If you want to mess with the URL, its easy; just add a "q" query string value to the search url. The general site search URL is <a href="http://www.searchgalleries.com/search/" target="_blank">http://www.searchgalleries.com/search/</a>. So, then, to add a query value to it, such as "mature", you would simply add the query string "q=mature" to the url: <a href="http://www.searchgalleries.com/search/?q=mature#links" target="_blank">http://www.searchgalleries.com/search/?q=mature</a>. You can even search for more than one value at a given time. So, for instance, if you want to search for mature brunette women, you would put go to the URL:
            <a href="http://www.searchgalleries.com/search/?q=mature+brunette#links" target="_blank">http://www.searchgalleries.com/search/?q=mature+brunette</a>. Notice that "mature" and "brunette" are separated by a "+" sign. This is the URL encoded form of a space.
        </p>

    </body>
    </html>

    <!--- Get the page context. --->
    <cfset objPageContext = GetPageContext() />

    <!--- Get the page buffer. --->
    <cfset objBuffer = objPageContext.GetOut().GetBuffer() />

    <!---
        Get the content buffer string. This will give us everything
        that has NOT yet been flushed to the browser. This is just
        how I am doing it for this demo and is NOT the only way to
        perform this task. Since this page is small (and is being
        tested), we can safely assume that the content has not yet
        been flushed to the client.
    --->
    <cfset strContent = objBuffer.ToString() />

    <!---
        When examining the links, there are a couple of scenarios
        that we have to consider. Some URLs might have an existing
        query string. Others might not. Both might have a HASH value
        (page anchor), and URLs with an existing query string may
        already have the name-value pair that we are trying to
        insert.

        My original attempt used some fairly small regular
        expressions. I fear that trying to handle this entirely in
        regular expressions would become unreadable. Instead, I am
        going to go the Pattern / Matcher route. This way, we can
        examine each URL as it comes in.

        Let's create a pattern that matches any URL within an HREF.
        This pattern will require at least one non-quote character
        in its URL. It will also require the URL to be double-quoted
        (and, since the pattern is case-sensitive, a lower-case
        "href" attribute).
    --->
    <cfset objPattern = CreateObject(
        "java",
        "java.util.regex.Pattern"
        ).Compile(
            "(?<=href="")([^""]+)(?="")"
        ) />

    <!---
        Get a pattern matcher based on the content that we have
        pulled out of our page buffer.
    --->
    <cfset objMatcher = objPattern.Matcher( strContent ) />

    <!---
        Create a string buffer into which we will store our HTML
        content with the updated URLs.
    --->
    <cfset objBuffer = CreateObject(
        "java",
        "java.lang.StringBuffer"
        ).Init() />


    <!--- Loop over all the matched links. --->
    <cfloop condition="objMatcher.Find()">

        <!--- Get the matched URL. --->
        <cfset strURL = objMatcher.Group() />

        <!---
            First, we want to make sure that we are not duplicating
            our efforts. Check to see if the URL already contains
            our name/value pair.
        --->
        <cfif NOT FindNoCase( "source=bennadel.com", strURL )>

            <!---
                Split the URL on the hash sign. Even if there is no
                hash sign, this should result in an array with at
                least ONE index (the pre-hash value).
            --->
            <cfset arrUrlParts = strUrl.Split( "##" ) />

            <!---
                Save the first part (possibly the only part) back
                into the URL value. Then we can deal with that on
                its own and add any hash value back in later.
            --->
            <cfset strURL = arrUrlParts[ 1 ] />

            <!---
                Check to see if the URL contains an existing
                query string.
            --->
            <cfif Find( "?", strURL )>

                <!---
                    Since there is already a query string, we can
                    append ours to the query string values.
                --->
                <cfset strURL = (strURL & "&source=bennadel.com") />

            <cfelse>

                <!---
                    Since there is no query string yet, we can
                    create one with our name-value pair as its
                    only value.
                --->
                <cfset strURL = (strURL & "?source=bennadel.com") />

            </cfif>


            <!---
                Now that we have altered our base URL in the most
                appropriate way, let's see if we had a hash value
                to add back in. This will only be the case if the
                split produced a second part.
            --->
            <cfif (ArrayLen( arrUrlParts ) GT 1)>

                <!---
                    Append the hash value to our new URL. When
                    doing this, be sure to add the hash sign back
                    in. This was stripped out during our Split()
                    method call.
                --->
                <cfset strURL = (
                    strURL &
                    "##" &
                    arrUrlParts[ 2 ]
                    ) />

            </cfif>

        </cfif>


        <!---
            ASSERT: At this point, we have updated the strURL value
            or we have left it alone. Either way, we are ready to
            add it back into the string buffer. When doing this, be
            sure to escape any backslashes and dollar signs, which
            AppendReplacement() would otherwise treat as character
            escapes and group references.
        --->
        <cfset objMatcher.AppendReplacement(
            objBuffer,
            strURL.ReplaceAll(
                "([\\\$]{1})",
                "\\$1"
                )
            ) />

    </cfloop>


    <!---
        Now that we have matched all the URLs, add whatever
        content remains back into the results buffer.
    --->
    <cfset objMatcher.AppendTail( objBuffer ) />

    <!--- Clear the existing content buffer. --->
    <cfset objPageContext.GetOut().ClearBuffer() />

    <!---
        Output the updated HTML. When doing this, we have to
        take our buffer and compile it down to a string.
    --->
    <cfset WriteOutput( objBuffer.ToString() ) />
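The ReplaceAll() call inside the loop is needed because AppendReplacement() treats '$' and '\' in its replacement text as group references and escapes. The plain-Java sketch below checks that the demo's hand-rolled escape matches the standard `Matcher.quoteReplacement()` helper. The class name is hypothetical and the sample URL is made up. `quoteReplacement()` was added in Java 5, so calling it from ColdFusion instead assumes the server runs on Java 5 or later.

```java
import java.util.regex.Matcher;

class ReplacementEscape {
    // Port of the demo's strURL.ReplaceAll("([\\\$]{1})", "\\$1"):
    // put a backslash in front of every backslash and dollar sign.
    static String escapeLikeDemo(String s) {
        return s.replaceAll("([\\\\\\$]{1})", "\\\\$1");
    }

    // The standard-library equivalent (Java 5+).
    static String escapeBuiltIn(String s) {
        return Matcher.quoteReplacement(s);
    }

    public static void main(String[] args) {
        String url = "http://example.com/?price=$5";  // made-up sample URL
        System.out.println(escapeLikeDemo(url));  // prints: http://example.com/?price=\$5
        System.out.println(escapeBuiltIn(url));   // prints: http://example.com/?price=\$5
    }
}
```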

Now, as Shuns raised on the previous example, this will also alter the HREF URL of any LINK tag used to include an external style sheet. This can be worked around by capturing the tag itself as part of the regular expression (or by requiring the tag to be an anchor). Unfortunately, Shuns didn't raise this until I was practically done with this second attempt, and I am far too lazy to go modify my code. As a trade-off, however, modifying that URL is really not a big deal. After all, it will still return the appropriate style sheet, as we are just appending a query string value, not changing the base URL in any way.
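The anchor-only workaround can be sketched in Java by matching the opening of the `<a` tag itself, so that `<link>` tags never match. This sketch rests on assumptions. The pattern, the class name `AnchorOnly`, and its helpers are hypothetical, and the pattern handles only double-quoted href values. The URL logic mirrors the ColdFusion loop: skip the URL if the pair is already present, otherwise add it before any '#' fragment with '?' or '&'.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorOnly {
    // Group 1: "<a ... href=\"", group 2: the URL. Requiring "<a" followed by
    // whitespace means <link href="..."> tags are left alone.
    static final Pattern ANCHOR_HREF = Pattern.compile(
            "(<a\\s[^>]*?href\\s*=\\s*\")([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Add source=bennadel.com before any #fragment, using '?' or '&' as needed.
    static String withSource(String url) {
        if (url.toLowerCase(Locale.ROOT).contains("source=bennadel.com")) {
            return url;
        }
        int hash = url.indexOf('#');
        String base = hash >= 0 ? url.substring(0, hash) : url;
        String fragment = hash >= 0 ? url.substring(hash) : "";
        return base + (base.contains("?") ? "&" : "?") + "source=bennadel.com" + fragment;
    }

    static String rewriteAnchors(String html) {
        Matcher m = ANCHOR_HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Rebuild "<a ... href=\"" + new URL + closing quote.
            String replacement = m.group(1) + withSource(m.group(2)) + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<link rel=\"stylesheet\" href=\"style.css\">\n<a href=\"page.htm#top\">Top</a>";
        System.out.println(rewriteAnchors(html));
    }
}
```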

This gives us the following output:


    <html>
    <head>
        <title>Alter URL Demo</title>
    </head>
    <body>

        <p>
            Hey man, if you are looking for some good images, you should probably try out the search page on <a href="http://www.searchgalleries.com?source=bennadel.com" target="_blank">Search Galleries</a>. It's pretty darn comprehensive and seems to keep track of all the free galleries that you will ever need. If you want to mess with the URL, its easy; just add a "q" query string value to the search url. The general site search URL is <a href="http://www.searchgalleries.com/search/?source=bennadel.com" target="_blank">http://www.searchgalleries.com/search/</a>. So, then, to add a query value to it, such as "mature", you would simply add the query string "q=mature" to the url: <a href="http://www.searchgalleries.com/search/?q=mature&source=bennadel.com#links" target="_blank">http://www.searchgalleries.com/search/?q=mature</a>. You can even search for more than one value at a given time. So, for instance, if you want to search for mature brunette women, you would put go to the URL:

            <a href="http://www.searchgalleries.com/search/?q=mature+brunette&source=bennadel.com#links" target="_blank">http://www.searchgalleries.com/search/?q=mature+brunette</a>. Notice that "mature" and "brunette" are separated by a "+" sign. This is the URL encoded form of a space.
        </p>

    </body>
    </html>

Notice that everything went quite swimmingly. We did not duplicate our name-value pair in the first URL. Nor did we add any name-value pairs in an inappropriate place.


Reader Comments

Apr 16, 2007 at 9:35 PM

"In my experience, more robust, easy to understand code is going to be more maintainable than a fairly complicated regular expression that accomplishes the same effect."

Agreed.

However, you also said this in your earlier post:

"Steve, if you are game and perhaps you can alter the first attempt (this one) and alter the regular expression to handle the other use cases - I demand satisfaction (throwing down the gauntlet)."

:-) Since I'm always down for a regex challenge, here's how you can do this with a single regex (let me know if I'm forgetting any of the cases that need to be accounted for or am otherwise messing something up):

<cfset content = reReplaceNoCase(content, '(<a\s[^>]*?href\s*=\s*"[^?##"]*)\??(?![^##"]*?\bsource=)', "\1?source=bennadel.com&", "all") />

That handles the following cases:

- Works with relative and absolute URLs, containing or not containing URL queries and/or fragments (page anchors).
- Does not modify URLs which already contain a "source" key in their query.
- Does not modify URLs contained within the href attributes of HTML elements other than anchors.
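To experiment with Steve's expression outside ColdFusion, here is a port to java.util.regex. It involves two translation details. First, the doubled "##" in the ColdFusion string becomes a single "#". Second, the \1 back reference in the replacement becomes $1. ColdFusion's own regex functions are not guaranteed to behave identically to java.util.regex, so treat this as an approximation that has been checked only against the cases shown. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class SingleRegex {
    // Steve's pattern, ported from ColdFusion: "##" becomes "#", \1 becomes $1,
    // and reReplaceNoCase becomes a CASE_INSENSITIVE pattern.
    static final Pattern ADD_SOURCE = Pattern.compile(
            "(<a\\s[^>]*?href\\s*=\\s*\"[^?#\"]*)\\??(?![^#\"]*?\\bsource=)",
            Pattern.CASE_INSENSITIVE);

    static String addSource(String html) {
        // The pair is inserted right after '?', so an existing query follows the '&'.
        return ADD_SOURCE.matcher(html).replaceAll("$1?source=bennadel.com&");
    }

    public static void main(String[] args) {
        System.out.println(addSource("<a href=\"/search/?q=mature#links\">x</a>"));
        // prints: <a href="/search/?source=bennadel.com&q=mature#links">x</a>
    }
}
```

Note that the new pair lands at the front of the query string rather than the end. Any existing query and fragment stay in place.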

One issue is that it adds an unnecessary ampersand at the end of the URL query for URLs which did not already include a query. However, since this was more for the fun of solving the problem with a single regex than keeping URLs perfectly clean (as long as they still work identically), I can live with it. You could always run something like:

<cfset content = reReplace(content, '&(?="|##(?!x?[a-f\d]+;))', "", "all") />

...afterwards to pretty safely remove only the superfluous ampersands which were added within the HTML.
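The cleanup pass ports to Java the same way. One change is deliberate and is labeled in the code: the pattern is compiled case-insensitively. With the original case-sensitive match, an upper-case hexadecimal reference such as &#x1F; fails the x?[a-f\d]+; check and would lose its ampersand. The class name is hypothetical.

```java
import java.util.regex.Pattern;

class AmpersandCleanup {
    // Steve's follow-up pattern: drop an '&' that sits right before the closing quote
    // or before a '#' fragment, but keep numeric character references such as &#39;.
    // CASE_INSENSITIVE is an addition here so &#x1F; and &#X27; are also kept.
    static final Pattern STRAY_AMP = Pattern.compile(
            "&(?=\"|#(?!x?[a-f\\d]+;))", Pattern.CASE_INSENSITIVE);

    static String dropTrailingAmpersands(String html) {
        return STRAY_AMP.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(dropTrailingAmpersands("<a href=\"page.htm?source=bennadel.com&\">It&#39;s</a>"));
        // prints: <a href="page.htm?source=bennadel.com">It&#39;s</a>
    }
}
```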


Apr 17, 2007 at 8:07 AM

@Steve,

Brilliant! Well played my friend, well played:

http://www.bennadel.com/index.cfm?dax=blog:642.view

