Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.
Meanwhile on Twitter
Loading latest tweet...
Ben Nadel at cf.Objective() 2010 (Minneapolis, MN) with: Simon Free and Dan Wilson

Ask Ben: Grabbing Google Results with CFHttp

By Ben Nadel on

I am getting errors when I try to grab google results with cfhttp. But, when I go to page with my browser, it works just fine. What am I doing wrong?

You are not doing anything wrong. Google wants to be used by regular web users. CFHttp does not announce itself as a regular user. When you do a CFHttp page grab, it passes along, as its User Agent a non-standard value. I am not sure offhand what it is, but I think it sends "ColdFusion" as its user agent. Doing a regular CFHttp will return this error:

Your client does not have permission to get URL /search?hl=en&lr=&q=Girls+Gone+Wild&btnG=Search from this server.

This is there for a reason: you might be violating the Google terms of service (I have not read them, nor do I condone working around this). If you want to avoid this, you can fake Google into thinking you ARE a web browser by sending a standard user agent in your CFHttp:

  • <!--- Grab the google search results. --->
  • <cfhttp
  • url="http://www.google.com/search?hl=en&lr=&q=Girls+Gone+Wild&btnG=Search"
  • useragent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; FDM)"
  • result="objGoogleGrab"
  • method="GET"
  • resolveurl="true"
  • />
  •  
  • <!--- Output the search results. --->
  • <cfoutput>
  • #objGoogleGrab.FileContent#
  • </cfoutput>

Notice that I am sending the FireFox / Mozilla user agent. This should work just fine. But again, I am not aware of the legality of such an action - proceed with caution.




Reader Comments

Great post on pulling Google search results pages.
Now if I could only figure out how to get it to only pull
the result for a particular site's listing.
Trying to analyze the different result text for one domain
for different keyword searches.

Thanks for the post. I didn't even think about using cfhttp to grab google results. Any way to just grab the results and not the rest of the google page that appears?

@Keith,

You could use markers in the page to probably only grab the start / end of the results. However, you might be better off seeing if Google has some search API that fits your desires more easily.

OK, I'm trying to use this with a Google Site Search custom search engine. Thought this fixed my first problem, now I get a fully formatted HTML page, insteadof straight XML. If I go to the link directly inthe browser, I get XML.

I looked at the WoW example, but when I run those code snippets, I STILL get HTML for both of those.

I'm very confused. :(

@Thane,

Are you sure you're passing through the same user agent that your browser has? Try hitting a CFM page and outputting the http_user_agent to see what's posting. Then, post that to your search page.