Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
I am the chief technical officer at InVision App, Inc - a prototyping and collaboration platform for designers, built by designers. I also rock out in JavaScript and ColdFusion 24x7.

Ask Ben: Grabbing Google Results with CFHttp

By Ben Nadel on

I am getting errors when I try to grab Google results with CFHttp. But when I go to the page with my browser, it works just fine. What am I doing wrong?

You are not doing anything wrong. Google wants to be used by regular web users, and CFHttp does not announce itself as one. When you do a CFHttp page grab, it passes along a non-browser value as its user agent; in Adobe ColdFusion, the default is "ColdFusion". Doing a regular CFHttp will return this error:

Your client does not have permission to get URL /search?hl=en&lr=&q=Girls+Gone+Wild&btnG=Search from this server.

This block is there for a reason: you might be violating the Google terms of service (I have not read them, nor do I condone working around this). If you want to get past it, you can fool Google into thinking you ARE a web browser by sending a standard browser user agent in your CFHttp:

    <!--- Grab the Google search results. --->
    <cfhttp
        url="http://www.google.com/search?hl=en&lr=&q=Girls+Gone+Wild&btnG=Search"
        useragent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; FDM)"
        result="objGoogleGrab"
        method="GET"
        resolveurl="true"
        />

    <!--- Output the search results. --->
    <cfoutput>
        #objGoogleGrab.FileContent#
    </cfoutput>

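Notice that I am sending a standard browser user agent. This particular string is the one Internet Explorer 6 sends; the "Mozilla/4.0 (compatible; ...)" prefix is a legacy convention and does not mean FireFox. This should work just fine. But again, I am not aware of the legality of such an action, so proceed with caution.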




Reader Comments

Great post on pulling Google search results pages.
Now if I could only figure out how to get it to only pull
the result for a particular site's listing.
Trying to analyze the different result text for one domain
for different keyword searches.

@Dr. Adam,

Just put "site:yourdomain.com" in the Google query and it should only pull results for that particular site.
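
As a rough sketch of that suggestion, the query can be built as a string and URL-encoded before it goes into the CFHttp call. The domain and keywords here are placeholders, and strQuery is just an illustrative variable name; the user agent and other attributes are copied from the post above.

```cfm
<!--- Placeholder query: the "site:" operator limits results to one domain. --->
<cfset strQuery = "site:yourdomain.com some keyword phrase" />

<!--- URLEncodedFormat() escapes the colon and spaces so the query is safe in a URL. --->
<cfhttp
    url="http://www.google.com/search?hl=en&q=#urlEncodedFormat( strQuery )#"
    useragent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; FDM)"
    result="objGoogleGrab"
    method="GET"
    resolveurl="true"
    />

<!--- Output the site-restricted search results. --->
<cfoutput>
    #objGoogleGrab.FileContent#
</cfoutput>
```

Changing only the keyword part of strQuery between requests would let you compare how one domain shows up for different searches.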

Thanks for the post. I didn't even think about using cfhttp to grab google results. Any way to just grab the results and not the rest of the google page that appears?

@Keith,

You could probably use markers in the page to grab only the content between the start and end of the results. However, you might be better off seeing if Google has a search API that fits your needs more easily.
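
A minimal sketch of the marker idea, assuming the page has already been fetched into objGoogleGrab as in the post. The two marker strings below are hypothetical; you would need to view the source of the real results page and pick strings that actually bracket the results, and they may change whenever Google changes its markup.

```cfm
<!--- Hypothetical markers: replace with strings found in the real page source. --->
<cfset strStartMarker = "<div id=""res"">" />
<cfset strEndMarker = "<div id=""foot"">" />
<cfset strPage = objGoogleGrab.FileContent />
<cfset strResults = "" />

<!--- Find() returns 0 when the substring is not present. --->
<cfset intStart = find( strStartMarker, strPage ) />

<cfif intStart>
    <!--- Look for the end marker only after the start marker. --->
    <cfset intEnd = find( strEndMarker, strPage, intStart + len( strStartMarker ) ) />

    <cfif intEnd>
        <!--- Keep everything from the start marker up to (not including) the end marker. --->
        <cfset strResults = mid( strPage, intStart, intEnd - intStart ) />
    </cfif>
</cfif>

<!--- Outputs an empty string if either marker was missing. --->
<cfoutput>#strResults#</cfoutput>
```

If either marker is missing, strResults stays empty rather than throwing an error, which makes it easy to notice when the page layout has changed.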

OK, I'm trying to use this with a Google Site Search custom search engine. I thought this fixed my first problem, but now I get a fully formatted HTML page instead of straight XML. If I go to the link directly in the browser, I get XML.

I looked at the WoW example, but when I run those code snippets, I STILL get HTML for both of those.

I'm very confused. :(

@Thane,

Are you sure you're passing through the same user agent that your browser has? Try hitting a CFM page and outputting the http_user_agent to see what's posting. Then, post that to your search page.
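
One way to run that check, sketched as two small pages. The file name echo.cfm and the localhost URL are assumptions for illustration; point them at wherever the page lives on your own server.

```cfm
<!--- echo.cfm (hypothetical file name): report the user agent of whatever requests this page. --->
<cfoutput>#htmlEditFormat( cgi.http_user_agent )#</cfoutput>
```

```cfm
<!--- A second page: request echo.cfm with CFHttp's default settings (placeholder URL). --->
<cfhttp
    url="http://localhost/echo.cfm"
    method="GET"
    result="objEcho"
    />

<!--- Show what CFHttp announced itself as. --->
<cfoutput>
    CFHttp sent: #htmlEditFormat( objEcho.FileContent )#
</cfoutput>
```

Loading echo.cfm directly in your browser shows the browser's user agent; you can then copy that value into the useragent attribute of the CFHttp call that hits the search page.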

Is there a way to make the search terms user-supplied? And to make it so only the first 10 results show?
