Ask Ben: Grabbing Google Results with CFHttp

Posted August 23, 2006 at 2:55 PM

Tags: ColdFusion, Ask Ben

I am getting errors when I try to grab Google results with CFHttp. But when I go to the page with my browser, it works just fine. What am I doing wrong?

You are not doing anything wrong. Google wants to be used by regular web users, and CFHttp does not announce itself as one. When you do a CFHttp page grab, it passes along a non-browser value as its User Agent; unless you set one yourself, it sends "ColdFusion". Doing a regular CFHttp will return this error:

Your client does not have permission to get URL /search?hl=en&lr=&q=Girls+Gone+Wild&btnG=Search from this server.

This block is there for a reason: you might be violating the Google terms of service (I have not read them, nor do I condone working around this). If you still want to get past it, you can fool Google into thinking you ARE a web browser by sending a standard browser user agent in your CFHttp:

    <!--- Grab the Google search results. --->
    <cfhttp
        url="http://www.google.com/search?hl=en&lr=&q=Girls+Gone+Wild&btnG=Search"
        useragent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; FDM)"
        result="objGoogleGrab"
        method="GET"
        resolveurl="true"
        />

    <!--- Output the search results. --->
    <cfoutput>
        #objGoogleGrab.FileContent#
    </cfoutput>

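Notice that I am sending a standard browser user agent, in this case the one for Internet Explorer 6 on Windows XP (IE also identifies itself with a "Mozilla/4.0" prefix). This should work just fine. But again, I am not aware of the legality of such an action - proceed with caution.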
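
Since a blocked request comes back as an error page rather than search results, it may help to look at the status code before outputting anything. What follows is only a minimal sketch built on the same request as above; it assumes the StatusCode value starts with the numeric code (as in "200 OK"), which Val() then pulls out:

```cfm
<!--- Same request as above, with a browser-style user agent. --->
<cfhttp
	url="http://www.google.com/search?hl=en&lr=&q=Girls+Gone+Wild&btnG=Search"
	useragent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; FDM)"
	result="objGoogleGrab"
	method="GET"
	resolveurl="true"
	/>

<!--- StatusCode looks like "200 OK"; Val() keeps only the leading number. --->
<cfif (Val( objGoogleGrab.StatusCode ) EQ 200)>
	<cfoutput>#objGoogleGrab.FileContent#</cfoutput>
<cfelse>
	<!--- Anything else (including a connection failure) lands here. --->
	<cfoutput>Request failed: #HtmlEditFormat( objGoogleGrab.StatusCode )#</cfoutput>
</cfif>
```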

Reader Comments

Dec 9, 2007 at 9:57 PM

Great post on pulling Google search results pages. Now if I could only figure out how to get it to pull only the results for a particular site's listing. I'm trying to analyze the different result text for one domain across different keyword searches.


Dec 10, 2007 at 7:06 AM

@Dr. Adam,

Just put "site:yourdomain.com" in the Google query and it should only pull results for that particular site.
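
As a rough sketch of that suggestion, the site: operator can be appended to the q parameter before the request is made. The keyword and domain values here are placeholders, the variable names are made up for illustration, and UrlEncodedFormat() takes care of escaping the colon and the spaces:

```cfm
<!--- Placeholder values: swap in your own keywords and domain. --->
<cfset strKeywords = "coldfusion cfhttp" />
<cfset strDomain = "yourdomain.com" />

<!--- Add the site: operator so only results from that domain come back. --->
<cfset strQuery = UrlEncodedFormat( strKeywords & " site:" & strDomain ) />

<cfhttp
	url="http://www.google.com/search?hl=en&q=#strQuery#"
	useragent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
	result="objGoogleGrab"
	method="GET"
	resolveurl="true"
	/>
```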


Sep 4, 2009 at 3:37 AM

Thanks for the post. I didn't even think about using CFHttp to grab Google results. Is there any way to grab just the results and not the rest of the Google page that appears?


Sep 6, 2009 at 11:35 AM

@Keith,

You could probably use markers in the page to find where the results start and end, and grab only that part. However, you might be better off seeing if Google has a search API that fits your needs more easily.
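
A sketch of the marker idea, assuming objGoogleGrab already holds the CFHttp result from the post. The two marker strings are hypothetical; you would need to inspect the returned HTML and pick strings that reliably sit just before and just after the results:

```cfm
<!--- Hypothetical markers: replace with strings found in the real page. --->
<cfset strStartMarker = "<div id=""res"">" />
<cfset strEndMarker = "<div id=""foot"">" />

<cfset strHtml = objGoogleGrab.FileContent />

<!--- Find the start marker, then the first end marker that follows it. --->
<cfset intStart = FindNoCase( strStartMarker, strHtml ) />
<cfset intEnd = 0 />
<cfif intStart>
	<cfset intEnd = FindNoCase( strEndMarker, strHtml, intStart ) />
</cfif>

<cfif (intStart AND (intEnd GT intStart))>
	<!--- Keep the text from the start marker up to (not including) the end marker. --->
	<cfset strResults = Mid( strHtml, intStart, (intEnd - intStart) ) />
	<cfoutput>#strResults#</cfoutput>
<cfelse>
	<cfoutput>Markers not found; the page layout may have changed.</cfoutput>
</cfif>
```

This kind of string matching is brittle, since any change to the page markup breaks it, which is part of why an API is likely the safer route.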


Sep 6, 2009 at 1:39 PM

After posting my comment I found their custom search service and that works like a charm.


Sep 6, 2009 at 2:21 PM

@Keith,

Ok great.


Nov 19, 2009 at 2:50 PM

OK, I'm trying to use this with a Google Site Search custom search engine. I thought this fixed my first problem, but now I get a fully formatted HTML page instead of straight XML. If I go to the link directly in the browser, I get XML.

I looked at the WoW example, but when I run those code snippets, I STILL get HTML for both of those.

I'm very confused. :(


Nov 19, 2009 at 6:29 PM

@Thane,

Are you sure you're passing through the same user agent that your browser has? Try hitting a CFM page with your browser and outputting CGI.http_user_agent to see what it sends. Then, pass that same value as the useragent in the CFHttp call to your search page.
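
One way to run that check, as a sketch: a tiny page (the file name echo_agent.cfm is made up for illustration) that prints whatever user agent the caller sent.

```cfm
<!--- echo_agent.cfm (hypothetical file name): show the caller's user agent. --->
<cfoutput>#HtmlEditFormat( CGI.http_user_agent )#</cfoutput>
```

Load it in the browser, copy the value it shows, and use that string as the useragent attribute of the CFHttp tag.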

