Ask Ben: Grabbing Google Results with CFHttp
Posted August 23, 2006 at 2:55 PM
I am getting errors when I try to grab Google results with CFHttp. But when I go to the page with my browser, it works just fine. What am I doing wrong?
You are not doing anything wrong. Google wants to be used by regular web users, and CFHttp does not announce itself as one. When you do a CFHttp page grab, it sends a non-browser value as its User Agent. I am not sure offhand what it is, but I think it sends "ColdFusion" by default. Doing a regular CFHttp will return this error:
Your client does not have permission to get URL /search?hl=en&lr=&q=Girls+Gone+Wild&btnG=Search from this server.
This is there for a reason: you might be violating the Google terms of service (I have not read them, nor do I condone working around this). If you want to get past the error, you can fool Google into thinking you ARE a web browser by sending a standard browser user agent in your CFHttp:
<!--- Grab the google search results. --->
<cfhttp
	url="http://www.google.com/search?hl=en&lr=&q=Girls+Gone+Wild&btnG=Search"
	useragent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; FDM)"
	result="objGoogleGrab"
	method="GET"
	resolveurl="true"
	/>

<!--- Output the search results. --->
<cfoutput>
	#objGoogleGrab.FileContent#
</cfoutput>
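Notice that I am sending a standard browser user agent (this one is the Internet Explorer 6 string, which starts with "Mozilla" like most browser user agents). This should work just fine. But again, I am not aware of the legality of such an action - proceed with caution.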
Reader Comments
Great post on pulling Google search results pages. Now if I could only figure out how to get it to only pull the result for a particular site's listing. Trying to analyze the different result text for one domain for different keyword searches.
@Dr. Adam,
Just put "site:yourdomain.com" in the Google query and it should only pull results for that particular site.
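As a rough sketch of that idea, the site: operator can be added to the search terms and URL-encoded before making the request. The domain, the keywords, and the variable names below are placeholders, and the user agent is a shortened version of the one from the post. I have not verified how Google responds to this request.

```cfm
<!--- Placeholder values; swap in your own domain and keywords. --->
<cfset strDomain = "yourdomain.com" />
<cfset strKeywords = "coldfusion cfhttp" />

<!--- Build the query with the site: operator and URL-encode it. --->
<cfset strQuery = URLEncodedFormat( "site:#strDomain# #strKeywords#" ) />

<!--- Grab the results, sending a browser-style user agent. --->
<cfhttp
	url="http://www.google.com/search?hl=en&q=#strQuery#"
	useragent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"
	result="objGoogleGrab"
	method="GET"
	resolveurl="true"
	/>

<!--- Output the search results. --->
<cfoutput>
	#objGoogleGrab.FileContent#
</cfoutput>
```

Running the same request in a loop over a list of keywords would give you one result page per keyword for the same domain.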
Thanks for the post. I didn't even think about using cfhttp to grab Google results. Is there any way to grab just the results and not the rest of the Google page that appears?
@Keith,
You could probably use markers in the page source to grab only the content between the start and end of the results. However, you might be better off seeing if Google has a search API that fits your needs more easily.
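Here is a minimal sketch of the marker approach. The two marker strings and the sample HTML are made up for illustration; you would need to look at the actual page source to find strings that reliably surround the results, and those could change at any time.

```cfm
<!--- Stand-in for objGoogleGrab.FileContent so the sketch runs on its own. --->
<cfset strHtml = "<html><body><p>header</p><ol><li>Result one</li><li>Result two</li></ol><p>footer</p></body></html>" />

<!--- Hypothetical markers that surround the results. --->
<cfset strStartMarker = "<ol>" />
<cfset strEndMarker = "</ol>" />

<!--- Default to an empty string if the markers are not found. --->
<cfset strResults = "" />

<!--- Find the start marker, then look for the end marker after it. --->
<cfset intStart = FindNoCase( strStartMarker, strHtml ) />
<cfif intStart>
	<cfset intFrom = intStart + Len( strStartMarker ) />
	<cfset intEnd = FindNoCase( strEndMarker, strHtml, intFrom ) />

	<!--- Only extract when there is content between the markers. --->
	<cfif intEnd GT intFrom>
		<cfset strResults = Mid( strHtml, intFrom, intEnd - intFrom ) />
	</cfif>
</cfif>

<!--- Output just the extracted portion. --->
<cfoutput>
	#strResults#
</cfoutput>
```

To use it on a live grab, replace the sample string with the FileContent returned by CFHttp.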
After posting my comment I found their custom search service and that works like a charm.
@Keith,
Ok great.
OK, I'm trying to use this with a Google Site Search custom search engine. I thought this fixed my first problem, but now I get a fully formatted HTML page instead of straight XML. If I go to the link directly in the browser, I get XML.
I looked at the WoW example, but when I run those code snippets, I STILL get HTML for both of those.
I'm very confused. :(
@Thane,
Are you sure you're passing through the same user agent that your browser has? Try hitting a CFM page and outputting CGI.http_user_agent to see what your browser is sending. Then, pass that same value as the useragent in your CFHttp call to the search page.
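Something like the following could be used to check this. It is only a sketch: the search URL is a placeholder for your own custom search URL, and the variable names are made up. When you request this page in your browser, CGI.http_user_agent holds your browser's user agent, so the CFHttp call forwards the same value.

```cfm
<!--- Show the user agent that the requesting browser sent to this page. --->
<cfoutput>
	<p>Browser user agent: #HTMLEditFormat( CGI.http_user_agent )#</p>
</cfoutput>

<!--- Placeholder URL; replace with your own search URL. --->
<cfset strSearchUrl = "http://www.example.com/search?q=test" />

<!--- Make the request with the same user agent the browser sent. --->
<cfhttp
	url="#strSearchUrl#"
	useragent="#CGI.http_user_agent#"
	result="objSearch"
	method="GET"
	/>

<!--- Escape the response so any XML shows as text instead of being rendered. --->
<cfoutput>
	<pre>#HTMLEditFormat( objSearch.FileContent )#</pre>
</cfoutput>
```

If the response still differs from what the browser shows, the user agent is probably not the only thing the server is looking at.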



