Ask Ben: CFHttp For Web Mining And Image Hot Linking

Posted July 13, 2006 at 8:44 AM

Tags: ColdFusion, Ask Ben

I like pornography and I am trying to create a rather large library of it on my file server at home. I have built my own ColdFusion spider which crawls over adult web sites and downloads free graphics and videos. I am having a problem with CFHttp where only some of the content is downloaded. Most of it shows up as some weird "not available" graphic; however, when I go to the URL in question, the graphics show up fine and I can right-click and save. Am I not using CFHttp correctly?

Let me just start off by saying that you must be careful about copyright laws with this sort of thing. I don't know the laws, but just be careful about what you are "grabbing" from other companies' web sites. Remember that you are not only taking their content, you are also using their bandwidth, which can impact their file-transfer limits.

That being said, it sounds like you are using ColdFusion CFHttp correctly. The problem here lies on the target server that is serving up the requested graphics or videos. What you are doing is sometimes referred to as web mining or image hot linking.

Web mining is just a generic term for gathering information off of the web in some sort of systematic, usually automated, fashion. Image hot linking is when you display on your site an image that is located under another domain (and probably on another server). Unfortunately, the server administrators don't want you "stealing" their images and bandwidth, and that "not available" image that shows up a lot of the time is their attempt to stop you from grabbing content that they paid for and serve from their site.

So, how do you get around this issue? First, you have to understand how hot linking is being prevented (in most cases). Thomas Scott does a good job on A List Apart of explaining how to check the user's referrer URL to block hot linking. There are times, I find, when a server is doing something more complicated that I cannot crack, but those servers are few and far between and are generally large servers dedicated specifically to file serving.
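To make the referrer check a little more concrete, here is a minimal sketch of the kind of test a protecting server might run, written in ColdFusion only to stay consistent with the rest of this post. The function name isHotLinkRequest and the host www.example.com are made up for illustration. Real servers usually do this in web server configuration rather than in application code, and their exact rules vary:

```cfm
<!---
	Hypothetical helper: returns true when a request looks like a hot link,
	meaning a referrer is present but its host is not the allowed host.
--->
<cffunction name="isHotLinkRequest" returntype="boolean" output="false">
	<cfargument name="referer" type="string" required="true" />
	<cfargument name="allowedHost" type="string" required="true" />

	<!--- Pull the host name out of the referrer URL (scheme, host, then anything). --->
	<cfset var refererHost = ReReplaceNoCase(
		arguments.referer,
		"^https?://([^/:]+).*$",
		"\1"
		) />

	<!--- No referrer at all: this sketch lets the request through. --->
	<cfif NOT Len( Trim( arguments.referer ) )>
		<cfreturn false />
	</cfif>

	<!--- Any host other than the allowed one is treated as a hot link. --->
	<cfreturn (CompareNoCase( refererHost, arguments.allowedHost ) NEQ 0) />
</cffunction>
```

A page that serves images could call isHotLinkRequest( CGI.http_referer, "www.example.com" ) and send back the "not available" graphic whenever it returns true. Seen from the other side, this is why sending the right referrer can be enough to get the real image.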

Ok, so now to the nitty gritty. To overcome this "issue," you have to extend your ColdFusion CFHttp call a bit to set the request headers that a browser would normally send (the values the target server sees as CGI variables). Let's work with an example. Say you are trying to grab the image:

http://www.donovanphillips.com/galleries/ncg/fawn01/FawnNCG1-0033.JPG
WARNING: Adult Image

... off of the page:

http://www.donovanphillips.com/galleries/ncg/fawn01/index.php
WARNING: Adult Site

... we want to perform a ColdFusion CFHttp grab using the target PAGE as the referrer for the image grab. This way, we can fool the server into thinking that we are a user on the page viewing their images (after all, your browser is just making requests to the server for images like we are). Additionally, you will want to change the user agent as ColdFusion sends its own user agent by default in the CFHttp call:


  <cfhttp
      url="http://www.donovanphillips.com/galleries/ncg/fawn01/FawnNCG1-0033.JPG"
      method="GET"
      useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.2) Gecko/20060308 Firefox/1.5.0.2"
      getasbinary="yes"
      result="objHttp">

      <!---
          Set referrer params. In this case, we want to override the referrer.
          The name is the HTTP header name ("referer"); the target server is
          what exposes it as the CGI variable http_referer.
      --->
      <cfhttpparam
          type="CGI"
          name="referer"
          value="http://www.donovanphillips.com/galleries/ncg/fawn01/index.php"
          encoded="false"
          />
  </cfhttp>

As you can see, we set the user agent to be some flavor of the Mozilla Firefox browser and the referrer is the page upon which the image was originally linked. Now, as I said before, this does NOT work all the time, but it does work a good amount of the time. What you do with the binary image data (stored in objHttp.FileContent in the above example) is up to you. Be sure to check that the image is valid before you try to do anything with it:


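  <!---
      Check to see if we found the image. The Content-Type header can be
      missing (a failed connection, for example), so test for it first.
  --->
  <cfif (
      FindNoCase( "200", objHttp.Statuscode ) AND
      StructKeyExists( objHttp.Responseheader, "Content-Type" ) AND
      FindNoCase( "image", objHttp.Responseheader["Content-Type"] )
      )>
      <!--- We have an image. --->
  <cfelse>
      <!--- Blast! The image didn't come through. --->
  </cfif>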
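As one possibility for the "We have an image" branch, the binary data could be written to disk with CFFile. This is only a sketch under my own assumptions: the function name saveHttpImage, its arguments, and the idea of naming the file after the last segment of the URL are all illustrative, and the target directory has to exist already:

```cfm
<!---
	Hypothetical helper: writes binary image data (such as objHttp.FileContent)
	into the given directory and returns the full path of the new file.
--->
<cffunction name="saveHttpImage" returntype="string" output="false">
	<cfargument name="imageUrl" type="string" required="true" />
	<cfargument name="imageData" type="any" required="true" />
	<cfargument name="targetDirectory" type="string" required="true" />

	<!--- Use the last segment of the URL (minus any query string) as the file name. --->
	<cfset var fileName = ListLast( ListFirst( arguments.imageUrl, "?" ), "/" ) />
	<cfset var filePath = arguments.targetDirectory & "/" & fileName />

	<!--- Guard against being handed a string instead of binary data. --->
	<cfif NOT IsBinary( arguments.imageData )>
		<cfthrow type="InvalidImageData" message="Expected binary image data." />
	</cfif>

	<!--- Write the bytes to disk (this replaces any existing file of the same name). --->
	<cffile action="write" file="#filePath#" output="#arguments.imageData#" />

	<cfreturn filePath />
</cffunction>
```

Inside the success branch above, this might be called as saveHttpImage( strImageUrl, objHttp.FileContent, ExpandPath( "./images" ) ), where strImageUrl is a variable holding the same URL that was passed to CFHttp. A spider that grabs many files would probably also want to deal with duplicate file names, which this sketch does not do.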

If you want to see this in action, please check out my ColdFusion CFHttp example in my ColdFusion Snippets section.

Reader Comments

Feb 6, 2008 at 11:53 AM

I'm trying to pull my TypePad RSS feed into a CFM document. I haven't had any luck doing this with ColdFusion, so I created a PHP file on my server that pulls my TypePad RSS feed. The PHP file works fine by itself, but I'm having a devil of a time getting it to be included in a CFM page.

When I use CFINCLUDE, it apparently just reads it as text and tries to process it as ColdFusion... which doesn't work. How do I get it to look at the PHP file, process it, and then bring the results into my CFM page?


Feb 6, 2008 at 12:12 PM

@Marnie,

ColdFusion will not inherently execute a PHP page. There are ways to run some PHP in ColdFusion, but I have never done that. You can either try to get the RSS reading to work in ColdFusion itself, or perhaps use the PHP file to write an XML file that ColdFusion then reads in and parses.

