Ask Ben: CFHttp For Web Mining And Image Hot Linking

Posted July 13, 2006 at 8:44 AM

Tags: ColdFusion, Ask Ben

I like pornography and I am trying to create a rather large library of it on my file server at home. I have built my own ColdFusion spider, which crawls over adult web sites and downloads free graphics and videos. I am having a problem with CFHttp where only some of the content is downloaded. Most of it shows up as some weird "not available" graphic; however, when I go to the URL in question, the graphics show up fine and I can right-click and save. Am I not using CFHttp correctly?

Let me just start off by saying that you must be careful about copyright laws with this sort of thing. I don't know the laws, but just be careful about what you are "grabbing" from other companies' web sites. Remember that you are not only taking their content, you are also using their bandwidth, which can impact their file-transfer limits.

That being said, it sounds like you are using ColdFusion CFHttp correctly. The problem here lies on the target server that is serving up the requested graphics or videos. What you are doing is sometimes referred to as web mining or image hot linking.

Web mining is just a generic term for gathering information off of the web in some sort of systematic, usually automated, fashion. Image hot linking is when you display on your site an image that is located under another domain (and probably on another server). Unfortunately, the server administrators don't want you "stealing" their images and bandwidth, and that "not available" image that is showing up a lot of the time is their attempt to stop you from grabbing content that they paid for and serve from their site.

So, how do you get around this issue? First, you have to understand how hot linking is being prevented (in most cases). Thomas Scott does a good job on A List Apart of explaining how to check the user's referrer URL to block hot linking. There are times, I find, when a server is doing something more complicated that I cannot crack, but those servers are few and far between and are generally large servers dedicated specifically to file-serving.
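
To make the referrer check concrete, here is a rough sketch of what it might look like if the target site served its images through a ColdFusion page. This is not the A List Apart code, and I have no idea what any particular site actually runs; the host name and both file paths are made up for illustration:

```cfm
<!--- Hypothetical host name of the site that owns the images. --->
<cfset strAllowedHost = "www.example.com" />

<!--- The browser reports the linking page in the referrer. --->
<cfif (FindNoCase( "http://" & strAllowedHost & "/", CGI.http_referer ) EQ 1)>
    <!--- The request came from one of our own pages. Serve the real image. --->
    <cfcontent type="image/jpeg" file="#ExpandPath( './images/photo.jpg' )#" />
<cfelse>
    <!--- The request came from somewhere else. Serve the placeholder. --->
    <cfcontent type="image/gif" file="#ExpandPath( './images/not-available.gif' )#" />
</cfif>
```

The thing to notice is that the check relies entirely on a value that the client sends along with the request.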

Ok, so now to the nitty gritty. To overcome this "issue," you have to extend your ColdFusion CFHttp call a bit to set the browser values (the user agent and the referrer) that the target server reads from its CGI variables. Let's work with an example. Say you are trying to grab the image:

http://www.donovanphillips.com/galleries/ncg/fawn01/FawnNCG1-0033.JPG
WARNING: Adult Image

... off of the page:

http://www.donovanphillips.com/galleries/ncg/fawn01/index.php
WARNING: Adult Site

... we want to perform a ColdFusion CFHttp grab using the target PAGE as the referrer for the image grab. This way, we can fool the server into thinking that we are a user on the page viewing their images (after all, your browser is just making requests to the server for images like we are). Additionally, you will want to change the user agent as ColdFusion sends its own user agent by default in the CFHttp call:


    <cfhttp
        url="http://www.donovanphillips.com/galleries/ncg/fawn01/FawnNCG1-0033.JPG"
        method="GET"
        useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.2) Gecko/20060308 Firefox/1.5.0.2"
        getasbinary="yes"
        result="objHttp">

        <!--- Set referrer params. In this case, we want to override the referrer. --->
        <cfhttpparam
            type="CGI"
            name="http_referer"
            value="http://www.donovanphillips.com/galleries/ncg/fawn01/index.php"
            encoded="false"
            />
    </cfhttp>

As you can see, we set the user agent to be some flavor of the Mozilla Firefox browser and the referrer is the page upon which the image was originally linked. Now, as I said before, this does NOT work all the time, but it does work a good amount of the time. What you do with the binary image data (stored in objHttp.FileContent in the above example) is up to you. Be sure to check that the image is valid before you try to do anything with it:


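    <!--- Check to see if we found the image. --->
    <!--- The StructKeyExists() guard avoids an error when no Content-Type header came back. --->
    <cfif (
        FindNoCase( "200", objHttp.Statuscode ) AND
        StructKeyExists( objHttp.Responseheader, "Content-Type" ) AND
        FindNoCase( "image", objHttp.Responseheader[ "Content-Type" ] )
        )>
        <!--- We have an image. --->
    <cfelse>
        <!--- Blast! The image didn't come through. --->
    </cfif>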
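
If the check passes, one thing you might do with the binary data is write it to disk. Here is a minimal sketch, assuming objHttp is the result of the CFHttp call shown earlier; the "downloads" folder is a made-up location that would need to exist and be writable:

```cfm
<!--- Use the last segment of the image URL as the file name. --->
<cfset strFileName = ListLast(
    "http://www.donovanphillips.com/galleries/ncg/fawn01/FawnNCG1-0033.JPG",
    "/"
    ) />

<!--- Hypothetical target location (assumption: the folder already exists). --->
<cfset strFilePath = ExpandPath( "./downloads/" & strFileName ) />

<!--- Only write the file if the request succeeded and we got binary data back. --->
<cfif (
    FindNoCase( "200", objHttp.Statuscode ) AND
    IsBinary( objHttp.FileContent )
    )>

    <!--- Write the raw bytes to disk. --->
    <cffile
        action="write"
        file="#strFilePath#"
        output="#objHttp.FileContent#"
        />

</cfif>
```

In a spider, you would probably derive the file name from each URL as you crawl rather than hard-coding it like this.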

If you want to see this in action, please check out my ColdFusion CFHttp example in my ColdFusion Snippets section.


Reader Comments

Feb 6, 2008 at 11:53 AM

I'm trying to pull my TypePad RSS feed into a CFM document. I haven't had any luck doing this with ColdFusion. So I created a PHP file on my server that pulls my TypePad RSS feed. The PHP file works fine by itself, but I'm having a devil of a time getting it to be included in a CFM page.

When I use CFINCLUDE, it apparently just reads it as text and tries to process it as ColdFusion... which doesn't work. How do I get it to look at the PHP file, process it, and then bring the results into my CFM page?


Feb 6, 2008 at 12:12 PM

@Marnie,

ColdFusion will not inherently execute a PHP page. There are ways to run some PHP in ColdFusion, but I have never done that. You can either try to get the RSS reading to work in ColdFusion, or maybe use the PHP file to write an XML file that ColdFusion then reads in and parses?

