Ask Ben: CFHttp For Web Mining And Image Hot Linking

Posted July 13, 2006 at 8:44 AM by Ben Nadel

Tags: ColdFusion, Ask Ben

I like pornography and I am trying to create a rather large library of it on my file server at home. I have built my own ColdFusion spider which crawls over adult web sites and downloads free graphics and videos. I am having a problem with CFHttp where only some of the content is downloaded. Most of it shows up as some weird "not available" graphic; however, when I go to the URL in question, the graphics show up fine and I can right-click and save. Am I not using CFHttp correctly?

Let me just start off by saying that you must be careful about copyright laws with this sort of thing. I don't know the laws, but just be careful about what you are "grabbing" from other companies' web sites. Remember that you are not only taking their content, you are also using their bandwidth, which can count against their file-transfer limits.

That being said, it sounds like you are using ColdFusion CFHttp correctly. The problem here lies on the target server that is serving up the requested graphics or videos. What you are doing is sometimes referred to as web mining or image hot linking.

Web mining is just a generic term for gathering information off of the web in some sort of systematic, usually automated fashion. Image hot linking is when you display on your site an image that is located under another domain (and probably on another server). Unfortunately, the server administrators don't want you "stealing" their images and bandwidth, and that "not available" image that shows up a lot of the time is their attempt to stop you from grabbing content that they paid for and serve from their site.

So, how do you get around this issue? First, you have to understand how hot linking is being prevented (in most cases). Thomas Scott does a good job on A List Apart of explaining how to check the user's referrer url to block hot linking. There are times, I find, when a server is doing something more complicated that I cannot crack, but those servers are few and far between and are generally large servers dedicated specifically to file-serving.
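To make the referrer check described above concrete, here is a rough sketch of what such a check might look like if the target server were itself running ColdFusion. This is not the code from the A List Apart article; the host name, the image.cfm script, the URL.file parameter, and the file paths are all placeholders that I made up for illustration:

```cfml
<!---
    Hypothetical image-serving script on the TARGET server, called
    as image.cfm?file=photo.jpg. The host name, file names, and URL
    parameter are placeholders, not taken from any real site.
--->
<cfset allowedHost = "www.example.com" />
<cfparam name="URL.file" type="string" default="" />

<!---
    Only accept simple file names, and only when the referrer
    starts with one of our own page addresses.
--->
<cfif (
    REFind( "^[\w-]+\.jpg$", URL.file ) AND
    FindNoCase( "http://" & allowedHost & "/", CGI.http_referer ) EQ 1
    )>
    <!--- The request came from one of our own pages; serve the real image. --->
    <cfcontent
        type="image/jpeg"
        file="#ExpandPath( './images/' & URL.file )#"
        />
<cfelse>
    <!--- Missing or foreign referrer; serve the placeholder graphic instead. --->
    <cfcontent
        type="image/gif"
        file="#ExpandPath( './not-available.gif' )#"
        />
</cfif>
```

A plain CFHttp call sends no referrer at all, so a check like this would fall through to the placeholder branch, which matches the "not available" graphic described in the question.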

Ok, so now to the nitty gritty. To overcome this "issue," you have to extend your ColdFusion CFHttp method a bit to set browser variables sent in the CGI object. Let's work with an example. Say you are trying to grab the image:

http://www.donovanphillips.com/galleries/ncg/fawn01/FawnNCG1-0033.JPG
WARNING: Adult Image

... off of the page:

http://www.donovanphillips.com/galleries/ncg/fawn01/index.php
WARNING: Adult Site

... we want to perform a ColdFusion CFHttp grab using the target PAGE as the referrer for the image grab. This way, we can fool the server into thinking that we are a user on the page viewing their images (after all, your browser is just making requests to the server for images like we are). Additionally, you will want to change the user agent as ColdFusion sends its own user agent by default in the CFHttp call:

    <cfhttp
        url="http://www.donovanphillips.com/galleries/ncg/fawn01/FawnNCG1-0033.JPG"
        method="GET"
        useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.2) Gecko/20060308 Firefox/1.5.0.2"
        getasbinary="yes"
        result="objHttp">

        <!--- Set referrer params. In this case, we want to override the referrer. --->
        <cfhttpparam
            type="CGI"
            name="http_referer"
            value="http://www.donovanphillips.com/galleries/ncg/fawn01/index.php"
            encoded="false"
            />
    </cfhttp>

As you can see, we set the user agent to be some flavor of the Mozilla Firefox browser and the referrer is the page upon which the image was originally linked. Now, as I said before, this does NOT work all the time, but it does work a good amount of the time. What you do with the binary image data (stored in objHttp.FileContent in the above example) is up to you. Be sure to check that the image is valid before you try to do anything with it:

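    <!---
        Check to see if we found the image. Make sure the Content-Type
        header exists before reading it so that a response without one
        does not throw an error.
    --->
    <cfif (
        FindNoCase( "200", objHttp.StatusCode ) AND
        StructKeyExists( objHttp.ResponseHeader, "Content-Type" ) AND
        FindNoCase( "image", objHttp.ResponseHeader[ "Content-Type" ] )
        )>
        <!--- We have an image. --->
    <cfelse>
        <!--- Blast! The image didn't come through. --->
    </cfif>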
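As one example of what you might do with the binary data once it passes that check, here is a minimal sketch that writes it to disk with CFFile. It assumes objHttp came from a CFHttp call that used getasbinary="yes". The "library" folder and the file name are placeholders of my own, and the folder is assumed to already exist:

```cfml
<!--- Assumes objHttp was populated by a CFHttp call with getasbinary="yes". --->
<cfif (
    FindNoCase( "200", objHttp.StatusCode ) AND
    StructKeyExists( objHttp.ResponseHeader, "Content-Type" ) AND
    FindNoCase( "image", objHttp.ResponseHeader[ "Content-Type" ] )
    )>

    <!--- Hypothetical target folder and file name; change these to suit your setup. --->
    <cfset strTargetFile = ExpandPath( "./library/image-0001.jpg" ) />

    <!--- Write the binary response body to disk. --->
    <cffile
        action="write"
        file="#strTargetFile#"
        output="#objHttp.FileContent#"
        />

</cfif>
```

Writing the file only inside the check keeps the "not available" placeholder graphics and error pages out of your library, at least in the cases where the server returns them with a non-200 status or a non-image content type.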

If you want to see this in action, please check out my ColdFusion CFHttp example in my ColdFusion Snippets section.




Reader Comments

Feb 6, 2008 at 11:53 AM

I'm trying to pull my TypePad RSS feed into a CFM document. I haven't had any luck doing this with ColdFusion. So I created a PHP file on my server that pulls my TypePad RSS feed. The PHP file works fine by itself, but I'm having a devil of a time getting it to be included in a CFM page.

When I use CFINCLUDE, it apparently just reads it as text and tries to process it as ColdFusion... which doesn't work. How do I get it to look at the PHP file, process it, and then bring the results into my CFM page?


Feb 6, 2008 at 12:12 PM

@Marnie,

ColdFusion will not inherently execute a PHP page. There are ways to run some PHP in ColdFusion, but I have never done that. You can either try to get the RSS reading to work in ColdFusion, or maybe use the PHP file to write an XML file that ColdFusion then reads in and parses.

