Ask Ben: CFHttp For Web Mining And Image Hot Linking
Posted July 13, 2006 at 8:44 AM by Ben Nadel
I like pornography and I am trying to create a rather large library of it on my file server at home. I have built my own ColdFusion spider that crawls adult web sites and downloads free graphics and videos. I am having a problem with CFHttp where only some of the content is downloaded. Most of it shows up as some weird "not available" graphic; however, when I go to the URL in question, the graphics show up fine and I can right-click and save them. Am I not using CFHttp correctly?
Let me just start off by saying that you must be careful about copyright laws with this sort of thing. I don't know the laws, but just be careful about what you are "grabbing" from other companies' web sites. Remember that you are not only taking their content, you are also using their bandwidth, which counts against their file-transfer limits.
That being said, it sounds like you are using ColdFusion CFHttp correctly. The problem here lies on the target server that is serving up the requested graphics or videos. What you are doing is sometimes referred to as web mining or image hot linking.
Web mining is just a generic term for gathering information from the web in some systematic, usually automated, fashion. Image hot linking is when you display on your site an image that is located under another domain (and probably on another server). Unfortunately, server administrators don't want you "stealing" their images and bandwidth, and the "not available" image that shows up so often is their attempt to stop you from grabbing content that they paid for and serve from their own site.
So, how do you get around this issue? First, you have to understand how hot linking is being prevented (in most cases). Thomas Scott does a good job on A List Apart of explaining how to check the user's referrer URL to block hot linking. There are times, I find, when a server is doing something more complicated that I cannot crack, but those servers are few and far between and are generally large servers dedicated specifically to file serving.
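To make the referrer check concrete, here is a minimal, language-neutral sketch in Python of the decision a hot-link-protected server makes for each image request. It illustrates the general idea only, not Thomas Scott's actual rules; the host names, the placeholder path, and the choice to let blank referrers through are all assumptions.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical set of hosts whose pages are allowed to embed our images.
ALLOWED_HOSTS = frozenset({"example.com", "www.example.com"})

# Hypothetical path of the "not available" graphic served to hot linkers.
PLACEHOLDER = "/images/not-available.gif"


def is_hot_link(referer: Optional[str], allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Return True when the Referer points at a page on a foreign host.

    Assumption: a missing or empty Referer is let through, because some
    browsers, proxies, and privacy tools strip the header entirely.
    """
    if not referer:
        return False
    # urlparse(...).hostname is already lower-cased; None for non-URLs.
    host = urlparse(referer).hostname or ""
    return host not in allowed_hosts


def file_to_serve(requested_path: str, referer: Optional[str]) -> str:
    """Serve the real file to our own pages and the placeholder to everyone else."""
    return PLACEHOLDER if is_hot_link(referer) else requested_path
```

Because the whole decision rests on the Referer header, which the client itself sets, any client that sends a matching referrer passes the check. That is the weakness the CFHttp technique in this post takes advantage of.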
OK, so now to the nitty-gritty. To overcome this "issue," you have to extend your ColdFusion CFHttp call a bit so that it sends the browser values that the target server reads from its CGI object (the referrer and the user agent). Let's work with an example. Say you are trying to grab the image:
WARNING: Adult Image
... off of the page:
WARNING: Adult Site
... we want to perform a ColdFusion CFHttp request for the image, using the target PAGE as the referrer. This way, we can fool the server into thinking that we are a user on that page viewing their images (after all, your browser is just requesting images from the server, the same as we are). Additionally, you will want to change the user agent, as ColdFusion sends its own user agent by default in the CFHttp call:
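<cfhttp method="get" url="#strImageUrl#" result="objHttp" useragent="Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.2) Gecko/20060308 Firefox/1.5.0.2">
	<!--- Set referrer params. In this case, we want to override the referrer. (strImageUrl and strPageUrl are placeholder names for the image URL and the page URL.) --->
	<cfhttpparam type="header" name="referer" value="#strPageUrl#" />
</cfhttp>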
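The same two headers can be set with any HTTP client. As a rough Python equivalent of the CFHttp call (a sketch, not a drop-in replacement), the function below builds the request without sending it; the function name, the URLs, and the user agent value are illustrative assumptions.

```python
import urllib.request

# Example browser-style user agent; the exact value is only illustrative.
BROWSER_UA = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.2) "
    "Gecko/20060308 Firefox/1.5.0.2"
)


def build_image_request(image_url: str, page_url: str) -> urllib.request.Request:
    """Build (but do not send) a GET for image_url that names the page the
    image was embedded on as the Referer and identifies itself as a browser."""
    return urllib.request.Request(
        image_url,
        headers={"Referer": page_url, "User-Agent": BROWSER_UA},
        method="GET",
    )
```

To actually fetch the image, pass the request to `urllib.request.urlopen(req, timeout=30)`. Unlike CFHttp, `urlopen` raises `urllib.error.HTTPError` for error statuses such as 403 or 404 instead of returning them.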
As you can see, we set the user agent to some flavor of the Mozilla Firefox browser and set the referrer to the page on which the image was originally linked. Now, as I said before, this does NOT work all the time, but it does work a good amount of the time. What you do with the binary image data (stored in objHttp.FileContent in the CFHttp example above) is up to you. Be sure to check that the image is valid before you try to do anything with it:
<!--- Check to see if we found the image. --->
<cfif (
	FindNoCase( "200", objHttp.StatusCode ) AND
	StructKeyExists( objHttp.ResponseHeader, "Content-Type" ) AND
	FindNoCase( "image", objHttp.ResponseHeader[ "Content-Type" ] )
	)>
	<!--- We have an image. --->
<cfelse>
	<!--- Blast! The image didn't come through. --->
</cfif>
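For comparison, here is a minimal Python sketch of the same validity check, assuming the status code, Content-Type header, and response body have already been pulled out of the response. It is a bit stricter than the CFML: it requires the media type to start with "image/" and, as an extra assumption beyond the original check, it also looks at the first bytes of the body for a PNG, JPEG, or GIF signature.

```python
from typing import Optional

# File signatures ("magic numbers") for a few common image formats.
_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",       # JPEG
    b"GIF87a",             # GIF, 1987 version
    b"GIF89a",             # GIF, 1989 version
)


def looks_like_image(status: int, content_type: Optional[str], body: bytes) -> bool:
    """Return True for a 200 response with an image/* type and image bytes."""
    if status != 200 or not content_type:
        return False
    # Media types are case-insensitive; parameters like "; charset" are ignored.
    if not content_type.strip().lower().startswith("image/"):
        return False
    # bytes.startswith accepts a tuple and matches any of the prefixes.
    return body.startswith(_SIGNATURES)
```

With urllib this could be called as `looks_like_image(resp.status, resp.headers.get("Content-Type"), resp.read())`. Keep in mind that a server that answers hot links with a substitute "not available" graphic may still send a 200 and a genuine image, so these checks catch outright failures rather than every blocked request.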
I'm trying to pull my TypePad RSS feed into a CFM document. I haven't had any luck doing this with ColdFusion, so I created a PHP file on my server that pulls my TypePad RSS feed. The PHP file works fine by itself, but I'm having a devil of a time getting it included in a CFM page.
When I use CFINCLUDE, it apparently just reads it as text and tries to process it as ColdFusion... which doesn't work. How do I get it to look at the PHP file, process it, and then bring the results into my CFM page?
ColdFusion will not inherently execute a PHP page. There are ways to run some PHP in ColdFusion, but I have never done that. You can either try to get it working in ColdFusion (RSS reading), or maybe use the PHP file to write an XML file that ColdFusion then reads in and parses?