I was testing out my new functions JREGetNoCase() and JREGet() (uses Java regular expressions to return all matching substrings of a given string) by attempting to grab IMG tags off of random web sites:
<!--- Get the images for this page. --->
<cfset arrImages = JREGetNoCase(
This gets all A tags that have an IMG as the only child element. The functions work perfectly. I am actually totally excited about them. But, as I was dumping out the data, I realized that only some of the images worked on my page; however, when I pasted the captured IMG source into another window, the image loaded just fine.
Very curious. After some research, I see that Apache can block access (and maybe IIS can too) to files based on header data (among other criteria). It seems that some sites were blocking my file "grabs" since they were coming from my site.
To get around this, I had to create a sepparate page that would grab the img binary using a falsified CGI value (http referer) and stream that binary data to the browser:
<!--- Kill extra output. --->
<!--- Set page settings. --->
<cfsetting showdebugoutput="false" />
<!--- Param url variables. --->
<cfparam name="URL.src" type="string" default="" />
<!--- Get the domain of the image. --->
<cfset strDomain = REReplace( URL.src, "(\.(com|net)).+", "\1", "ONE" ) />
<!--- Grab the source image. --->
<!--- Set referrer params. --->
<cfhttpparam type="CGI" name="http_referer" value="#strDomain#" encoded="false" />
As you can see above, I grab the site Domain information from the actual SRC value, then I set that domain information as the CGI.http_referer for the CFHttp Get. This works like a charm (95% of the time). It doesn't have any error checking, but that could easily be worked in via the Status of the CFHttp return data.
Hmm, that's somewhat unethical, since people disable hotlinking for a reason (bandwidth costs, etc.).
Also, the regexes could be improved. ;-)
Unethical and down right impractical. If you were to hotlink images, it means that you have to put processing and the data transfer time into every single image that you display.
I don't ever see this type of thing being used to "Steal" content but rather to download content such as by an Offline-Explorer / archiving type of application.
thanks for the code.