The other day, someone posted a SPAM comment on my site that directed users to a site serving up free Playboy magazine centerfold downloads. I am so tired of people posting spam to my site that I figured I would actually try to turn the tables on this particular spammer. Sites like this depend on the fact that you have to actually view their pages and, therefore, view their ads (which is how, I assume, they make their money). Grabbing the merchandise without loading any ads, however, defeats their entire reason for spamming in the first place.
And so it is that I set out to script the download of every Playboy magazine centerfold from 1954 to 2007 without ever loading a single page on the spammer's site (naturally, I had to load a few pages to see how the site worked). Here's what I came up with:
    <!---
        Photo pages in form of:
        http://www.oxpe.net/playboy/playboy195401.html

        Photos in form of:
        http://www.oxpe.net/playboy/photos/195401.jpg

        Available Years: 1954 - 2007
    --->

    <!---
        Set a high request timeout - there are a LOT of images
        to be downloading here.
    --->
    <cfsetting requesttimeout="500" />

    <!--- Loop over available years. --->
    <cfloop index="intYear" from="1954" to="2007" step="1">

        <!--- Loop over available months. --->
        <cfloop index="intMonth" from="1" to="12" step="1">

            <!---
                Get the short hand for the file name. All the
                months have to be double-digits.
            --->
            <cfset strName = ( intYear & NumberFormat( intMonth, "00" ) ) />

            <!---
                Set up base URL that will be used by both the
                CFHttp target as well as the referer.
            --->
            <cfset strBaseURL = "http://www.oxpe.net/playboy/" />

            <!--- Echo back the photo we are trying to get. --->
            <p>
                <cfoutput>
                    #strName#.jpg
                </cfoutput>
            </p>

            <!---
                Perform an HTTP GET to grab the target image as binary.
                CAUTION: Once we go beyond the available year/months
                (ex. 2007/12), this will come back with 200 status,
                but NOT be a valid image binary.
            --->
            <cfhttp
                method="get"
                url="#strBaseURL#photos/#strName#.jpg"
                useragent="#CGI.http_user_agent#"
                getasbinary="yes"
                result="objGET">

                <!---
                    Set CGI referrer to be the page that it was called
                    from. We want to fake the target server into thinking
                    we just came from an internally hosted page.
                --->
                <cfhttpparam
                    type="CGI"
                    name="referer"
                    value="#strBaseURL#playboy#strName#.html"
                    />

            </cfhttp>

            <!--- Check status. --->
            <cfif FindNoCase( "200", objGET.StatusCode )>

                <!--- Save file. --->
                <cffile
                    action="write"
                    file="#ExpandPath( './#strName#.jpg' )#"
                    output="#objGET.FileContent#"
                    />

            </cfif>

            <p>
                <cfoutput>
                    » <em>#objGET.StatusCode#</em>
                </cfoutput>
            </p>

            <cfflush />

        </cfloop>

    </cfloop>
Unfortunately, this did not work at all. Running the code above, the server kept returning 403 Forbidden Access errors:
» 403 Forbidden
» 403 Forbidden
» 403 Forbidden
Clearly, the server had something in place to prevent hotlinking. But, I was sending the Referer, which should have taken care of this.
After a good deal of time trying to tweak the values, I finally turned to one of the most badass tools out there - Firebug. I actually went to the target page and viewed the HTTP request headers that were being sent across for the image request.
Nothing was popping out at me. But then, I realized I was looking at the HEADER values. Obviously. But, wasn't I sending the Referer value as a CGI value? I know that ColdFusion's CFHttpParam tag has a HEADER type, so I tried changing the type from CGI to HEADER:
    <!---
        Perform an HTTP GET to grab the target image as binary.
        CAUTION: Once we go beyond the available year/months
        (ex. 2007/12), this will come back with 200 status,
        but NOT be a valid image binary.
    --->
    <cfhttp
        method="get"
        url="#strBaseURL#photos/#strName#.jpg"
        useragent="#CGI.http_user_agent#"
        getasbinary="yes"
        result="objGET">

        <!---
            Set referrer to be the page that it was called from.
            We want to fake the target server into thinking we just
            came from an internally hosted page. Use the HEADER
            value rather than the CGI value.
        --->
        <cfhttpparam
            type="HEADER"
            name="referer"
            value="#strBaseURL#playboy#strName#.html"
            />

    </cfhttp>
This time, things went off without a hitch:
» 200 OK
» 200 OK
» 200 OK
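The CAUTION comment in the code notes that a request past the available year/months can come back with a 200 status without being a valid image. One possible guard, as a minimal sketch: check the MimeType key of the CFHttp result in addition to the status code. This assumes the target server sends an accurate Content-Type for its images, which has not been verified here; the sample strName value is taken from the URL pattern in the comments above.

```cfml
<!--- Same variables as in the loop above; 195401 is just a sample. --->
<cfset strBaseURL = "http://www.oxpe.net/playboy/" />
<cfset strName = "195401" />

<cfhttp
    method="get"
    url="#strBaseURL#photos/#strName#.jpg"
    useragent="#CGI.http_user_agent#"
    getasbinary="yes"
    result="objGET">

    <cfhttpparam
        type="HEADER"
        name="referer"
        value="#strBaseURL#playboy#strName#.html"
        />

</cfhttp>

<!---
    Only save the file when the status is 200 AND the response
    claims to be an image. ASSUMPTION: the server reports a
    Content-Type that starts with "image/" for real photos.
--->
<cfif (
    FindNoCase( "200", objGET.StatusCode ) AND
    ( Left( objGET.MimeType, 6 ) EQ "image/" )
    )>

    <cffile
        action="write"
        file="#ExpandPath( './#strName#.jpg' )#"
        output="#objGET.FileContent#"
        />

</cfif>
```

If the server mislabels its error pages as images, this check will not catch them, so it is a first filter rather than real image validation.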
Works great - downloads all the Playboy centerfolds since the beginning of time. But this got me thinking: if both the CGI:Referer and the HEADER:Referer end up in the CGI scope (at least in ColdFusion), what's the difference between sending these two values? Why did one work and the other not?
The answer turns out to be encoding. By default, the ColdFusion CFHttpParam tag encodes all FormField and CGI value types using URL-encoding. HEADER values, on the other hand, are not encoded in any automatic way. Therefore, if you send a URL as a CFHttpParam CGI value, it shows up in the CGI object encoded for URL usage, with characters such as ":" and "/" replaced by escape sequences like %3A and %2F. If, however, you turn off encoding (Encoded = "false"), or send it as a CFHttpParam HEADER value, it shows up in the CGI object unchanged.
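As a rough sketch of the difference, the snippet below prints the Referer value from this post in raw form and in URL-encoded form, then sends it as a CGI value with encoding turned off. URLEncodedFormat() is used here only to approximate what CFHttpParam does by default; whether the two produce byte-for-byte identical output is an assumption. The variable names strReferer and objGET2 are made up for this illustration.

```cfml
<!--- The Referer value used in the download script (sample month). --->
<cfset strReferer = "http://www.oxpe.net/playboy/playboy195401.html" />

<cfoutput>
    <!--- The value as a HEADER param would send it (untouched). --->
    <p>Raw: #HtmlEditFormat( strReferer )#</p>

    <!---
        Approximately what a default CGI or FormField param sends.
        ASSUMPTION: CFHttpParam's automatic encoding behaves like
        URLEncodedFormat().
    --->
    <p>URL-encoded: #URLEncodedFormat( strReferer )#</p>
</cfoutput>

<!---
    Alternative to switching the type to HEADER: keep the CGI
    type but turn off the automatic URL-encoding.
--->
<cfhttp
    method="get"
    url="http://www.oxpe.net/playboy/photos/195401.jpg"
    useragent="#CGI.http_user_agent#"
    getasbinary="yes"
    result="objGET2">

    <cfhttpparam
        type="CGI"
        encoded="false"
        name="referer"
        value="#strReferer#"
        />

</cfhttp>

<cfoutput>
    <p>» <em>#objGET2.StatusCode#</em></p>
</cfoutput>
```

Going by the explanation above, the unencoded CGI value should be treated the same way as the HEADER value by the target server, but only the HEADER version was actually run in this post.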
This is good stuff to know. I should really do a much more in-depth exploration of all the different CFHttpParam types to see how they can really be leveraged properly. I am still not 100% clear on the difference between all the HEADER and CGI values; it looks like HEADER values might be a more natural way to mimic this sort of client-server interaction. I will do some further testing.
On a related note, it's really funny to see the contrast between what Playboy presented in 1954 and what they put in the magazine today. 1954 was very safe. Pubic hair didn't really make much of a showing until the early 1970s... and now, in the 2000s, pubic hair is gone again (but this time, of course, for very different reasons).