You've been doing a lot with XML lately, so I thought maybe you could solve a problem I'm having. I'm a World of Warcraft junky and want to pull data from Blizzard's "armory" web site. It's all XML, so should be easy, but when I do
<cfhttp url="http://armory.worldofwarcraft.com/....." method="get" result="armoryXML">
It returns the full HTML, post style sheet transformation. Yet go to that page and view source and you see the lovely XML. I've seen PHP code that hits the data on this site by doing the PHP equivalent of a cfhttp.
I tried your ColdFusion CFHttp method and did, indeed, get the transformed page content. Then, I went directly to the page in my browser and viewed the source. The source, just as you said it would be, was an XML document with an attached XSLT processing instruction. This was confusing since ColdFusion's CFHttp tag works just like any browser request. So, what's different between the CFHttp request and the browser's request that would render different content?
The cause is not obvious at all, but if you have worked with CFHttp for a long time, you might know that it can cause problems because of the user agent it sends. When you go to a site with your FireFox or Internet Explorer browser, the browser generally announces itself with a Mozilla or MSIE compatible user agent string. ColdFusion, on the other hand, announces itself as the "ColdFusion" user agent unless you tell it otherwise.
Knowing this, I did a little experiment, making your ColdFusion CFHttp call in two ways: one with no explicit user agent and one with the FireFox user agent:
<!---
	Store the user agent that I am using with my browser (you're
	damn right I use FireFox!). I am breaking this data into two
	lines for display purposes ONLY.
--->
<cfset strUserAgent = (
	"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; " &
	"rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3"
	) />

<!---
	Get the target URL that we are grabbing. This page provides
	an XML document with an XSL transformation processing
	instruction. We want to grab the XML without the data being
	transformed into HTML. I am breaking this data into two lines
	for display purposes ONLY.
--->
<cfset strURL = (
	"http://armory.worldofwarcraft.com/character-sheet.xml" &
	"?r=Bloodhoof&n=Castlereagh"
	) />

<!---
	Grab the target page without sending across any user agent
	information. When we do this, ColdFusion automatically puts
	in "ColdFusion" as the browser's user agent value.
--->
<cfhttp
	url="#strURL#"
	method="GET"
	result="objHTTP"
	/>

<!---
	Now, let's grab the same page, but this time, instead of
	letting ColdFusion send over a default user agent, we are
	going to explicitly define what user agent the HTTP request
	should announce.
--->
<cfhttp
	url="#strURL#"
	method="GET"
	result="objHTTPWithUA"
	useragent="#strUserAgent#"
	/>

<cfoutput>

	<!---
		Let's output the leading characters of the request
		without the user agent value.
	--->
	<p>
		<strong>Request With ColdFusion User Agent</strong>:
	</p>

	<p>
		#HtmlEditFormat( Left( objHTTP.FileContent, 500 ) )#
	</p>

	<!---
		Let's output the leading characters of the request in
		which we explicitly defined the FireFox user agent.
	--->
	<p>
		<strong>Request With FireFox User Agent</strong>:
	</p>

	<p>
		#HtmlEditFormat( Left( objHTTPWithUA.FileContent, 500 ) )#
	</p>

</cfoutput>
Running the above code, we get the following screen output:
Request With ColdFusion User Agent:
Request With FireFox User Agent:
<?xml version="1.0" encoding="UTF-8"?> <?xml-stylesheet type="text/xsl" href="/layout/character-sheet.xsl"?> <page globalSearch="1" lang="en_us" requestUrl="/character-sheet.xml"> <characterInfo> <character battleGroup="Ruin" charUrl="r=Bloodhoof&n=Castlereagh" class="Hunter" classId="3" faction="Alliance" factionId="0" gender="Male" genderId="0" guildName="Veni Vidi Vici" guildUrl="r=Bloodhoof&n=Veni+Vidi+Vici&p=1" lastModified="May 18, 2007" level="70" name="Castlereagh" rac
As you can see, the first request, in which we do not define a user agent (therefore letting "ColdFusion" be announced), sends us back the fully transformed HTML page complete with HTML, HEAD, and TITLE tags. But, in the second ColdFusion CFHttp request, where we announce the request as coming from a FireFox compatible browser, the true XML document is returned, complete with the XML declaration, the xml-stylesheet (XSLT) processing instruction, and the PAGE and CHARACTERINFO nodes.
Now, just because I figured out what was going wrong, that does not mean that I can offer any insight into why this is happening. I assume this is just some functionality somewhere that is trying to be "clever" by serving up the content that it thinks the requester wants to see. And, for some reason, it assumes that FireFox wants the XML, but that the ColdFusion request wants the transformed HTML page. Who knows... maybe someone reading this can offer more insight in that respect.
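Once the request comes back as raw XML, pulling data out of it should be fairly direct with XmlParse() and XmlSearch(). What follows is only a minimal sketch, not tested against the live armory site. The XML string is a trimmed, hand-built stand-in that uses a few of the attributes visible in the output above (the real CHARACTER node carries more attributes and child nodes), and the variable names strXml, xmlDoc, arrCharacters, and objAttributes are my own for illustration:

```cfml
<!---
	ASSUMPTION: This is a trimmed stand-in for the XML response,
	built from attributes shown in the output above. In practice,
	you would use objHTTPWithUA.FileContent instead.
--->
<cfsavecontent variable="strXml"><?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/layout/character-sheet.xsl"?>
<page globalSearch="1" lang="en_us" requestUrl="/character-sheet.xml">
	<characterInfo>
		<character
			name="Castlereagh"
			class="Hunter"
			level="70"
			faction="Alliance"
			/>
	</characterInfo>
</page></cfsavecontent>

<!---
	Parse the string into a ColdFusion XML document. We trim it
	first so the XML declaration is the very first thing in the
	string. The XSL processing instruction is not applied here.
--->
<cfset xmlDoc = XmlParse( Trim( strXml ) ) />

<!--- Find the character node(s) using XPath. --->
<cfset arrCharacters = XmlSearch(
	xmlDoc,
	"/page/characterInfo/character"
	) />

<cfoutput>

	<!--- Output a few attributes from each matching node. --->
	<cfloop
		index="intI"
		from="1"
		to="#ArrayLen( arrCharacters )#">

		<cfset objAttributes = arrCharacters[ intI ].XmlAttributes />

		<p>
			#HtmlEditFormat( objAttributes[ "name" ] )#:
			level #HtmlEditFormat( objAttributes[ "level" ] )#
			#HtmlEditFormat( objAttributes[ "class" ] )#
			(#HtmlEditFormat( objAttributes[ "faction" ] )#)
		</p>

	</cfloop>

</cfoutput>
```

I used bracket notation for the attributes so that names like "class" are treated as plain struct keys. If the XPath matches nothing, the loop simply does not run, which is a reasonable sign that the transformed HTML came back instead of the XML.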