A little while back I posted about how to create simple Microsoft Excel documents using HTML. That's good and all and provides some basic formatting, but it is a definite hack. Someone suggested to me that I look into using XML to create Excel documents. As it turns out, using XML to create an Excel file is not only easy, it provides much more control over formatting, not to mention that I can rock cell formulas and a bunch of other things that make Excel great. But, perhaps the coolest thing is that, unlike the HTML Table method, the XML document opens up in Microsoft Excel document as if it were a true Excel document; the entire grid is laid out nicely, tabs are there, it's beautiful.
To start off with, I am going to show you how to create a simple data grid based on a query. As always, let's build a query to test with. For this one, it will be a query of movies, their name, rating, and some additional information:
Launch code in new window » Download code as text file »
Now that we have a our query, let's create a Microsoft XML Excel document with a header row followed by data rows:
Launch code in new window » Download code as text file »
As you can see, the Excel data is defined in a TABLE XML node. The nodes within the Table element are fairly self-explanatory. The Data element defines what data type Excel will use to interpret the data. Each cell can take a formatting option (a CSS Style defined in the Styles nodes) to help format the data. Notice that we are taking our Rating cell to a single decimal place and that our boolean columns are being output in Yes / No format. These formatting options are all defined by the Styles.
Then, once I have the data stored up, I trim it and remove the white space between tags. Then, I convert it to binary and stream via the CFContent's Variable attribute (the variable attribute requires binary, hence the conversion). The use of the variable attribute automatically resets the content buffer so don't worry about any previous white space that was created. Notice also that I am telling the browser to use the downloaded document as if it were a Microsoft Excel document by setting the CFContent type to "application/msexcel".
Now, granted, this is a bit more involved than the HTML version, but as you can see, once you understand the layout, it's not very complicated. What IS complicated is debugging. If you do get something wrong, the document just doesn't render. It says that it dumps information into a Log, but I cannot find the Log at the location which it reports to have used. Oh well.
In the next demo, I will try to add some more cell formatting and make use of some cell formulas.
Download Code Snippet ZIP File
Comments (21) | Post Comment | Ask Ben | Permalink | Other Searches | Print Page
Ask Ben: Creating An Archive Folder / File List
Flex Renamer - Most Awesome Bulk File / Folder Regular Expression Renamer Ever
Ben,
that's nice. Thanks! I've done a similar thing with C#. The one thing to note when using XML to generate Excel sheets is that the user has to have at least Office/Excel 2003 to be able to read the Excel XML files. Apparently previous versions of office did not support XML Excel sheets.
Posted by Boyan on Jan 8, 2007 at 8:48 AM
Boyan,
That is true, although I want to say there is a version of Excel from 2002 that will support this (but I might just be making that up). I figure though, that it has been 4 years... time to upgrade if you haven't already. And honestly, if you are doing this for a client, most clients are either willing to upgrade to a new Office (if that is required for the project) or already have it due to the corporate environment. But then again, I have pretty flexible clients most of the time, so that might not be a good sample of the population.
Also, I am not sure how this works in Open Office.
What I do know is that ColdFusion can read in this Microsoft Excel data using XmlParse() and convert it to an internal XML document object. That is way cool and is perhaps a huge selling point over the HTML table version.
Posted by Ben Nadel on Jan 8, 2007 at 9:06 AM
OpenOffice uses the Open Document Format as it's native document format. OASIS has recently adopted it as an XML standard for representing office-style documents, and rumor has it that the next version of Microsoft Office will include support for Open Document, there is already a Microsoft-sponsored effort to build a ODF->OfficeXML translator.
I haven't tried it yet, but there's a bunch of XSL files at their sourceforge site that might be possible to use with ColdFusion to translate ODF -> OfficeXML and back again.
Open Doc Format: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office#technical
ODF Translator: http://sourceforge.net/projects/odf-converter/
Posted by Adam Ness on Jan 8, 2007 at 9:49 AM
In a previous gig, we had very picky requirements for some financial reports for a government agency in the UK (alas, I was never sent over there like so many of my coworkers). The formatting, tabs, and formulas were a pain in the butt to create dynamically, so I created the files in Excel, saved them as XML files, and used a ColdFusion script to parse and replaced placeholders in the files with transaction data from the database. The results were saved in another directory, and voila! A perfectly formatted report with all worksheets, formatting, and formulas intact.
While minor tweaks to the excel template could be made without getting IT involved, the big drawback was scalability. As the reports became larger and larger (certain sections repeated and/or grouped, depending on how many financial transactions were involved), it became a very unwieldy process. In hindsight, it was not a good solution to that particular problem, but it might be useful elsewhere. Live and learn.
Posted by Jeremy on Jan 8, 2007 at 10:28 AM
I am hoping you will help me out. I am trying to get this to work in 6.1 and I can not seem to figure it out. since I don't have the variable attribute of the cfcontent tag; I am trying to find a work around. I have been playing with variations of something like this:
<cfset the_file = ToBinary(ToBase64(strXmlData.Trim().ReplaceAll( '>\s+', '>' ).ReplaceAll( '\s+<', '<' ) ) ) />
<cffile action="write" file="#expandPath('.')#/basic.xml" output="#the_file#" />
<cfheader name="content-disposition" value="attachment; filename=basic.xml" />
<cfcontent type="application/vnd.ms-excel" file="#expandPath('basic.xml')#" />
for the output, but I am only getting text in IE and and an Excel error in FF on 'Table"
Could you steer me in the right direction?
Thank you
Posted by Ron Gowen on Jan 8, 2007 at 12:06 PM
Ron,
You are very close. Since you are writing to a text file (XML file) you don't need to convert to binary. I only had to do that because the Variable attribute only accepts binary. For your thing, you can put the line:
strXmlData.Trim().ReplaceAll( '>\s+', '>' ).ReplaceAll( '\s+<', '<' )
directly into the output of the CFFile tag:
<cffile action="WRITE"
output="#strXmlData.Trim().ReplaceAll( '>\s+', '>' ).ReplaceAll( '\s+<', '<' )#" ...... />
Give that a go and it should work fine. Let me know if you have any other problems.
Posted by Ben Nadel on Jan 8, 2007 at 1:01 PM
Jeremy,
It sounds like you were creating very complicated documents. I am not sure, though, how things would not scale well. Granted I just found out about this XML stuff, but it seems that as long as you have set formats for the documents, scaling (ie. number of records) wouldn't be an issue?
Posted by Ben Nadel on Jan 8, 2007 at 1:02 PM
Yes, they were fairly complicated documents. The scaling issues came into play with the repeatable sections of the spreadsheets. There were placeholder markers for the beginning and end of the repeatable sections. Within these sections were placeholders for the data from the database. How many times these sections were repeated depended on the number and types of transactions. A few thousand transactions a day was fine, but once we got into a few hundred thousand per day, things got a little slow. I later went back and optimized the code significantly, which helped a bit. Granted, with the sheer volume of data being processed daily, I have a feeling that generating those spreadsheets would have been a bear any way you cut it.
Posted by Jeremy on Jan 8, 2007 at 1:26 PM
Most definitely a bear... and when the Row Count and the Column Count have to match up exactly, I am sure it was a nightmare dealing with dynamic / repeatable content areas. Sounds horrible to me :)
For simple documents, though, I am gonna give it a shot and see where it leads me.
Thanks for all of your feedback!
Posted by Ben Nadel on Jan 8, 2007 at 1:28 PM
Ben-
My coworker Matt has been using this XML method to create some pretty sick looking Excel workbooks. I mentioned that you couldn't find the error log file and he said "oh, it's here:"
C:\Documents and Settings\{username}\Local Settings\Temporary Internet Files\Content.MSO
Just sub in your username and that'll do
Posted by Steve Rittler on Jan 8, 2007 at 4:07 PM
Steve,
I appreciate that, but that's the same thing my Excel tells me ... and I still can't locate it :(
OH CRAP! It's a hidden folder. I just found it. Thanks! Stupid windows trying to protect me from myself :) Thanks!
Hey, the debugging information in this ain't half bad. Sweet.
Posted by Ben Nadel on Jan 8, 2007 at 4:12 PM
Any idea on how to add an image to a cell w/ schemaML? Attempted to add an image using the following html namespace: xmlns:html="http://www.w3.org/TR/REC-html40
(image source pointed to an absolute path).
The xml generated by my program looks something like this:
<Cell>
<Data ss:Type="String">
this is a test
<html:Img Src="https://dev-server/test/icon.gif"/>
</Data>
</Cell>
The excel file doesn't produce any errors and opens up successfully. However that line referring to the image is completely ignored by the excel file (renders a blank cell or in this case only the test string). Just to test other html tags in html namespace, applied the html font tag to change the font style in a cell and it rendered that properly to the excel file.
(Also tried converting the image to a binary object but not sure how to write it so it would be recognized by the excel file ...)
Any ideas?
Posted by mo on Feb 15, 2007 at 2:44 PM
Mo,
I have not tried this myself. You should create an XLS and put in am image. Then save that file (via Excel) as an XML document and open it up. My guess is that the graphic is stored as some sort of Base64 encoding (looks like about a 100 lines of random letters).
Posted by Ben Nadel on Feb 15, 2007 at 4:54 PM
It should be noted that this will not work unless you are using Excel 2002 or later. That is when Microsoft added XML support to Excel. While on the face it sounds irrelevant, consider that most Federal Government agencies are still using Windows 2000 and Office 2000 on a wide-scale basis. That's the struggle I had to overcome after being initially thrilled about the prospects.
All that being said, this has a ton of potential. One of the pitfalls of using HTML tables to create an Excel file is that the resulting Excel file usually looks plain, if not trashy. There also is no support for multiple tabs and Excel formulas. This XML-based approach changes all that. It allows us now to have the client design a spreadsheet exactly how they want it, look and feel, formulas, etc. and with a quick Save As, we have the template to build our dynamic Excel file from. A couple of hours of XML cleanup and proper placement of variable placeholders and we can give them exactly what they want, with no compromises.
Thanks for this tip!
Posted by Rob Barthle on Apr 5, 2007 at 10:32 AM
I don't know if Mo has resolved his issue with embedding images in the XML file. I have the same problem. I tried inserting an image to an Excel file, saved it as .xml and checked to see what the resulting XML code is, but there is no indication about the image.
I like the XML schema for Excel coz it allows you to have much of the functionality of Excel, however, I've been stumped by my inability to embed images to the generated Excel file.
Posted by ac on Jul 13, 2007 at 3:22 PM
@AC,
The image might be stored in base64, which would look like a few thousand characters in a row (usually broken onto a new line every X number of characters)... see anything like that in the XML?
Posted by Ben Nadel on Jul 13, 2007 at 3:28 PM
Ben,
Great Snipplets man. But I've converted the QueryNew( line to an actualy cfquery, wrapped the <cfset for the row data in a cfoutput, cfloop to set all the parameters, and esentially that is all that has changed,
and I get the following Error when I try to open the xls in excel:
Table.
I opened the error log and it displays this:
XML ERROR in Table
REASON: Bad Value
FILE: C:\DOCUME~1\jharvey\LOCALS~1\Temp\twc_basic.xml-1.XLS
GROUP: Cell
TAG: Data
VALUE: TXML PARSE ERROR: Missing end-tag
Error occurs at or below this element stack:
<ss:Workbook>
<ss:Worksheet>
<ss:Table>
<ss:Row>
<ss:Cell>
<ss:Data>
<ss:kkg>
Did I do something wrong?
Posted by James on Aug 4, 2008 at 4:25 PM
Just thought that before you asked here's my code:
<cfquery name="quiz_results" datasource="#datasrc#">
select id, user_name, timestamp, q1, q2, q3 from webos_bmi_quiz
</cfquery>
<cfset add_row ='#quiz_results.recordcount#'>
<!--- Add rows to query. --->
<cfset QueryAddRow( quiz_results, #quiz_results.recordcount# ) />
<!--- Set row data. --->
<cfoutput>
<cfloop query="quiz_results">
<cfset quiz_results[ "id" ][ 1 ] = JavaCast( "string", "#id#" ) />
<cfset quiz_results[ "user_name" ][ 1 ] = JavaCast( "string", "#user_name#" ) />
<cfset quiz_results[ "timestamp" ][ 1 ] = JavaCast( "string", "#timestamp#" ) />
<cfset quiz_results[ "q1" ][ 1 ] = JavaCast( "string", "#q1#" ) />
<cfset quiz_results[ "q2" ][ 1 ] = JavaCast( "string", "#q2#" ) />
<cfset quiz_results[ "q3" ][ 1 ] = JavaCast( "string", "#q3#" ) />
</cfloop>
</cfoutput>
<Worksheet ss:Name="Quiz Results">
<Table ss:ExpandedColumnCount="#ListLen( quiz_results.ColumnList )#" ss:ExpandedRowCount="#(quiz_results.RecordCount + 1)#" x:FullColumns="1" x:FullRows="1">
<Column ss:Index="1" ss:Width="30" />
<Column ss:Index="2" ss:Width="100" />
<Column ss:Index="3" ss:Width="42" />
<Column ss:Index="4" ss:Width="84" />
<Column ss:Index="5" ss:Width="66" />
<Column ss:Index="6" ss:Width="70" />
<Row>
<Cell>
<Data ss:Type="String">ID</Data>
</Cell>
<Cell>
<Data ss:Type="String">User Name</Data>
</Cell>
<Cell>
<Data ss:Type="String">Date Taken</Data>
</Cell>
<Cell>
<Data ss:Type="String">Q-1 Results</Data>
</Cell>
<Cell>
<Data ss:Type="String">Q-2 Results</Data>
</Cell>
<Cell>
<Data ss:Type="String">Q-3 Results</Data>
</Cell>
</Row>
<cfloop query="quiz_results">
<Row>
<Cell>
<Data ss:Type="Number">#quiz_results.id#</Data>
</Cell>
<Cell>
<Data ss:Type="String">#quiz_results.user_name#</Data>
</Cell>
<Cell ss:StyleID="ShortDate">
<Data ss:Type="DateTime">#DateFormat( quiz_results.timestamp, "yyyy-mm-dd" )#T#TimeFormat( quiz_results.timestamp, "HH:mm:ss.l" )#</Data>
</Cell>
<Cell>
<Data ss:Type="String">#quiz_results.q1#</Data>
</Cell>
<Cell>
<Data ss:Type="String">#quiz_results.q2#</Data>
</Cell>
<Cell>
<Data ss:Type="String">#quiz_results.q3#</Data>
</Cell>
</Row>
</cfloop>
</Table>
</Worksheet>
</Workbook>
</cfoutput>
</cfsavecontent>
Posted by James on Aug 4, 2008 at 4:28 PM
Nevermind,
One of the Usernames inputted in the database had < & > in it (apparently it was inputted before I added my stripHTML command on the entry form.
Sorry for any issues, or hours of scratching one's head.
-James
Posted by James on Aug 5, 2008 at 3:36 PM
Please disregard last post, Error is still occuring.
Any thoughts?
Posted by James on Aug 5, 2008 at 4:33 PM
@James,
Did the XmlFormat() suggestion help at all?
Posted by Ben Nadel on Aug 7, 2008 at 2:08 PM