Reading A Microsoft Excel File In ColdFusion Using JExcel API
I just started playing around with the JExcel Java API created by JNIWrapper. The first thing I have done is just read in the data from an Excel file. This is pretty neat stuff. But, before I get into it, let me just touch upon the JExcel Java files. When you download the JExcel Java files, they come as a JAR file. From what I have been told, this is basically a "zipped" file of Java classes. Now, for those of you who follow my blog you know that I LOVE portability and I hate going into the ColdFusion Admin. So, this example is done using a URL Class Loader and a local JAR file that is meant for use with this example only. If you want to see how the Url Class Loader works, it is based on the JavaLoader.cfc by Mark Mandel and you can see my code here. The bottom line here is that this can be done even if you don't have access to the ColdFusion Admin or the ability to update ColdFusion's class paths.
Ok, so onto the example. I have a simple excel file that has a few columns and rows of data. This file is known as the Workbook. The workbook contains sheets (those tabs at the bottom of your excel file). This example reads each sheet into its own query object and each of those query objects is stored in a one-dimension array which will represent the workbook as a whole.
Let's read us in some Excel data:
<!--- Create the uninitialized Workbook class. Remember, this method CreateJExcelObject() just uses a Class Loader to load the Workbook class from the locally stored JAR file. This method automatically prepends "jxl." to all class calls, so do NOT include it in this method call. ---> <cfset objWorkbook = CreateJExcelObject( "Workbook" ) /> <!--- Get the workbook from the given file. This workbook may contain multiple sheets. ---> <cfset objWorkbook = objWorkbook.GetWorkbook( CreateObject( "java", "java.io.File" ).Init( ExpandPath( "./test.xls" ) ) ) /> <!--- Create an object to keep track of all the sheet / cell data. We are, essentially going to store the excell sheets in an array and the data in a query. ---> <cfset arrWorkbook = ArrayNew( 1 ) /> <!--- Loop over the number of sheets in the workbook. We need to get the number of sheets in the workbook. Be careful! The GetSheets() method doesn't return a number (like the later used GetColumns() or GetRows()); instead, it returns an array of actual jxl.Sheet objects. ---> <cfloop index="intSheet" from="1" to="#ArrayLen( objWorkbook.GetSheets() )#" step="1"> <!--- Create a blank query for this excel data sheet. We could create the number of columns right now, but I think it will be easier to add columns as we go. ---> <cfset qData = QueryNew( "" ) /> <!--- Store this data query into the workbook array. ---> <cfset ArrayAppend( arrWorkbook, qData ) /> <!--- Get the current sheet. Remember that since we are getting the sheet by index and we are getting it through a JAVA method, we need to use zero-based indexes (unlike ColdFusion which is one-based). ---> <cfset objSheet = objWorkbook.GetSheet( JavaCast( "int", (intSheet - 1) ) ) /> <!--- Loop over the columns and rows. As we loop over the columns, we are going to add a column to the data query and then add the row values. This should allow us to easily create the computed names such as COLUMN1, COLUMN2, COLUMN3. ---> <cfloop index="intColumn" from="1" to="#objSheet.GetColumns()#" step="1"> <!--- Add the column to the query. We are going to add all the columns as text since I have no idea what kind of data there will be. We can always parse it into a data type later. ---> <cfset QueryAddColumn( qData, ("COLUMN" & intColumn), "CF_SQL_VARCHAR", ArrayNew( 1 ) ) /> <!--- We need to add rows. However, since we might be adding a lot of data, let's resize the query for faster processing. We ONLY want to do this the FIRST time. Then, the rest of the times, we can simply add data to the existing query cells. ---> <cfif (intColumn EQ 1)> <!--- Resize the query based on the number of rows in the excel file. ---> <cfset QueryAddRow( qData, objSheet.GetRows() ) /> </cfif> <!--- Now that we have the appropriate column added to the query and we have the correct number of rows that we will need, let's loop over the rows and move the cell data into the data query. ---> <cfloop index="intRow" from="1" to="#objSheet.GetRows()#" step="1"> <!--- Set the query data. ---> <cfset qData[ "COLUMN#intColumn#" ][ intRow ] = JavaCast( "string", objSheet.GetCell( JavaCast( "int", intColumn - 1 ), JavaCast( "int", intRow - 1 ) ).GetContents() ) /> </cfloop> </cfloop> </cfloop> <!--- When we have finished processing all the cells, use the close() method. This frees up any allocated memory used when reading spreadsheets and is particularly important when reading large spreadsheets. ---> <cfset objWorkbook.Close() />
If you dump out the arrWorkbook array, you will see that it has three indexes with the appropriate queries:
Now, this is a SLOW process. This has a minimal amount of data and takes 5 to 6 seconds to load. Yeah, I said SLOW. However, it seems to scale nicely. If I add a good number of columns and records, the parse time only jumped to about 9 seconds. It looks like the overhead comes mostly from actually opening the excel file via Java. Reading the data from the Excel file seems to scale nicely. This is based on LIMITED testing!
Anyway, the JExcel API seems like it could be very cool to do a number of things including writing Excel files. I have just begun to research, but I can already see the possibilities.
Want to use code from this post? Check out the license.
You may also want to check out : boldPOI-HSSF - Java API To Access Microsoft Excel Format Filesbold
It is part of the Apache Jakarta project. You can read and write binary excel files, including formulas, merging cells, formating, multiple worksheets etc. etc.
Thanks for the link. I will check it out. It's funny, though, clicking on your link I read:
"If you're merely reading spreadsheet data, then use the eventmodel api in the org.apache.poi.hssf.eventusermodel package."
That just seems funny. The JExcel stuff is mad simple like "jxl.Workbook". Why is Jakarta all about using crazy long complicated names?? Anyway, I will give it a looking into. Thanks.
WOW, thanks I was looking into purchasing some code that did the exact same thing. I will check this out first.
I can second the POI/HSSF recommendation. I've been using it at work for about 6 months to read and generate native Excel documents. It doesn't seem to have the 5-6 second lag that you mentioned, and it is mostly logical.
fileOut = createObject("java","java.io.FileOutputStream").init(Arguments.Filename);
wb = createObject("java","org.apache.poi.hssf.usermodel.HSSFWorkbook").init();
sheet = wb.createSheet(Arguments.Sheetname);
// Add the column headers
row = sheet.createRow(0);
style = wb.createCellStyle();
font = wb.createFont();
Blah, blah, blah.
I will take a look. Right now, all I can really see myself doing is reading and writing simple files. It looks like the POI stuff has some great formatting issues. Let me get a feel for both and then I can make an educated choice.
The Jakarta one just has crazy class names that I am having trouble dealing with. I mean:
Is there nothing they could have done to make that easier to type? Granted I have very little Java experience, so I have no idea what the language forces you to do, but this just seems silly.
I'd be curious to see if you still get a lag if you put the jar file in your CF classpath, and invoke it directly, without the classloader.
I posted some guides to using HSSF in CF a few years ago:
As for Java class naming - keep in mind that Java has an import statement - which prevents you from having to repeatedly type full package names. We don't have that luxury from CF.
That might be the issue regarding the initial delay. However, once the Java object is loaded, there should be no difference when it comes to reading the actual Excel file itself right?
My co-worker is actually the one who showed me the JExcel stuff (he just started working with it). He uses the mapped path. I will check with him to see about any lag time.
That looks like a really awesome overview of the functionality. Dynamite post! Once I start looking into the Jakarta solution I will definitely go through your stuff with a fine-toothed comb. Thanks!
What version of Excel were you working with? The new MS Office 2007 xml documents are MUCH easier to work with...
I am working with Excel 2003 I think. However, if I had to work with XML, would it require the end user to export the excel document as an XML before they put it in the system (what ever system is there to handle the Excel data)? If so, it still requires an extra step. If I was gonna do that, why not just go to CSV. The hope here was to allow the client to perform actions without thinking.
But i have to say, I am very curious now about this XML you speak of. I have tried to work with the XML that Word 2003 produces, but it was a NIGHTMARE. Is the 2007 stuff much cleaner?
I am trying to read an excel file with your code and I am getting this error
The error occurred in D:\users\ixClients\payquiq\scribble.cfm: line 107
105 : ("jxl." & ARGUMENTS.Class),
106 : JavaCast( "boolean", false ),
107 : LOCAL.ClassLoader
108 : )
109 : />
I was getting that error for a while and was able to fix it. I can't quite remember what it was. I think I was getting that when I was using the Class Loader to load the workbook class. The problem is that it cannot do that since it needs to be able to call the "new" operator.
That is why I switched over to the Java Proxy object that Mark Mandel demonstrated. See this code:
CreateObject( "java", "coldfusion.runtime.java.JavaProxy" ).Init(
LOCAL.ClassLoader.LoadClass( ("jxl." & ARGUMENTS.Class) )
This uses the Java Proxy to take the class loader and then load the class without initializing it.
Coldfusion uses a ClassLoader when it starts up and that is how CF loads up java so why are you using a CFC to do the job? Why not just use the
same Classloader already installed and in use?
To use ColdFusion's ClassLoader, I would have to have the JExcel JAR file installed in ColdFusion's class paths. I however, do not have access to such areas of our setup.
All Classloaders work exactly the same. They all come from one root URLClassloader so it does not matter who's class it is. I showed you and other people on CF Talk working examples but somehow the examples did not take. ColdFusion has a specialized object for URLClassloading and it can be used just like any URLClassloader at runtime without any limits. I asked Adobe to add this as a feature for CF 8 as a new tag so all this maybe moot anyway. My spin is to load the jar via cfhttp to the web root and then load via CF bootstrapClassloader so the network usage only takes one hit. If the jar is found locally hit it, if not hit the network. I will post this message and example on my Coldfusion Java Hacks list today.
Go take a look. I just posted java.io.LineNumberReader example too and Google Talk examples. I am sure you have seen my CF Ghost examples too.
If you have shown me stuff in Java before and they did not take it is only because I am a total Java noob and things don't tend to stick in my head until I fully understand them. I still don't quite understand what you are saying :) but I will definitely look at your examples!
Ben, I'm using JExcel for nearly months. It's a nice tool using low amount of data. Today I found a timeout in Excel generation file for ~9000 rows and ~25 columns. Do you have tests for huge amount of data? JExcel scales for you?
I only did this one test with JExcel. After this, I moved onto the the POI library which comes installed with ColdFusion.