Reading Excel Files With ColdFusion And POI

Posted January 14, 2007 at 3:08 PM

Tags: ColdFusion

I have been doing a lot of experimenting with writing Microsoft Excel files using ColdFusion. I was just recently turned onto POI by Rob Brooks-Bilson and found some great code resources by Dave Ross. However, when it comes to reading in an Excel file using POI and ColdFusion, I was not able to find much information. So, using Dave's POI reference page and some Java examples on OnJava.com I put this little example together.

There are probably lots of pitfalls and stuff that I am not seeing (this is my first read with ColdFusion and POI) so m code makes the following assumptions with this example:

  1. The data is in a uniform block of cells, meaning, all rows have the same number of column.
  2. The data only has one sheet (Excel tab) that I am trying to get at.
  3. There is no header row.
  4. There are no "bad cells" - cells that have no data but the files thinks they are part of the "active cell block."

While the assumptions above might seem a bit limiting, I think that they will cover about 95% of the cases that I will have to deal with. So, here is the code that I have come up with:

 Launch code in new window » Download code as text file »

  • <!---
  • Create the Excel file system object. This object is
  • responsible for reading in the given Excel file.
  • --->
  • <cfset objExcelFileSystem = CreateObject(
  • "java",
  • "org.apache.poi.poifs.filesystem.POIFSFileSystem"
  • ).Init(
  •  
  • CreateObject(
  • "java",
  • "java.io.FileInputStream"
  • ).Init(
  •  
  • ExpandPath( "./jenna_jameson.xls" )
  •  
  • )
  • ) />
  •  
  •  
  • <!---
  • Get the workbook from the Excel file system object that
  • we just created. Remember, the workbook contains the
  • Excel sheets that have our data.
  • --->
  • <cfset objWorkBook = CreateObject(
  • "java",
  • "org.apache.poi.hssf.usermodel.HSSFWorkbook"
  • ).Init(
  • objExcelFileSystem
  • ) />
  •  
  • <!---
  • For this demo, we are only interested in reading in the
  • data from the first sheet. Remember, since Java is zero-
  • based, not one-based like ColdFusion, the first Excel
  • sheet is at index ZERO (not ONE).
  • --->
  • <cfset objSheet = objWorkBook.GetSheetAt(
  • JavaCast( "int", 0 )
  • ) />
  •  
  •  
  • <!---
  • We are going to build a ColdFusion query that houses the
  • Excel data, but we don't know anything about the data
  • just yet. So, just create the place holder for the query
  • and then we will add to it when we have more information.
  • --->
  • <cfset qCell = "" />
  •  
  •  
  • <!---
  • Get the Excel sheet's row iterator. This appears to be some
  • sort of implementation of the Java class java.util.TreeMap,
  • but I don't know much about that. What I do know, is that
  • this will allow us to loop over the rows in the Excel file
  • until there are no more to loop over. The interface for it
  • looks like the standard iterator interface.
  • --->
  • <cfset objRowIterator = objSheet.rowIterator() />
  •  
  •  
  • <!---
  • User the row iterator to loop over all the physical rows in
  • the Excel sheet. This condition checks to see if we have a
  • row to read in. At this point, the iterator is NOT pointing
  • at a valid Excel data row.
  • --->
  • <cfloop condition="objRowIterator.HasNext()">
  •  
  • <!---
  • We have determined that we have a valid row to read.
  • Now, move the iterator to point to this valid row.
  • --->
  • <cfset objRow = objRowIterator.Next() />
  •  
  • <!---
  • Get the number of physical cells in this row. While I
  • think that this can possibly change from row to row,
  • for the purposes of this demo, I am going to assume
  • that all rows are uniform and that this row is a model
  • of how the rest of the data will be displayed.
  • --->
  • <cfset intCellCount = objRow.GetPhysicalNumberOfCells() />
  •  
  •  
  • <!---
  • Check to see if the query variable we have it actually
  • a query. If we have not done anything to it yet, then
  • it should still just be a string value (Yahoo for
  • dynamic typing!!!). If that is the case, then let's use
  • this first data row to set up the query object.
  • --->
  • <cfif NOT IsQuery( qCell )>
  •  
  • <!---
  • Create an empty query. Doing it this way creates a
  • query with neither column nor row values.
  • --->
  • <cfset qCell = QueryNew( "" ) />
  •  
  • <!---
  • Now that we have an empty query, we are going to
  • loop over the cells COUNT for this data row and for
  • each cell, we are going to create a query column
  • of type VARCHAR. I understand that cells are going
  • to have different data types, but I am chosing to
  • store everything as a string to make it easier.
  • --->
  • <cfloop
  • index="intCell"
  • from="0"
  • to="#(intCellCount - 1)#"
  • step="1">
  •  
  • <!---
  • Add the column. Notice that the name of the
  • column is the text "column" plus the column
  • index. I am starting my column indexes at ONE
  • rather than ZERO to get it back into a more
  • ColdFusion standard notation.
  • --->
  • <cfset QueryAddColumn(
  • qCell,
  • "column#(intCell + 1)#",
  • "CF_SQL_VARCHAR",
  • ArrayNew( 1 )
  • ) />
  •  
  • </cfloop>
  •  
  • </cfif>
  •  
  •  
  • <!---
  • ASSERT: Whether we are on our first Excel data row or
  • our Nth data row, at this point, we have a ColdFusion
  • query object that has the proper columns defined.
  • --->
  •  
  • <!---
  • Add a row to the query so that we can store this row's
  • data values.
  • --->
  • <cfset QueryAddRow( qCell ) />
  •  
  •  
  • <!--- Loop over the cells in this row to find values. --->
  • <cfloop
  • index="intCell"
  • from="0"
  • to="#(intCellCount - 1)#"
  • step="1">
  •  
  •  
  • <!---
  • When getting the value of a cell, it is important
  • to know what type of cell value we are dealing
  • with. If you try to grab the wrong value type,
  • an error might be thrown. For that reason, we must
  • check to see what type of cell we are working with.
  • These are the cell types and they are constants
  • of the cell object itself:
  •  
  • 0 - CELL_TYPE_NUMERIC
  • 1 - CELL_TYPE_STRING
  • 2 - CELL_TYPE_FORMULA
  • 3 - CELL_TYPE_BLANK
  • 4 - CELL_TYPE_BOOLEAN
  • 5 - CELL_TYPE_ERROR
  • --->
  •  
  • <!--- Get the cell from the row object. --->
  • <cfset objCell = objRow.GetCell(
  • JavaCast( "int", intCell )
  • ) />
  •  
  • <!--- Get the type of data in this cell. --->
  • <cfset objCellType = objCell.GetCellType() />
  •  
  • <!---
  • Get teh value of the cell based on the data type.
  • The thing to worry about here is cell forumlas and
  • cell dates. Formulas can be strange and dates are
  • stored as numeric types. For this demo, I am not
  • going to worry about that at all. I will just grab
  • dates as floats and formulas I will try to grab as
  • numeric values.
  • --->
  • <cfif (objCellType EQ objCell.CELL_TYPE_NUMERIC)>
  •  
  • <!---
  • Get numeric cell data. This could be a
  • standard number, could also be a date value.
  • I am going to leave it up to the calling
  • program to decide.
  • --->
  • <cfset objCellValue = objCell.GetNumericCellValue() />
  •  
  • <cfelseif (objCellType EQ objCell.CELL_TYPE_STRING)>
  •  
  • <cfset objCellValue = objCell.GetStringCellValue() />
  •  
  • <cfelseif (objCellType EQ objCell.CELL_TYPE_FORMULA)>
  •  
  • <!---
  • Since most forumlas deal with numbers, I am
  • going to try to grab the value as a number. If
  • that throws an error, I will just grab it as a
  • string value.
  • --->
  • <cftry>
  • <cfset objCellValue = objCell.GetNumericCellValue() />
  •  
  • <cfcatch>
  •  
  • <!---
  • The numeric grab failed. Try to get the
  • value as a string. If this fails, just
  • force the empty string.
  • --->
  • <cftry>
  • <cfset objCellValue = objCell.GetStringCellValue() />
  •  
  • <cfcatch>
  •  
  • <!--- Force empty string. --->
  • <cfset objCellValue = "" />
  •  
  • </cfcatch>
  • </cftry>
  •  
  • </cfcatch>
  • </cftry>
  •  
  • <cfelseif (objCellType EQ objCell.CELL_TYPE_BLANK)>
  •  
  • <cfset objCellValue = "" />
  •  
  • <cfelseif (objCellType EQ objCell.CELL_TYPE_BOOLEAN)>
  •  
  • <cfset objCellValue = objCell.GetBooleanCellValue() />
  •  
  • <cfelse>
  •  
  • <!--- If all else fails, get empty string. --->
  • <cfset objCellValue = "" />
  •  
  • </cfif>
  •  
  •  
  • <!---
  • ASSERT: At this point, we either got the cell value
  • out of the Excel data cell or we have thrown an
  • error or didn't get a matching type and just
  • have the empty string by default. No matter what,
  • the object objCellValue is defined and has some
  • sort of SIMPLE ColdFusion value in it.
  • --->
  •  
  •  
  • <!---
  • Now that we have a value, store it as a string in
  • the ColdFusion query object. Remember again that my
  • query names are ONE based for ColdFusion standards.
  • That is why I am adding 1 to the cell index.
  • --->
  • <cfset qCell[ "column#(intCell + 1)#" ][ qCell.RecordCount ] = JavaCast( "string", objCellValue ) />
  •  
  •  
  • </cfloop>
  •  
  • </cfloop>
  •  
  •  
  • <!---
  • At this point, the excel data should be in a ColdFusion
  • query object. However, if the query did not contain any
  • record, then the row iterator was never launched which
  • mean we never actually defined a query. As one final check
  • just make sure we are dealing with a query.
  • --->
  • <cfif NOT IsQuery( qCell )>
  •  
  • <!--- Just define an empty query. --->
  • <cfset qCell = QueryNew( "" ) />
  •  
  • </cfif>

Taking this Excel file:


 
 
 

 
Reading Excel Files Using ColdFusion And POI  
 
 
 

... and then CFDump'ing it out, we get:


 
 
 

 
Reading Excel Files Into A ColdFusion Query Using POI  
 
 
 

This is pretty exciting! I am really liking this POI library. Rob Brooks-Bilson tells me that the version of POI that ships with ColdFusion is a slightly older one. I can only hope that the upcoming versions of ColdFusion continue to ship with Microsoft Excel solutions.

POI aside, I am still hoping to get a JExcel sample up and running. Mark Mandel has been nice enough to help me out with that, but I just haven't had time to go back and implement his advice. As you probably notice, I have about a 1000 things going on in my head and it just pours out when ever I can get a few minutes here and there.

Download Code Snippet ZIP File

Comments (22)  |  Post Comment  |  Ask Ben  |  Permalink  |  Other Searches  |  Print Page




Adobe ColdFusion 8.0.1 Update - Helping Programmers To Be Signifanctly Less Girlie - Download ColdFusion 8 Update 8.0.1 Now.

Reader Comments

Now if you could implement this as a full blown CFC. I would love you forever. please?? :)

Posted by Derek Perez on Jan 14, 2007 at 4:25 PM


I played around with this a few weeks ago, as well.
I didn't really get that far into it, because of paying work that needed to get done, but I created a small ColdFusion function that takes as an input a path to an excel file and the number of columns to pull in each worksheet.
It returns a Structure of CF Query objects containing the data from the excel spreadsheet. I didn't go into the detail you have with the different cell types, and your work is a bit more elegant, but combining your work with the structure aspect from my function would yield a powerful data import tool.
You can see my [ uncommented :( ] function here: http://ken.auenson.com/examples/func_ExcelToQueryStruct_POI_example.txt

Posted by Ken Auenson, II on Jan 14, 2007 at 5:09 PM


Ben:

Can you clarify that that POI library you are using is not the one that comes with CF? If so, I'm assuming you had to load this library into CF's class path.

For my work sites, I have access to the server. But for my personal site, I'm hosted by a web hosting company that doesn't allow us to add Java packages.

Bruce

Posted by Bruce on Jan 14, 2007 at 7:08 PM


@Ken,

That stuff looks good. I like the way that you are looping over the sheets and returning them in an array. I have some ideas that could perhaps improve up on it (such as returning an array of sheets structures, each that would have keys for things like "Name", "Query", "Index"... or something). I will try to merge our two ideas together and make a wicked awesome love-child.

@Bruce,

As far as I know, this IS the POI package that ships with ColdFusion MX 7. Other people have been working with JExcel, which I know has to be installed. No one here (at my office) knew about POI until Rob pointed it out to me... meaning, we have not installed anything extra as far as I know. When in doubt, try to run this line:

<cfset CreateObject( "java", "java.lang.StringBuffer" ).Init() />

If that can run without error then you know two things:

1. CreateObject() works
2. You can directly create installed Java packages.

If you can do that, then try to cut-n-paste the demo above. I assume that as long as you are running on a recent enough edition of ColdFusion then you should be good to go.

But again, this is my first time ever reading via POI and ColdFusion, so I am still quite wet behind the ears.

Posted by Ben Nadel on Jan 14, 2007 at 8:08 PM


@Bruce:
My function was using the built-in POI library as well.

@Ben:
That sounds like a great idea.
The only piece I am not happy with right now is that I can't seem to see a nice way to determine the number of columns dynamically. Maybe what would be required would be to loop over every row in a given sheet one time before doing the real work to determine the max columns on that sheet. That would iliminate my solution of requiring a column input, and your solution of retrieving the first row's number of columns.
This could turn out to be a really useful tool! I wish I had more time to spend on it. Good luck, and keep us posted!

Posted by Ken Auenson, II on Jan 14, 2007 at 8:23 PM


A long time ago (back in CF5), I remember using COM objects to pull data out of an excel file. I was very tedious and SLOWWWW. Don't know how fast this thing is under load, but the COM objects I used sucked. I think really it had to do with a CF5 bug, but can't really remember. Anywho, the way I got around it was to use COM to basically open up the excel file and then save it to a TAB delimited file. Then I could just use go old CF to read the TAB delimited file and throw it into a query.

Posted by Tony Petruzzi on Jan 14, 2007 at 9:45 PM


Tony,

That's a pretty clever idea regarding the TAB file saving. I have just started working with POI so I cannot testify as to the speed of it... but I hear very good things. Hopefully, I will know more soon.

Posted by Ben Nadel on Jan 15, 2007 at 7:48 AM


Ken,

I was just trying to put some code down on paper, doing a what you suggested (going through the sheet once to find the number of columns) when it dawned on me... who care?!?

This is what occurred to me: what ever part of the application uses the query that comes out of the Excel sheet is going to know the format of the excel sheet. Meaning, they are going to assume column 1 is last name and column 2 is first name (for example). The structure of the Excel is set for the calling app, not dynamic.

Because the structure is set, no one is every going to treat the query structure as dynamic. This makes extra columns irrelevant. If there is an extra 6th column passed to a part that expects 5 column, it doesn't matter because most likely that 6th column will never be reference.

I am going to go an take out the logic I put in for checking the column count as I think it is going to add more overhead and complicate things more than is worth it.

Posted by Ben Nadel on Jan 15, 2007 at 8:49 AM


Ben:
Good point. I didn't actually think of the use cases before writing my last comment, but your right. anyone using this code will already know the structure of the spreadsheet.

Posted by Ken Auenson, II on Jan 15, 2007 at 9:33 AM


Ken,

I was thinking the same exact thing you were. I even played around with the excel, adding and clearing cells vs. "Delete"ing a column altogether. But then as I was writing the code it just didn't seem that useful (and was making my code more complicated than it needed to be).

I am done writing the "Reading" part of a small CFC, POIUtility.cfc. I just want to add the Write (query to excel) and then I will post up. Hopefully I can do this at lunch.

I am borrowing for you ideas and adding to them. More to come soon.

Posted by Ben Nadel on Jan 15, 2007 at 9:52 AM


Let me know when you get the POIUtility.cfc written. I would like to take a look at the code.

Posted by Matthew on Jan 15, 2007 at 4:20 PM


Matt,

It is written. I am just writing up the entry and example. I will try to finish it right after work. If not then, first thing in the morning.

Posted by Ben Nadel on Jan 15, 2007 at 4:34 PM


If I understand this correctly, this is for apache. Is there a way to do it with IIS?

Posted by CJ on Jan 15, 2007 at 4:36 PM


CJ,

This is for ColdFusion. I don't think it has anything to do with the server (but I am NOT a server guy, so this may be misinformed). I am working with Windows Server and IIS and all is going swimingly.

Posted by Ben Nadel on Jan 15, 2007 at 4:37 PM


Thanks Ben,
I will give it a try and see if I can get it to work

Posted by CJ on Jan 15, 2007 at 4:51 PM


@CJ

I use POI on both IIS and Apache at work, so it should work regardless of which web server you choose.

Posted by Matthew on Jan 15, 2007 at 5:17 PM


Ken, Matt,

Just posted some code:

http://www.bennadel.com/index.cfm?dax=blog:474.view

Gotta go home now, I am STARVING!

Posted by Ben Nadel on Jan 15, 2007 at 7:00 PM


Ben, I'll recommend you to know about JExcel Java libraries. This is a great great piece of code. The ntegration with CF is very easy. I'm using JExcel(http://jexcelapi.sourceforge.net/) and I'm very happy with your power. Please look this simple example:
<cfset inicio=GetTickCount()>

<!--- armazenar a biblioteca jxl.jar no diretorio c:\cfusionmx7\lib e restartar o servidor --->

<cfset filepath = "C:\Manufact\200701_Fat.xls">
<cfscript>
//carrega o metodo Java que manuseia o arquivo xls
wfile = createObject("java","java.io.File").init(filepath);

//cria o workbook e carrega a planilha
workbook = createObject("java","jxl.Workbook");

//pega o workbook
wworkbook = createObject("java","jxl.write.WritableWorkbook");
wworkbook = workbook.getWorkbook(wfile);

sheet = createObject("java","jxl.write.WritableSheet");
//pega uma planilha especifica
sheet = wworkbook.getSheet("FaturamentoMaior");
</cfscript>

<cfset rowslen = sheet.getRows()>

<cfset columnslen = sheet.getColumns()>

<!--- captura o texto das colunas da planilha --->
<cfoutput>
<cfloop from="0" to="#columnslen-1#" index="c">
#sheet.getCell(javacast('int',c),0).getContents()#
<br />
</cfloop>
</cfoutput>

<cfoutput>
<cfloop from="0" to="#rowslen-1#" index="q">
#sheet.getCell(0,javacast('int',q)).getContents()# | #sheet.getCell(1,javacast('int',q)).getContents()# | #sheet.getCell(2,javacast('int',q)).getContents()#
<br />
</cfloop>
</cfoutput>
Tempo de execução: <cfdump var="#GetTickCount()-inicio#">ms

Posted by Marco Antonio C Santos on Jan 17, 2007 at 11:03 AM


Marco,

I have fooled around a bit with JExcel. It is very cool but once I found out that POI was already delivered with ColdFusion MX 7 I kind of took a detour. When I get a better footing on POI, I will surely come back and continue to look at JExcel. Thanks for you example above.

Keep on rockin!

Posted by Ben Nadel on Jan 17, 2007 at 1:45 PM


There's a rudimentary google code project called cfjexcel, that is an attempt at making using JExcelAPI easier.

You do need access to createObject, but it's got some basic row-level formula embedding, which users always love. Might be at least a good place for ideas.

Here's a demo of the formula stuff, and some basic IO:

http://coldshen.com/cfjexcel/test.cfm

Posted by denny on Mar 13, 2007 at 1:42 AM


Your code is littered with magic numbers and non-descript variable names, and is absolutely overkilled with comments. Quality code shouldn't need nearly so many comments. It should be self-explanatory.

Posted by AJ on May 22, 2008 at 3:29 PM


@AJ,

I am always looking for ways to improve my code. Could you give me an example of the "magic numbers" and non-descript variables names and maybe some suggestions? Thanks!

Posted by Ben Nadel on May 22, 2008 at 3:41 PM


Post Comment  |  Ask Ben


Home   |   Web Log   |   ColdFusion   |   Projects   |   Resume   |   Job Form   |   Search   |   Contact
Epicenter Consulting - Custom Software Solutions for Business Evolution HostMySite.com - The Leader In ColdFusion Hosting