Learning ColdFusion 8: CFZip Part IV - Extracting Zip File Archives

Posted July 24, 2007 at 7:39 AM by Ben Nadel

Tags: ColdFusion

In the last part of this series, we looked at how to list and read the files of a zip archive using ColdFusion 8's new CFZip and CFZipParam tags. Those techniques alone could be used to fully unzip an archive, but that certainly wouldn't be the best choice. Luckily, ColdFusion 8's CFZip tag provides us with the Unzip action, which can unzip an entire archive, or subsections of an archive, with the greatest of ease.

As always, before we can start exploring the unzipping of an archive in ColdFusion 8, we have to create a zip archive to work with. We are going to zip the following directory structure:

./data/documents/manual.txt
./data/documents/readme.txt
./data/images/funny.jpg
./data/images/mud_monster.jpg
./data/images/red_face.jpg
./data/images/smile.jpg

To create the zip, we are going to use the simple ColdFusion 8 CFZip tag:

  • <!---
  • Create a zip archive of the data directory.
  • By default, ColdFusion 8 will recurse the
  • source directory, store the storage paths,
  • and due to our use of the Overwrite attribute, we
  • will make sure we create a new zip archive.
  • --->
  • <cfzip
  • action="zip"
  • source="#ExpandPath( './data/' )#"
  • file="#ExpandPath( './data.zip' )#"
  • overwrite="true"
  • />

This will put all contents of the data directory into the root of our zip archive (the documents and images directories). Now that we have a zip, the easiest thing we can do is reverse the process by unzipping the entire archive using the unzip action:

  • <!---
  • Unzip the zip archive into the directory
  • named "unzipped". The unzipped directory must
  • exist before we perform this action.
  • --->
  • <cfzip
  • action="unzip"
  • file="#ExpandPath( './data.zip' )#"
  • destination="#ExpandPath( './unzipped/' )#"
  • />

Here, we are using the two required attributes that go with the Unzip action. The File attribute is the absolute path to the zip archive we are going to unzip. The Destination attribute is the absolute path to the directory into which we are going to unzip the archive contents. This directory must exist before you try to reference it; if you try to unzip the archive into a non-existent directory, ColdFusion 8 will throw the following error:

The destination G:\....\cf8\zip\unzipped specified in the cfzip tag is invalid. The destination must be a directory and should be accessible by this tag.
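One way to avoid that error is to create the destination directory on demand before calling CFZip. The following is just a minimal sketch, assuming the same ./unzipped/ path used above; the destinationPath variable name is only for illustration:

```cfml
<!--- Resolve the destination once so we can reuse it. --->
<cfset destinationPath = ExpandPath( "./unzipped/" ) />

<!--- Create the destination directory if it does not exist yet. --->
<cfif NOT DirectoryExists( destinationPath )>
	<cfdirectory
		action="create"
		directory="#destinationPath#"
		/>
</cfif>

<!--- Now it is safe to unzip into the destination. --->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#destinationPath#"
	/>
```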

By default, ColdFusion 8 will unzip the entire archive, keeping the archive directory structure (entry path structure) as is, and will not overwrite files that already exist in the destination directory. But, by using some of the optional attributes of the CFZip tag, we can change the way things happen.

The StorePath attribute, which defaults to True, is what determines whether or not we keep the entry path structure. If directory structure is not important to us and we want to unzip all the entries directly into the root of the destination folder, all we need to do is set StorePath to false. Running this code:

  • <!---
  • Unzip the zip archive into the directory
  • named "unzipped_flat". Instead of keeping the
  • archive entry path structure, we are going to
  • unzip all of the entries directly into the
  • root of our target directory.
  • --->
  • <cfzip
  • action="unzip"
  • file="#ExpandPath( './data.zip' )#"
  • destination="#ExpandPath( './unzipped_flat/' )#"
  • storepath="false"
  • />

... will leave us with an unzipped_flat directory that looks like this:

./funny.jpg
./manual.txt
./mud_monster.jpg
./readme.txt
./red_face.jpg
./smile.jpg

When we flatten a zip archive, as we just did above, one of the things we have to be careful of is possible naming conflicts that might be caused by like-named files at different entry paths being merged into the destination root. For example, if we added this text file entry:

./images/readme.txt

... with the content:

This is the IMAGES readme file.

... to the data archive, when flattened, the readme.txt file in the documents folder would conflict with the readme.txt we just added to the images folder. By default, ColdFusion 8 will not overwrite any existing files in the destination directory, and as such, the documents readme.txt file will be the only readme.txt file that gets unzipped. Since unzipping happens in a depth-first fashion, the images readme.txt file will be examined only after the documents readme.txt file, and since there is already a readme.txt in the root of the destination, it will not be unzipped. ColdFusion 8 will not throw an error over this; it will simply skip the current archive entry.
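Because conflicting entries are skipped silently, it can be worth checking for like-named files before flattening an archive. Here is a rough sketch of one way to do that with the List action covered earlier in this series. It assumes the Name column of the list query identifies each entry; the entries, seenNames, and fileName variables are hypothetical names chosen for this example:

```cfml
<!--- List the archive entries into a query named "entries". --->
<cfzip
	action="list"
	file="#ExpandPath( './data.zip' )#"
	name="entries"
	/>

<!--- Track the file names we have seen so far. --->
<cfset seenNames = StructNew() />

<cfloop query="entries">

	<!---
		Grab just the file name portion of the entry. ListLast()
		works whether the Name column holds a full entry path or
		only the file name.
	--->
	<cfset fileName = ListLast( entries.name, "/" ) />

	<cfif StructKeyExists( seenNames, fileName )>

		<!--- We have seen this file name before. --->
		<cfoutput>
			Conflict: #fileName#
			(#seenNames[ fileName ]# and #entries.name#)<br />
		</cfoutput>

	<cfelse>

		<!--- Remember which entry first used this file name. --->
		<cfset seenNames[ fileName ] = entries.name />

	</cfif>

</cfloop>
```

Since ColdFusion struct keys are not case-sensitive, this treats readme.txt and README.TXT as the same file name, which may or may not match how your file system behaves.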

By setting the Overwrite attribute to True, we can get ColdFusion to overwrite any files that already exist. Therefore, running this code:

  • <!---
  • Unzip the zip archive into the directory
  • named "unzipped_flat". Instead of keeping the
  • archive entry path structure, we are going to
  • unzip all of the entries directly into the
  • root of our target directory. As we merge the
  • entries, we are going to overwrite them, thereby
  • keeping only the last version of all the
  • like-named files we come across.
  • --->
  • <cfzip
  • action="unzip"
  • file="#ExpandPath( './data.zip' )#"
  • destination="#ExpandPath( './unzipped_flat/' )#"
  • storepath="false"
  • overwrite="true"
  • />

... will still unzip the images readme.txt after the documents readme.txt, but this time, the images readme.txt file will overwrite the one from the documents entry path.

By setting the Recurse attribute (which defaults to True) to False, we can get CFZip to extract only a single directory. Running this code:

  • <!---
  • Unzip the zip archive into the directory
  • named "unzipped_root". Instead of recursing
  • through the entire archive, just unzip the
  • root directory.
  • --->
  • <cfzip
  • action="unzip"
  • file="#ExpandPath( './data.zip' )#"
  • destination="#ExpandPath( './unzipped_root/' )#"
  • recurse="false"
  • />

... will unzip only the root of the zip archive into the directory, unzipped_root. However, since there are no files in the root of our archive (only our two directories, documents and images), our unzipped_root directory remains empty.

If we don't want to recurse, but we also don't want to unzip the root of the archive, we can use the optional EntryPath attribute to get at a subdirectory of the archive. If we wanted to unzip just the images folder, we would run this code:

  • <!---
  • Unzip the archived images folder into the
  • directory named "unzipped_images". By not storing
  • the path of the entry, we will ensure that an
  • "images" folder does not get created.
  • --->
  • <cfzip
  • action="unzip"
  • file="#ExpandPath( './data.zip' )#"
  • destination="#ExpandPath( './unzipped_images/' )#"
  • entrypath="images"
  • recurse="false"
  • storepath="false"
  • />

This would leave us with an unzipped_images directory that looks like this (we are no longer dealing with the readme.txt in the images directory):

./funny.jpg
./mud_monster.jpg
./red_face.jpg
./smile.jpg

Now, there's actually a bunch of things happening synergistically in the code that we just ran. We turned off directory recursion so that if images had a subdirectory, it would be ignored. We then told CFZip not to store the entry paths. This is to make sure we don't end up with an "images" directory inside of our unzipped_images destination directory. By default, since the images are inside of an images archive folder, CFZip wants to create that images folder into which it will unzip the image entries. Then, finally, to make sure we are just unzipping the images archive folder, we use the optional EntryPath attribute to point the action at the images folder.

The EntryPath attribute acts a bit differently than it did when we examined it in the context of Reading archive entries. When reading an archive entry, you cannot use the "./" and "/" leading path constructs or ColdFusion 8 will throw an error. When it comes to unzipping an archive, you still cannot use the "./" or "/" leading path constructs. Additionally, you cannot even use the trailing "/" characters. The following EntryPath values are all invalid:

/images/
./images/
images/

The difference, when unzipping an archive, is that ColdFusion 8 will not throw any errors. The above paths will simply not work. In order to properly define a target directory, you must exclude both leading and trailing path slash constructs.
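Since bad EntryPath values fail silently, it might be worth normalizing any path that comes from user input or configuration before handing it to CFZip. The following is only a sketch of that idea; rawEntryPath and cleanEntryPath are hypothetical variable names, and the regular expressions simply strip the leading and trailing constructs described above:

```cfml
<!--- A hypothetical entry path that would silently fail as-is. --->
<cfset rawEntryPath = "./images/" />

<!--- Strip any leading "./" or "/" constructs. --->
<cfset cleanEntryPath = REReplace( rawEntryPath, "^(\.?/)+", "" ) />

<!--- Strip any trailing "/" characters. --->
<cfset cleanEntryPath = REReplace( cleanEntryPath, "/+$", "" ) />

<!--- Use the cleaned-up value as the EntryPath. --->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_images/' )#"
	entrypath="#cleanEntryPath#"
	recurse="false"
	storepath="false"
	/>
```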

Now, we could have accomplished the same thing by using the optional Filter attribute. As we have covered in almost every other part of this series, the filter attribute uses file masks to limit the files that are included in the CFZip action. To reach the same outcome as above, we could have unzipped files of type JPG into the root of our destination folder:

  • <!---
  • Unzip all the archived images of type JPG into
  • the directory named "unzipped_images". By not
  • storing the path of the entry, we will be unzipping
  • all files into the root of the destination directory.
  • --->
  • <cfzip
  • action="unzip"
  • file="#ExpandPath( './data.zip' )#"
  • destination="#ExpandPath( './unzipped_images2/' )#"
  • filter="*.JPG"
  • storepath="false"
  • />

Up till now, we have been unzipping directories of files, but the EntryPath attribute can point to a single file as well. In the following code, we are going to unzip just the mud_monster.jpg image into the destination directory:

  • <!---
  • Unzip the mud_monster.jpg image into the directory
  • named "unzipped_single". By not storing the path
  • of the entry, we will make sure not to create the
  • images subdirectory in our destination folder.
  • --->
  • <cfzip
  • action="unzip"
  • file="#ExpandPath( './data.zip' )#"
  • destination="#ExpandPath( './unzipped_single/' )#"
  • entrypath="images/mud_monster.jpg"
  • storepath="false"
  • />

Since we only care about mud_monster.jpg, and not about the images folder itself, we are not storing the entry path structure. This will ensure that mud_monster.jpg goes into the root of our destination directory and not into an images folder within the root.

As with all the CFZip actions, unzipping an archive can be done using the CFZip tag in conjunction with one or more nested CFZipParam tags. As a simple example, we can mimic the unzipped images directory by moving the EntryPath, Recurse, and Filter attributes from the CFZip tag down into a CFZipParam tag:

  • <!---
  • Unzip all the archived images of type JPG located
  • in the images folder into the directory named
  • "unzipped_images3". By not storing the path of the
  • entry, we will be unzipping all files into the root
  • of the destination directory.
  • --->
  • <cfzip
  • action="unzip"
  • file="#ExpandPath( './data.zip' )#"
  • destination="#ExpandPath( './unzipped_images3/' )#"
  • storepath="false">
  •  
  • <!--- Unzip the images folder. --->
  • <cfzipparam
  • entrypath="images"
  • recurse="false"
  • filter="*.JPG"
  • />
  •  
  • </cfzip>

Now, when you move attributes down into the CFZipParam tag, it doesn't always have to be one or the other. While the EntryPath and Filter attributes cannot be defined in both the CFZip and CFZipParam tags, the Recurse attribute can be defined in the CFZip tag and then overridden in the CFZipParam tags.
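As a rough sketch of what that might look like, assuming the behavior just described, we could turn recursion off at the CFZip level and then turn it back on for a single entry path. The unzipped_mixed directory is a hypothetical destination that would need to exist beforehand:

```cfml
<!---
	Turn off recursion for the action as a whole, then
	selectively turn it back on for one entry path.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_mixed/' )#"
	recurse="false"
	overwrite="true">

	<!--- Uses the Recurse value from the parent CFZip tag. --->
	<cfzipparam
		entrypath="images"
		/>

	<!--- Overrides the parent Recurse value for this path only. --->
	<cfzipparam
		entrypath="documents"
		recurse="true"
		/>

</cfzip>
```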

Furthermore, we don't just have to have one CFZipParam tag. We can use multiple CFZipParam tags to define highly dynamic unzipping algorithms. While not complicated in scope, we could mimic our first example (of unzipping the entire zip archive) by using two CFZipParam tags: one for the documents directory and one for the images directory:

  • <!---
  • Unzip the documents and images archive
  • folders into the directory named "unzipped3".
  • --->
  • <cfzip
  • action="unzip"
  • file="#ExpandPath( './data.zip' )#"
  • destination="#ExpandPath( './unzipped3/' )#"
  • overwrite="true">
  •  
  • <!--- Unzip the documents folder. --->
  • <cfzipparam
  • entrypath="documents"
  • />
  •  
  • <!--- Unzip the images folder. --->
  • <cfzipparam
  • entrypath="images"
  • filter="*.JPG"
  • />
  •  
  • </cfzip>

And, of course, as with the CFZip tag, the EntryPath does not need to point to a directory; it can point to either a directory or a specific file.
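For instance, a single CFZip action could pull out one specific file alongside a whole directory. This is just an illustrative sketch; unzipped_picked is a hypothetical destination directory that would need to exist before the action runs:

```cfml
<!---
	Unzip one specific document plus the contents of the
	images folder, all into the root of the destination.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_picked/' )#"
	storepath="false">

	<!--- EntryPath pointing at a single file. --->
	<cfzipparam
		entrypath="documents/readme.txt"
		/>

	<!--- EntryPath pointing at a directory. --->
	<cfzipparam
		entrypath="images"
		recurse="false"
		/>

</cfzip>
```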

ColdFusion 8 is just making all this stuff too easy. I would sum up how cool CFZip / CFZipParam tags are, but come on, it's the end of Part IV - if you don't get it yet, no summary is gonna do anything :)



Reader Comments

Oct 3, 2008 at 11:07 AM
2 Comments

Is there a way to unzip a .gz file with the cfzip tag? I've been looking around the web a bit and can't seem to find a solution for it.


Oct 3, 2008 at 11:42 AM
10,640 Comments

@Brett,

I am not sure on that one.


Oct 3, 2008 at 12:07 PM
2 Comments

Well. I couldn't find the answer on that one either. But after some searching I did manage to find a component that I just tested and works great.

I've posted the link to the download and the documentation in the event somebody finds a use for it as I have.

Download: http://download.newsight.de/Zip.zip
Documentation: http://livedocs.newsight.de/Zip/


Feb 10, 2009 at 6:32 AM
1 Comments

Will this component work on cf mx7?

Thanks


Oct 19, 2009 at 9:33 PM
2 Comments

Hi Ben,

again - in depth investigation into a CF TAG. Almost better than any CF documentation issued by ADOBE ;-)

josef


Nov 4, 2009 at 1:15 AM
1 Comments

Hi, I have been using cfzip to unzip images. Then I use cfzip to list the archive so I can resize the images with cfimage.

My problem is when the images have spaces in the file name I get an error, cannot find file.

I understand how to rename a file upon upload to avoid this but in this case, the zip file name(file being uploaded) doesn't matter, it is the files within the zip file I want to rename, is it possible to rename these files as they are unzipped?


Nov 15, 2009 at 11:11 PM
10,640 Comments

@Bill,

Hmm, I will have to take a look into this. I have not seen this before; but, it is very possible that I have never done any testing with spaces.

You might have to quote the path, but I can't imagine that they would have made that a requirement.

