Learning ColdFusion 8: CFZip Part IV - Extracting Zip File Archives

By Ben Nadel

Published 2007-07-24 in ColdFusion — Comments (11)

In the last part of this series, we looked at how to list and read the files of a zip archive using ColdFusion 8's new CFZip and CFZipParam tags. Those techniques alone could be used to fully unzip an archive, but certainly, it wouldn't be the best choice. Luckily, ColdFusion 8's CFZip tag provides us with the Unzip action which can unzip an entire archive, or subsections of an archive, with the greatest of ease.

As always, before we can start exploring the unzipping of an archive in ColdFusion 8, we have to create a zip archive to work with. We are going to zip the following directory structure:

./data/documents/manual.txt
./data/documents/readme.txt
./data/images/funny.jpg
./data/images/mud_monster.jpg
./data/images/red_face.jpg
./data/images/smile.jpg

To create the zip, we are going to use the simple ColdFusion 8 CFZip tag:

<!---
	Create a zip archive of the data directory.
	By default, ColdFusion 8 will recurse the
	source directory and store the entry paths.
	Due to our use of the Overwrite attribute, we
	will make sure we create a new zip archive.
--->
<cfzip
	action="zip"
	source="#ExpandPath( './data/' )#"
	file="#ExpandPath( './data.zip' )#"
	overwrite="true"
	/>

This will put all the contents of the data directory (the documents and images directories) into the root of our zip archive. Now that we have a zip, the easiest thing we can do is reverse the process by unzipping the entire archive using the unzip action:

<!---
	Unzip the zip archive into the directory
	named "unzipped". The unzipped directory must
	exist before we perform this action.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped/' )#"
	/>

Here, we are using the two required attributes that go with the Unzip action. The File attribute is the absolute path to the zip archive we are going to unzip. The Destination attribute is the absolute path to the directory into which we are going to unzip the archive contents. This directory must exist before you try to reference it; if you try to unzip the archive into a non-existent directory, ColdFusion 8 will throw the following error:

The destination G:\....\cf8\zip\unzipped specified in the cfzip tag is invalid. The destination must be a directory and should be accessible by this tag.
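One way to guard against this error is to create the destination directory just before unzipping. The following is only a minimal sketch, not part of the original walkthrough; it assumes the same ./unzipped/ destination used above and relies on the DirectoryExists() function and the CFDirectory tag:

```cfml
<!--- Resolve the destination path once so we can reuse it. --->
<cfset destinationPath = ExpandPath( "./unzipped/" ) />

<!---
	Create the destination directory if it is missing, since
	the unzip action expects it to already exist.
--->
<cfif NOT DirectoryExists( destinationPath )>
	<cfdirectory
		action="create"
		directory="#destinationPath#"
		/>
</cfif>

<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#destinationPath#"
	/>
```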

By default, ColdFusion 8 will unzip the entire archive, keeping the archive directory structure (entry path structure) as is, and will not overwrite files that already exist in the destination directory. But, by using some of the optional attributes of the CFZip tag, we can change the way things happen.

The StorePath attribute, which defaults to True, is what determines whether or not we keep the entry path structure. If directory structure is not important to us and we want to unzip all the entries directly into the root of the destination folder, all we need to do is set StorePath to false. Running this code:

<!---
	Unzip the zip archive into the directory
	named "unzipped_flat". Instead of keeping the
	archive entry path structure, we are going to
	unzip all of the entries directly into the
	root of our target directory.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_flat/' )#"
	storepath="false"
	/>

... will leave us with an unzipped_flat directory that looks like this:

./funny.jpg
./manual.txt
./mud_monster.jpg
./readme.txt
./red_face.jpg
./smile.jpg

When we flatten a zip archive, as we just did above, one of the things we have to be careful of is possible naming conflicts that might be caused by like-named files at different entry paths being merged into the destination root. For example, if we added this text file entry:

./images/readme.txt

... with the content:

This is the IMAGES readme file.

... to the data archive, then when the archive is flattened, the readme.txt file in the documents folder would conflict with the readme.txt we just added to the images folder. By default, ColdFusion 8 will not overwrite any existing files in the destination directory, and as such, the documents readme.txt file will be the only readme.txt file that gets unzipped. Since unzipping happens in a depth-first fashion, the images readme.txt file will be examined only after the documents readme.txt file, and since there is already a readme.txt in the root of the destination, it will not be unzipped. ColdFusion 8 will not throw an error over this; it will simply skip the current archive entry.
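Because the skipped entry produces no error, it can be worth checking for like-named files before flattening an archive. The sketch below is one possible approach, not something built into the unzip action: it uses the CFZip list action covered earlier in this series and assumes the listing returns only file entries, with the full entry path in the name column. The seenNames and fileName variables are names made up for this example.

```cfml
<!--- List the archive entries into a query. --->
<cfzip
	action="list"
	file="#ExpandPath( './data.zip' )#"
	name="entries"
	/>

<!--- Track the file names we have seen so far. --->
<cfset seenNames = StructNew() />

<cfloop query="entries">

	<!--- Grab the file name without its entry path. --->
	<cfset fileName = ListLast( entries.name, "/\" ) />

	<cfif StructKeyExists( seenNames, fileName )>

		<!--- Two entries would flatten to the same file. --->
		<cfoutput>
			Conflict: #entries.name# collides with
			#seenNames[ fileName ]#<br />
		</cfoutput>

	<cfelse>

		<!--- Remember which entry first used this name. --->
		<cfset seenNames[ fileName ] = entries.name />

	</cfif>

</cfloop>
```

Struct keys in ColdFusion are not case sensitive, so this check would also flag names that differ only by case.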

By setting the Overwrite attribute to True, we can get ColdFusion to overwrite any files that already exist. Therefore, running this code:

<!---
	Unzip the zip archive into the directory
	named "unzipped_flat". Instead of keeping the
	archive entry path structure, we are going to
	unzip all of the entries directly into the
	root of our target directory. As we merge the
	entries, we are going to overwrite them, thereby
	keeping only the last version of all the
	like-named files we come across.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_flat/' )#"
	storepath="false"
	overwrite="true"
	/>

... will still unzip the images readme.txt after the documents readme.txt, but this time, the images readme.txt file will overwrite the one from the documents entry path.

Using the Recurse attribute, which defaults to True, we can get CFZip to only extract a single directory. Running this code:

<!---
	Unzip the zip archive into the directory
	named "unzipped_root". Instead of recursing
	through the entire archive, just unzip the
	root directory.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_root/' )#"
	recurse="false"
	/>

... will unzip only the root of the zip archive into the directory, unzipped_root. However, since there are no files in the root of our archive (only our two directories - documents and images), our unzipped_root directory remains empty.

If we don't want to recurse, but we also don't want to unzip the root of the archive, we can use the optional EntryPath attribute to get at a subdirectory of the archive. If we wanted to unzip just the images folder, we would run this code:

<!---
	Unzip the archived images folder into the
	directory named "unzipped_images". By not storing
	the path of the entry, we will ensure that an
	"images" folder does not get created.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_images/' )#"
	entrypath="images"
	recurse="false"
	storepath="false"
	/>

This would leave us with an unzipped_images directory that looks like this (we are no longer dealing with the readme.txt in the images directory):

./funny.jpg
./mud_monster.jpg
./red_face.jpg
./smile.jpg

Now, there's actually a bunch of things happening synergistically in the code that we just ran. We turned off directory recursion so that if images had a subdirectory, it would be ignored. We then told CFZip not to store the entry paths. This is to make sure we don't end up with an "images" directory inside of our unzipped_images destination directory. By default, since the images are inside of an images archive folder, CFZip wants to create that images folder into which it will unzip the image entries. Then, finally, to make sure we are just unzipping the images archive folder, we use the optional EntryPath attribute to point the action at the images folder.

The EntryPath attribute acts a bit differently than it did when we examined it in the context of reading archive entries. When reading an archive entry, you cannot use the "./" and "/" leading path constructs or ColdFusion 8 will throw an error. When it comes to unzipping an archive, you still cannot use the "./" or "/" leading path constructs. Additionally, you cannot even use a trailing "/" character. The following EntryPath values are all invalid:

/images/
./images/
images/

The difference, when unzipping an archive, is that ColdFusion 8 will not throw any errors. The above paths will simply not work. In order to properly define a target directory, you must exclude both leading and trailing path slash constructs.
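Since a bad EntryPath fails silently, it may help to normalize the value before handing it to CFZip. This is just a sketch of that idea: the rawEntryPath and cleanEntryPath variables are made up for this example, and the regular expression simply strips an optional leading "./" or "/" along with any trailing slashes.

```cfml
<!--- An entry path that would silently match nothing. --->
<cfset rawEntryPath = "./images/" />

<!---
	Strip an optional leading "./" or "/" and any trailing
	slashes so that we are left with just "images".
--->
<cfset cleanEntryPath = REReplace(
	rawEntryPath,
	"^\.?/+|/+$",
	"",
	"all"
	) />

<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_images/' )#"
	entrypath="#cleanEntryPath#"
	recurse="false"
	storepath="false"
	/>
```

A path that is already clean, such as images/mud_monster.jpg, passes through this replacement unchanged.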

Now, we could have accomplished the same thing by using the optional Filter attribute. As we have covered in almost every other part of this series, the filter attribute uses file masks to limit the files that are included in the CFZip action. To reach the same outcome as above, we could have unzipped files of type JPG into the root of our destination folder:

<!---
	Unzip all the archived images of type JPG into
	the directory named "unzipped_images2". By not
	storing the path of the entry, we will be unzipping
	all files into the root of the destination directory.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_images2/' )#"
	filter="*.JPG"
	storepath="false"
	/>

Up till now, we have been unzipping directories of files, but the EntryPath attribute can point to a single file as well. In the following code, we are going to unzip just the mud_monster.jpg image into the destination directory:

<!---
	Unzip the mud_monster.jpg image into the directory
	named "unzipped_single". By not storing the path
	of the entry, we will make sure not to create the
	images subdirectory in our destination folder.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_single/' )#"
	entrypath="images/mud_monster.jpg"
	storepath="false"
	/>

Since we only care about mud_monster.jpg and not about the images folder itself, we are not storing the entry path structure. This will ensure that mud_monster.jpg goes into the root of our destination directory and not into an images folder within the root.

As with all the CFZip actions, unzipping an archive can be done using the CFZip tag in conjunction with one or more nested CFZipParam tags. As a simple example, we can mimic the unzipped images directory by moving the EntryPath, Recurse, and Filter attributes from the CFZip tag down into a CFZipParam tag:

<!---
	Unzip all the archived images of type JPG located
	in the images folder into the directory named
	"unzipped_images3". By not storing the path of the
	entry, we will be unzipping all files into the root
	of the destination directory.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_images3/' )#"
	storepath="false">

	<!--- Unzip the images folder. --->
	<cfzipparam
		entrypath="images"
		recurse="false"
		filter="*.JPG"
		/>

</cfzip>

Now, when you move attributes down into the CFZipParam tag, it doesn't always have to be one or the other. While the EntryPath and Filter attributes cannot be defined in both the CFZip and CFZipParam tags, the Recurse attribute can be defined in the CFZip tag and then overridden in the CFZipParam tags.
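As a rough illustration of that Recurse behavior, the sketch below turns recursion off at the CFZip level and then turns it back on for a single CFZipParam tag. It is based only on the behavior described here rather than on a separately tested example, and the "unzipped_mixed" destination is a hypothetical directory that would have to exist beforehand.

```cfml
<!---
	Turn recursion off for the whole action, then turn it
	back on for just the images entry path.
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped_mixed/' )#"
	recurse="false">

	<!--- Uses the recurse value set on the CFZip tag. --->
	<cfzipparam
		entrypath="documents"
		/>

	<!--- Overrides the CFZip tag and recurses. --->
	<cfzipparam
		entrypath="images"
		recurse="true"
		/>

</cfzip>
```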

Furthermore, we don't just have to have one CFZipParam tag. We can use multiple CFZipParam tags to define highly dynamic unzipping algorithms. While not complicated in scope, we could mimic our first example (of unzipping the entire zip archive) by using two CFZipParam tags - one for the documents directory and one for the images directory:

<!---
	Unzip the documents and images archive
	folders into the directory named "unzipped3".
--->
<cfzip
	action="unzip"
	file="#ExpandPath( './data.zip' )#"
	destination="#ExpandPath( './unzipped3/' )#"
	overwrite="true">

	<!--- Unzip the documents folder. --->
	<cfzipparam
		entrypath="documents"
		/>

	<!--- Unzip the images folder. --->
	<cfzipparam
		entrypath="images"
		filter="*.JPG"
		/>

</cfzip>

And, of course, as with the CFZip tag, the EntryPath does not need to point to a directory; it can point to either a directory or a specific file.

ColdFusion 8 is just making all this stuff too easy. I would sum up how cool CFZip / CFZipParam tags are, but come on, it's the end of Part IV - if you don't get it yet, no summary is gonna do anything :)

Short link: https://bennadel.com/855

Reader Comments

Brett Oct 3, 2008 at 11:07 AM

Is there a way to unzip a .gz file with the cfzip tag? I've been looking around the web a bit and can't seem to find a solution for it.

Ben Nadel Oct 3, 2008 at 11:42 AM

@Brett,

I am not sure on that one.

Brett Oct 3, 2008 at 12:07 PM

Well, I couldn't find the answer on that one either. But after some searching, I did manage to find a component that I just tested and it works great.

I've posted the link to the download and the documentation in the event somebody finds a use for it as I have.

Download: http://download.newsight.de/Zip.zip
Documentation: http://livedocs.newsight.de/Zip/

Suraj Feb 10, 2009 at 6:32 AM

Will this component work on cf mx7?

Thanks

josef pichler Oct 19, 2009 at 9:33 PM

Hi Ben,

again - in depth investigation into a CF TAG. Almost better than any CF documentation issued by ADOBE ;-)

josef

Bill Munsell Nov 4, 2009 at 1:15 AM

Hi, I have been using cfzip to unzip images. Then I use cfzip to list the archive so I can resize the images with cfimage.

My problem is when the images have spaces in the file name I get an error, cannot find file.

I understand how to rename a file upon upload to avoid this but in this case, the zip file name(file being uploaded) doesn't matter, it is the files within the zip file I want to rename, is it possible to rename these files as they are unzipped?

Ben Nadel Nov 15, 2009 at 11:11 PM

@Bill,

Hmm, I will have to take a look into this. I have not seen this before; but, it is very possible that I have never done any testing with spaces.

You might have to quote the path, but I can't imagine that they would have made that a requirement.

David Price Apr 15, 2013 at 10:14 AM

Ben,
I have two sites that I am trying to sync up the code base using a utility that I have written. The problem is that I have to use ftp to move the files. When I use ftp the name is modified on the target site. SO! I zipped the file first and ftp it to a holding directory where I then unzip it to the correct directory. In the zipped file the modified date is correct and in the zipped file at the other end. I can open the file and unzip it with Windows and it retains its modified date.

The Problem is that <cfzip action=unzip changes the modified date to the current date.

Is this the normal action of an unzip? Is there an option to override this?

Thanks!
David

Kevin Apr 17, 2013 at 3:25 PM

Hello,

Is there a reason that you wrote a bunch of code to do the sync instead of using an application that is designed to do exactly what you are looking for?

I think that there is a 'freeware' version of SynBack for instance that gets pretty good reviews. The interface is a bit to get used to, but it does the job.

There are a number of such applications.

I tried/tested a lot of them and found that some were able to move and sync files considerably faster than others.

Unfortunately, I don't have the list to share at this time.

However, as much as I love ColdFusion, why reinvent the wheel?

Best regards,
Kevin Randolph

David Price Apr 17, 2013 at 3:34 PM

Kevin,

Thanks for the suggestion. I just resolved the issue. I built a custom one as we have 2 development environments, plus an Integration and Pre-Production environment on one server and Production on another server. We wanted a view of all of the files on any of the environments and the ability to move the file with a click.
I will check out the SynBack application.
I appreciate your input!

Regards,
David

Ben Nadel Apr 17, 2013 at 3:35 PM

@David, @Kevin,

To be honest, I haven't used CFZip in a while and I'm not sure what the normal behavior of the date would be.

That said, and to @Kevin's point, one of my favorite applications of ALL TIME is "Beyond Compare". Unfortunately, they don't have it for Mac; but, on Windows, it's the cat's pajamas. And, I think it supports FTP syncing, which is cool if you like to manually curate the file sync.

Sorry my answer doesn't speak to your problem more directly.
