I saw a tutorial on your site about splitting and joining files using CF [ColdFusion]. Is there any way to do this without reading in the entire contents of the file part(s) using ReadBinary? On large files this obviously becomes very memory intensive. If you are simply joining parts, why can't you join without having to read and do the checksum at the end to verify?
I know you are asking for a way to split and join files without using ReadBinary at all and, while I think this is a good question, I want to address a simpler form of it first. As long as we are splitting files up into smaller parts, why don't we make those parts small enough that each one can be handled by a single binary read? If we have a 2 gig file that we are trying to work with, don't split it up into two 1-gig files (a gig is still quite huge); split it up into many smaller parts. We don't lose anything by having more files. From personal experience, most people split their large files into parts of either 15 megs or 50 megs.
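To put rough numbers on that, here is a small sketch of the arithmetic. The variable names are made up for illustration, and the sizes are kept in megabytes (2 gigs taken as 2,048 megs) to keep the numbers small. At those sizes, a 2 gig file works out to 41 parts at 50 megs each, or 137 parts at 15 megs each.

```cfml
<!--- Hypothetical names; all sizes are expressed in megabytes. --->
<cfset intFileSizeMB = (2 * 1024) />

<cfoutput>
    <!--- The last part is usually only partially full, so round up. --->
    50 meg parts: #Ceiling( intFileSizeMB / 50 )#<br />
    15 meg parts: #Ceiling( intFileSizeMB / 15 )#<br />
</cfoutput>
```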
That being said, I am going to demo splitting a huge file using a small buffer, such that the contents of the original file get read in only a bit at a time, with each bit written to its own smaller part file. Then, to rejoin these smaller files, we are going to assume that each one is small enough to be read in via a CFFile [ action = readbinary ]:
<!---
    Get the large target binary file that we want to split up
    into several parts. Our demo file is only about 5 megabytes,
    but this should be sufficient to demo.
--->
<cfset strTargetFile = ExpandPath( "crazy_insane.jpg" ) />

<!---
    Set the name of the re-compiled target file. This will be
    the path to which we recombine all the individual data chunks.
--->
<cfset strNewTargetFile = ExpandPath( "crazy_insane_new.jpg" ) />

<!---
    Set the number of bytes that we want to use for our file
    chunking size. For our demo purposes (since we don't have a
    huge file), let's use about a megabyte.
--->
<cfset intBufferSize = (1024 * 1024) />

<!---
    Create a file input stream to read in the binary file one
    chunk at a time so that we can split it up.
--->
<cfset objInputStream = CreateObject( "java", "java.io.FileInputStream" ).Init( JavaCast( "string", strTargetFile ) ) />

<!---
    Create a byte buffer into which we will read the file parts.
    This byte buffer will determine how large the file chunks
    are. We are going to use the underlying byte array of a
    string to create our byte array buffer. Let's make our byte
    buffer so that it is about a megabyte in size
    (1024 * 1024 bytes).
--->
<cfset arrBuffer = RepeatString( " ", intBufferSize ).GetBytes() />

<!---
    Now, we want to keep looping over the input stream and
    reading chunks until we can no longer read any more data.
    We are going to use an index loop with a huge max just to
    use the counter aspect of it.
--->
<cfloop index="intFileIndex" from="1" to="99999" step="1">

    <!--- Read from the input stream. --->
    <cfset intBytesRead = objInputStream.Read( arrBuffer, JavaCast( "int", 0 ), JavaCast( "int", ArrayLen( arrBuffer ) ) ) />

    <!---
        Check to see if we read any bytes into the buffer. If so,
        then we want to write those to file. If not, then we are
        done reading data.
    --->
    <cfif (intBytesRead GT 0)>

        <!---
            Our buffer contains a certain amount of data. We
            cannot simply write this buffer to disk in whole
            because it might not be completely full. Therefore,
            we cannot use a plain CFFile. Let's create a file
            output stream so that we can leverage its Write()
            method, which takes a buffer along with an offset
            and a length. When choosing the file name for this
            file chunk, use the index value of the current read
            iteration to create a "part" file.
        --->
        <cfset objOutputStream = CreateObject( "java", "java.io.FileOutputStream" ).Init( JavaCast( "string", "#strTargetFile#.part#intFileIndex#" ) ) />

        <!--- Write our buffer to that file output stream. --->
        <cfset objOutputStream.Write( arrBuffer, JavaCast( "int", 0 ), JavaCast( "int", intBytesRead ) ) />

        <!--- Close the file output stream. --->
        <cfset objOutputStream.Close() />

    <cfelse>

        <!---
            We are done reading data. Close the file input
            stream to free it up as a system resource.
        --->
        <cfset objInputStream.Close() />

        <!--- Break out of read loop. --->
        <cfbreak />

    </cfif>

</cfloop>

<!--- END: Split ----------------------------------- --->

<!---
    We have now split our large binary file into several
    smaller files. Let's see if we can put it back together
    again in a new binary file.
--->

<!---
    Again, we don't want to be reading the whole file into
    memory, so let's create a file output stream to which we can
    write out the smaller file data.
--->
<cfset objOutputStream = CreateObject( "java", "java.io.FileOutputStream" ).Init( JavaCast( "string", strNewTargetFile ) ) />

<!---
    Now, we want to loop until we can no longer find any smaller
    chunk files. Sure, we could just use the file index found
    above, but let's do this assuming we don't have any of the
    data from above.
--->
<cfloop index="intFileIndex" from="1" to="99999" step="1">

    <!--- Get the file name of the next chunk file. --->
    <cfset strFileName = "#strTargetFile#.part#intFileIndex#" />

    <!--- Check to see if this file part exists. --->
    <cfif FileExists( strFileName )>

        <!---
            Since we know that these smaller files are not too
            big, we can simply do a binary read of the complete
            chunk file into memory.
        --->
        <cffile action="readbinary" file="#strFileName#" variable="binFileData" />

        <!---
            Write that file data to our output stream. We are
            going to pretend that the binary data we just read
            is actually a byte buffer.
        --->
        <cfset objOutputStream.Write( binFileData, JavaCast( "int", 0 ), JavaCast( "int", ArrayLen( binFileData ) ) ) />

    <cfelse>

        <!---
            We have finished reading in the smaller chunk files.
            We can now close the new target file output stream,
            which will finalize this process.
        --->
        <cfset objOutputStream.Close() />

        <!--- Break out of this loop. --->
        <cfbreak />

    </cfif>

</cfloop>
As you can see, for the demo, we are reading in the target file one megabyte at a time using a file input stream. Whatever portion of that one-megabyte buffer gets filled on each read is then written out as a part file. Once this process is complete, we read each part file back in using CFFile and write it to a file output stream. This way, no more than about a megabyte of file data is held in memory at any given time. So, while this might produce a lot of files (only 4 in my demo scenario), they are relatively small. Of course, you can increase the buffer size to reduce the number of files, but don't make it too large or you will eat up your memory.
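Since the original question also mentions a checksum, one way to confirm that the recombined file matches the original, without reading either file fully into memory, is to push the file through a hash using the same kind of buffered read. What follows is only a sketch under a few assumptions: the function name getFileMD5 and its arguments are made up for illustration, and it assumes java.security.MessageDigest can be reached through CreateObject() the same way the stream classes are in the demo.

```cfml
<!--- Hypothetical helper: hashes a file one buffer at a time. --->
<cffunction name="getFileMD5" returntype="string" output="false">
    <cfargument name="filePath" type="string" required="true" />
    <cfargument name="bufferSize" type="numeric" required="false" default="#(1024 * 1024)#" />

    <cfset var objDigest = CreateObject( "java", "java.security.MessageDigest" ).getInstance( JavaCast( "string", "MD5" ) ) />
    <cfset var objStream = CreateObject( "java", "java.io.FileInputStream" ).Init( JavaCast( "string", arguments.filePath ) ) />
    <cfset var arrBuffer = RepeatString( " ", arguments.bufferSize ).GetBytes() />
    <cfset var intBytesRead = 0 />

    <cfloop condition="true">

        <!--- Read the next chunk; stop once no more bytes come back. --->
        <cfset intBytesRead = objStream.Read( arrBuffer, JavaCast( "int", 0 ), JavaCast( "int", ArrayLen( arrBuffer ) ) ) />

        <cfif (intBytesRead LTE 0)>
            <cfbreak />
        </cfif>

        <!--- Feed only the bytes that were actually read into the hash. --->
        <cfset objDigest.update( arrBuffer, JavaCast( "int", 0 ), JavaCast( "int", intBytesRead ) ) />

    </cfloop>

    <cfset objStream.Close() />

    <!--- Return the finished hash as a hex string. --->
    <cfreturn BinaryEncode( objDigest.digest(), "hex" ) />
</cffunction>

<!--- Usage: compare the original file with the recombined one. --->
<cfset strTargetFile = ExpandPath( "crazy_insane.jpg" ) />
<cfset strNewTargetFile = ExpandPath( "crazy_insane_new.jpg" ) />

<cfif (getFileMD5( strTargetFile ) EQ getFileMD5( strNewTargetFile ))>
    <cfoutput>The files match.</cfoutput>
<cfelse>
    <cfoutput>The files differ.</cfoutput>
</cfif>
```

This sketch does not close the stream if a read throws an error; a production version would want to wrap the loop in error handling.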
Again, I realize that this doesn't exactly address your issue, but maybe it will help. You can definitely use buffers to create large, 1-gig chunk files, but it is a bit more complicated. Let me know if you want to see a demo of a buffered, large-chunk file solution, and I can show you.
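In the meantime, here is a minimal sketch of just the join side done without ReadBinary at all, so that the part files can be any size. It is not the full large-chunk solution, only an illustration of the idea: each part file is streamed through the same small byte buffer that the split used. It assumes the same file names and ".partN" naming as the demo above; objPartStream is a new, made-up variable name.

```cfml
<!--- Assumed to match the demo: same paths and buffer size. --->
<cfset strTargetFile = ExpandPath( "crazy_insane.jpg" ) />
<cfset strNewTargetFile = ExpandPath( "crazy_insane_new.jpg" ) />
<cfset intBufferSize = (1024 * 1024) />

<!--- One output stream for the recombined file; one reusable buffer. --->
<cfset objOutputStream = CreateObject( "java", "java.io.FileOutputStream" ).Init( JavaCast( "string", strNewTargetFile ) ) />
<cfset arrBuffer = RepeatString( " ", intBufferSize ).GetBytes() />

<cfloop index="intFileIndex" from="1" to="99999" step="1">

    <cfset strFileName = "#strTargetFile#.part#intFileIndex#" />

    <!--- Stop as soon as the next part file is missing. --->
    <cfif NOT FileExists( strFileName )>
        <cfbreak />
    </cfif>

    <!--- Open this part (hypothetical variable name). --->
    <cfset objPartStream = CreateObject( "java", "java.io.FileInputStream" ).Init( JavaCast( "string", strFileName ) ) />

    <!--- Copy the part into the output, one buffer at a time. --->
    <cfloop condition="true">

        <cfset intBytesRead = objPartStream.Read( arrBuffer, JavaCast( "int", 0 ), JavaCast( "int", ArrayLen( arrBuffer ) ) ) />

        <!--- This break only exits the inner (copy) loop. --->
        <cfif (intBytesRead LTE 0)>
            <cfbreak />
        </cfif>

        <cfset objOutputStream.Write( arrBuffer, JavaCast( "int", 0 ), JavaCast( "int", intBytesRead ) ) />

    </cfloop>

    <cfset objPartStream.Close() />

</cfloop>

<!--- All parts have been copied; finalize the new file. --->
<cfset objOutputStream.Close() />
```

With this approach, memory use is set by the buffer size rather than by the size of the part files, which is what makes larger parts workable.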