Splitting And Joining A Binary File In ColdFusion
I was just reading over on CF-Talk that Shane Trahan was trying to split a binary file, write both parts to disk, and then later read them in and re-join them. I have never done anything like this, so I thought I would give it a try.
At first, I was going to use a few ColdFusion arrays, but that didn't seem to work. For one, ColdFusion arrays are not really arrays; they are Java Collections. Additionally, I think there were some data type conversions taking place that I was not aware of. I don't know enough about Java to fully understand all the data type stuff.
After a little bit of Googling, I found the Java ByteBuffer. This finally solved the problem! The ByteBuffer does all the heavy lifting for splitting and then joining the underlying byte arrays of the binary file. Check out my solution below:
<!--- Read in the original binary file. --->
<cffile
	action="readbinary"
	file="#ExpandPath( './sexy.jpg' )#"
	variable="binFile"
	/>

<!--- Get the length of the original binary file. --->
<cfset intLength = ArrayLen( binFile ) />

<!---
	Get the mid point of the byte array. We are going to use
	this as the split point for our two future binary files.
--->
<cfset intMid = Ceiling( intLength / 2 ) />

<!---
	Now, we are going to use the Java ByteBuffer to do the heavy
	lifting for us. We can't use ColdFusion arrays directly for
	manipulation since they are not really arrays, but rather
	Collections (and there's probably other complications). The
	Java ByteBuffer will take care of splitting and then later
	joining our files.

	Create an instance of the static ByteBuffer class so that we can refer to it multiple times.
--->
<cfset objByteBuffer = CreateObject( "java", "java.nio.ByteBuffer" ) />

<!---
	Using the ByteBuffer class, create two byte buffer instances
	for our two file parts. The first part holds intMid bytes and
	the second part holds whatever is left (intLength - intMid).
	Because we used Ceiling() in our split, the second part is one
	byte shorter when the file length is odd; sizing it exactly
	keeps a stray zero byte off the end of the second file.
--->
<cfset objBufferA = objByteBuffer.Allocate( JavaCast( "int", intMid ) ) />
<cfset objBufferB = objByteBuffer.Allocate( JavaCast( "int", (intLength - intMid) ) ) />

<!---
	Now that we have our two ByteBuffer instances, we are going to
	store half of the original binary byte array into each.
--->

<!--- Store first half. --->
<cfset objBufferA.Put(
	binFile,
	JavaCast( "int", 0 ),
	JavaCast( "int", intMid )
	) />

<!--- Store second half. --->
<cfset objBufferB.Put(
	binFile,
	JavaCast( "int", intMid ),
	JavaCast( "int", (intLength - intMid) )
	) />

<!---
	Now, all we have to do is write the two byte arrays to disk.
	In order to get the byte arrays from the ByteBuffer, we just
	need to call its underlying Array() method.
--->

<!--- Write first half. --->
<cffile
	action="write"
	file="#ExpandPath( './sexy_a.jpg' )#"
	output="#objBufferA.Array()#"
	/>

<!--- Write second half. --->
<cffile
	action="write"
	file="#ExpandPath( './sexy_b.jpg' )#"
	output="#objBufferB.Array()#"
	/>

<!---
	ASSERT: At this point, we have taken our original binary file
	and split it up into two parts that have been written back to
	disk. Now, we can go about testing this by reading them in
	again, joining the individual byte arrays, and then streaming
	to the client.
--->

<!--- Read in the first part. --->
<cffile
	action="readbinary"
	file="#ExpandPath( './sexy_a.jpg' )#"
	variable="binFileA"
	/>

<!--- Read in the second part. --->
<cffile
	action="readbinary"
	file="#ExpandPath( './sexy_b.jpg' )#"
	variable="binFileB"
	/>

<!---
	Again, we are going to create a Java ByteBuffer to do the heavy
	lifting for us. This time, we need to allocate space for the
	resultant byte array which will be equal to the length of both
	binary files.
--->
<cfset arrBinFull = objByteBuffer.Allocate(
	JavaCast( "int", (ArrayLen( binFileA ) + ArrayLen( binFileB )) )
	) />

<!---
	Add the entire first file's byte array to the byte buffer
	using a zero offset and full length.
--->
<cfset arrBinFull.Put(
	binFileA,
	JavaCast( "int", 0 ),
	JavaCast( "int", ArrayLen( binFileA ) )
	) />

<!---
	Add the entire second file's byte array to the byte buffer
	using a zero offset and the full length of the second file.
--->
<cfset arrBinFull.Put(
	binFileB,
	JavaCast( "int", 0 ),
	JavaCast( "int", ArrayLen( binFileB ) )
	) />

<!---
	At this point, our Java ByteBuffer should now contain the
	entire byte array that we had in our original file. To prove
	this, we are going to get the underlying byte array and
	stream it to the client.
--->
<cfcontent
	type="image/jpeg"
	variable="#arrBinFull.Array()#"
	/>
Now, I am not sure if this is a good way to do it, but the code seems fairly straightforward.
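For anyone who wants to poke at the same idea outside of ColdFusion, here is a minimal sketch of the split-and-join logic in plain Java. The class and method names (SplitJoinSketch, split, join) are made up for illustration, and it works on an in-memory byte array rather than files on disk. Each buffer is sized to exactly the bytes it holds, so an odd-length input does not pick up a padding byte in the second part.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper class; the names are illustrative, not from the post.
final class SplitJoinSketch {

    // Split the source at its midpoint. The first part gets the extra
    // byte when the length is odd, mirroring Ceiling( length / 2 ).
    static byte[][] split(byte[] source) {
        int mid = (source.length + 1) / 2;
        ByteBuffer partA = ByteBuffer.allocate(mid);
        ByteBuffer partB = ByteBuffer.allocate(source.length - mid);
        partA.put(source, 0, mid);
        partB.put(source, mid, source.length - mid);
        // array() returns the backing array of a heap-allocated buffer.
        return new byte[][] { partA.array(), partB.array() };
    }

    // Join two parts back into a single byte array.
    static byte[] join(byte[] partA, byte[] partB) {
        ByteBuffer full = ByteBuffer.allocate(partA.length + partB.length);
        full.put(partA, 0, partA.length);
        full.put(partB, 0, partB.length);
        return full.array();
    }

    public static void main(String[] args) {
        byte[] original = { 1, 2, 3, 4, 5 };
        byte[][] parts = split(original);
        byte[] rejoined = join(parts[0], parts[1]);
        System.out.println(parts[0].length + " + " + parts[1].length); // 3 + 2
        System.out.println(Arrays.equals(original, rejoined));         // true
    }
}
```

The offset passed to put() is an offset into the source array, not into the buffer, which is why the join step uses a zero offset for both parts.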
I'm assuming the file splitting is for BLOBs?
To be honest, I am not sure what the original intent was. However, I assume that it all shows up as a byte array in one form or another, so I guess whether it's a BLOB or binary file read, the same algorithm (or slightly modified) can be used.
Another excellent example of ColdFusion and Java rocking like it's 1999. Much obliged!
Splitting and joining binary files is quite useful when you're uploading or downloading files. You can also read or build a file one byte at a time with classes like BufferedInputStream and BufferedOutputStream, but that's a whole other story. :)
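As a rough illustration of the stream-based approach mentioned here, the sketch below joins several parts by copying them one byte at a time through buffered streams. It is plain Java, the class and method names are hypothetical, and it uses in-memory streams as stand-ins; a FileInputStream or FileOutputStream could presumably be swapped in for real files.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper class; the names are illustrative only.
final class StreamJoinSketch {

    // Copy every part, in order, into one output, a single byte at a time.
    static byte[] joinByStreaming(byte[]... parts) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = new BufferedOutputStream(sink)) {
            for (byte[] part : parts) {
                try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(part))) {
                    int next;
                    // read() returns 0..255, or -1 at the end of the stream.
                    while ((next = in.read()) != -1) {
                        out.write(next);
                    }
                }
            }
        } catch (IOException e) {
            // In-memory streams should not fail, but the API declares it.
            throw new UncheckedIOException(e);
        }
        // The buffered stream has been closed (and flushed) by this point.
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] joined = joinByStreaming(new byte[] { 1, 2 }, new byte[] { (byte) 0xFF, 3 });
        System.out.println(joined.length); // 4
    }
}
```

The buffering is what keeps the byte-at-a-time loop from being painfully slow against a real file, since the underlying reads and writes happen in larger blocks.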
You used ArrayLen to get the size of the binary file. Any idea on how exactly this works?
I just noticed: your comment for the first CreateObject call says: "Create an instance of the static ByteBuffer class so that we can refer to it multiple times."
I think it's more accurate to say you are loading the ByteBuffer class so that you can call its static methods.
Static methods are the closest that Java comes to global functions. The methods are not called on a particular ByteBuffer object (or instance) - they're just there.
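To make the static-versus-instance distinction concrete, here is a small plain-Java sketch (the class and method names are hypothetical). ByteBuffer is an abstract class, so you never construct one directly; you ask the class for an instance through a static factory method such as allocate() or wrap(), and then call instance methods like put() on what you get back. As far as I understand it, the ColdFusion CreateObject() call without an init() plays the role of the bare class name here.

```java
import java.nio.ByteBuffer;

// Hypothetical helper class; the names are illustrative only.
final class StaticFactorySketch {

    static ByteBuffer filledBuffer(byte[] data) {
        // Static method: called on the ByteBuffer class itself.
        ByteBuffer buffer = ByteBuffer.allocate(data.length);
        // Instance method: called on the particular buffer we got back.
        buffer.put(data);
        return buffer;
    }

    public static void main(String[] args) {
        ByteBuffer filled = filledBuffer(new byte[] { 10, 20, 30 });
        System.out.println(filled.capacity()); // 3
        System.out.println(filled.position()); // 3, since put() advanced it

        // wrap() is another static factory; it reuses the given array.
        ByteBuffer wrapped = ByteBuffer.wrap(new byte[] { 10, 20, 30 });
        System.out.println(wrapped.position()); // 0
    }
}
```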
ArrayLen() works because I think the binary file is loaded as a proper Java array of the bytes. I say "proper" array because it's not a ColdFusion array (Collection), it's a real byte array as in Byte ... I think :)
This is all just guess work for me, though. I defer to you for the better Java explanation as your Java experience eclipses mine.
I just took a binary variable created with <cffile action="readbinary"> and looked under the hood with my handy getClassInfo() reflection function. The variable is an instance of this Java class: [B
WTF? Anyone know what a [B is? Some sort of pointer? The class implements the Serializable and Cloneable interfaces, if that is any help.
I believe the "[" indicates an array and whatever comes after it is the type of the array's elements, so "[B" would be an array of bytes. Like sometimes, I think I get something like "[Ljava.lang.String;" which is an array of strings.
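The naming is easy to check from plain Java. The sketch below (class and method names are hypothetical) prints the runtime class names that the JVM reports for a few array types: one "[" per array dimension, then a single letter for a primitive element type, or "L", the full class name, and ";" for an object element type. It also confirms that arrays are Cloneable and Serializable, which matches what the reflection function reported.

```java
// Hypothetical helper class; the names are illustrative only.
final class ArrayClassNameSketch {

    // Report the JVM's internal name for the runtime class of a value.
    static String nameOf(Object value) {
        return value.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(nameOf(new byte[0]));    // [B
        System.out.println(nameOf(new int[0]));     // [I
        System.out.println(nameOf(new byte[0][0])); // [[B
        System.out.println(nameOf(new String[0]));  // [Ljava.lang.String;

        Object bytes = new byte[0];
        System.out.println(bytes instanceof Cloneable);            // true
        System.out.println(bytes instanceof java.io.Serializable); // true
    }
}
```

So "[B" is not a pointer; it is just how the JVM spells byte[].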
I just wanted to say thanks for your example. Ended up using a similar method to parse a bytearray from getHttpRequestData().content when a file is uploaded. By default it includes all form elements in a bytearray and we needed to separate them out without converting the data to a string first.
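I don't know how the commenter actually did it, but the general idea of separating a byte array on a delimiter without ever turning it into a string might look something like the plain-Java sketch below. The class, the method names, and the delimiter bytes are all hypothetical, and a real multipart form post has boundary lines and part headers that this does not attempt to handle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
final class ByteSplitSketch {

    // Find the first index at or after 'from' where 'pattern' occurs, or -1.
    static int indexOf(byte[] data, byte[] pattern, int from) {
        outer:
        for (int i = from; i <= data.length - pattern.length; i++) {
            for (int j = 0; j < pattern.length; j++) {
                if (data[i + j] != pattern[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Split 'data' on every occurrence of 'delimiter', comparing raw bytes.
    static List<byte[]> splitOn(byte[] data, byte[] delimiter) {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<byte[]> pieces = new ArrayList<>();
        int start = 0;
        int hit;
        while ((hit = indexOf(data, delimiter, start)) != -1) {
            pieces.add(Arrays.copyOfRange(data, start, hit));
            start = hit + delimiter.length;
        }
        // Whatever follows the last delimiter is the final piece.
        pieces.add(Arrays.copyOfRange(data, start, data.length));
        return pieces;
    }

    public static void main(String[] args) {
        byte[] data = { 1, 2, 0, 0, 3, 0, 0, 4 };
        List<byte[]> pieces = splitOn(data, new byte[] { 0, 0 });
        for (byte[] piece : pieces) {
            System.out.println(Arrays.toString(piece)); // [1, 2] then [3] then [4]
        }
    }
}
```

Working on the raw bytes avoids any character-encoding round trip, which matters when one of the pieces is binary file content.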
Oh cool. I haven't done too much with raw, binary form posts.