Ben Nadel
On User Experience (UX) Design, JavaScript, ColdFusion, Node.js, Life, and Love.
Ben Nadel at cf.Objective() 2009 (Minneapolis, MN) with: David Epler

Consuming The Crypto Hash Algorithms As A Stream In Node.js

By Ben Nadel on

The other day, when I was building a static file server in Node.js, I noticed that the Crypto module treats the Hash object as a Transform stream that is both writable and readable, and that it explicitly offers "legacy" support for the old .update() and .digest() methods. In the world of programming, "legacy" is a very loaded word, often associated with negative feelings about the state of software. When you consider the fact that the Crypto module is marked "unstable," you can't help but wonder if "legacy" will soon give way to "deprecated". As such, I wanted to see what a Crypto Hash object would look like when consumed as a Transform stream.
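To make the two faces of the Hash object concrete, here is a minimal sketch that inspects a Hash instance. It assumes a reasonably recent Node.js release in which the Hash class extends stream.Transform; the variable names are just for illustration.

```javascript
// Require the core node modules.
var crypto = require( "crypto" );
var stream = require( "stream" );

var hasher = crypto.createHash( "md5" );

// The stream-oriented face: the Hash is a Transform stream, so it exposes the
// usual writable and readable methods.
var isTransform = ( hasher instanceof stream.Transform );
var hasStreamMethods = (
	( typeof hasher.write === "function" ) &&
	( typeof hasher.read === "function" ) &&
	( typeof hasher.pipe === "function" )
);

// The "legacy" face: the same object also exposes .update() and .digest().
var hasLegacyMethods = (
	( typeof hasher.update === "function" ) &&
	( typeof hasher.digest === "function" )
);

console.log( "Is Transform stream:", isTransform );
console.log( "Has stream methods:", hasStreamMethods );
console.log( "Has legacy methods:", hasLegacyMethods );
```

Both interfaces live on the same object, which is why the rest of this post can swap one approach for the other without changing how the Hash is created.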

I came across this Crypto verbiage when I was trying to create a writable ETagStream that would emit an "etag" event when the underlying MD5 hash of the stream content was available. As such, I'll explore this Hash behavior in the same context - piping a Readable file stream into the Writable ETagStream.

First, let's look at how the ETagStream consumes the underlying Crypto Hash when using the so-called "legacy" methods:

    // Require the core node modules.
    var stream = require( "stream" );
    var util = require( "util" );
    var crypto = require( "crypto" );
    var fileSystem = require( "fs" );


    // ----------------------------------------------------------------------------------- //
    // ----------------------------------------------------------------------------------- //


    // CAUTION: Run the code in the next tick to give the full prototype chain a chance to
    // initialize. If we try to run immediately, we'll get the function hoisting for the
    // ETagStream constructor, but the prototype chain will not yet be defined fully.
    process.nextTick(
        function run() {

            var fileReadStream = fileSystem.createReadStream( "./gina-carano.jpg" );

            // Once the file is finished piping into the etagStream, it will emit an etag
            // event with the computed MD5 hash.
            var etagStream = new ETagStream()
                .on(
                    "etag",
                    function handleETag( etag ) {

                        console.log( "ETag:", etag );

                    }
                )
            ;

            fileReadStream.pipe( etagStream );

        }
    );


    // ----------------------------------------------------------------------------------- //
    // ----------------------------------------------------------------------------------- //


    // I provide a writable stream that will emit an "etag" event once the stream is closed.
    // The etag will be an MD5 hash of the content that was written to the stream.
    // --
    // NOTE: In this version, we'll be using the "legacy" methods of the underlying Hash
    // object which allow for intuitive .update() and .digest() methods.
    function ETagStream() {

        // Call the super constructor.
        stream.Writable.call( this );

        this._hasher = crypto.createHash( "md5" );

        // Listen for the "finish" event, which will indicate that we have all the data that
        // we need in order to generate the MD5 hash of the stream content.
        this.once( "finish", this._handleFinish.bind( this ) );

    }

    util.inherits( ETagStream, stream.Writable );


    // ---
    // PRIVATE METHODS.
    // ---


    // I handle the finish event, which, in turn, emits an "etag" event.
    ETagStream.prototype._handleFinish = function() {

        // When dealing with "legacy" crypto methods, all we have to do is digest all of
        // the data that has been aggregated in the _write() method.
        this.emit( "etag", this._hasher.digest( "hex" ) );

    };


    // I write data to the etag stream.
    ETagStream.prototype._write = function( chunk, encoding, writeComplete ) {

        // When dealing with "legacy" crypto methods, we can simply pass the chunk into
        // the underlying hash without giving any concern to back-pressure.
        this._hasher.update( chunk, encoding );

        writeComplete();

    };

As you can see, the ETagStream class is quite simple - when data is written to the stream, it calls the .update() method; and, when the stream is closed, it computes the MD5 digest using the .digest() method.
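Stripped of the stream wrapper, the legacy approach boils down to a few synchronous calls. The following is a minimal sketch, not part of the ETagStream itself: computeETag() is a hypothetical helper, and Buffer.from() assumes a more recent Node.js release than the one this post was written against.

```javascript
// Require the core node modules.
var crypto = require( "crypto" );

// I compute a hex-encoded MD5 hash from an array of chunks using the "legacy"
// .update() and .digest() methods.
// --
// NOTE: This is a hypothetical helper used only for illustration.
function computeETag( chunks ) {

	var hasher = crypto.createHash( "md5" );

	// Each .update() call synchronously folds another chunk into the hash state.
	chunks.forEach(
		function updateHash( chunk ) {

			hasher.update( chunk );

		}
	);

	// The .digest() method can only be called once per Hash instance.
	return( hasher.digest( "hex" ) );

}

// Hashing the content in pieces yields the same result as hashing it whole, which
// is what allows the _write() method to feed the hash one chunk at a time.
var chunkedETag = computeETag( [ Buffer.from( "hello " ), Buffer.from( "world" ) ] );
var wholeETag = computeETag( [ Buffer.from( "hello world" ) ] );

console.log( "Chunked:", chunkedETag );
console.log( "Whole:", wholeETag );
console.log( "Match:", ( chunkedETag === wholeETag ) );
```

Because every call returns synchronously, there is no back-pressure to manage and no event to wait for; the digest is available as soon as the last chunk has been passed in.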

Now, let's take a look at what the ETagStream needs to do when treating the Hash instance as a generic Transform stream:

    // Require the core node modules.
    var stream = require( "stream" );
    var util = require( "util" );
    var crypto = require( "crypto" );
    var fileSystem = require( "fs" );
    var Buffer = require( "buffer" ).Buffer;


    // ----------------------------------------------------------------------------------- //
    // ----------------------------------------------------------------------------------- //


    // CAUTION: Run the code in the next tick to give the full prototype chain a chance to
    // initialize. If we try to run immediately, we'll get the function hoisting for the
    // ETagStream constructor, but the prototype chain will not yet be defined fully.
    process.nextTick(
        function run() {

            var fileReadStream = fileSystem.createReadStream( "./gina-carano.jpg" );

            // Once the file is finished piping into the etagStream, it will emit an etag
            // event with the computed MD5 hash.
            var etagStream = new ETagStream()
                .on(
                    "etag",
                    function handleETag( etag ) {

                        console.log( "ETag:", etag );

                    }
                )
            ;

            fileReadStream.pipe( etagStream );

        }
    );


    // ----------------------------------------------------------------------------------- //
    // ----------------------------------------------------------------------------------- //


    // I provide a writable stream that will emit an "etag" event once the stream is closed.
    // The etag will be an MD5 hash of the content that was written to the stream.
    // --
    // NOTE: In this version, we'll be treating the underlying Hash as a Duplex stream that
    // is both writable and readable. Once we make this leap, we have to assume that the Hash
    // stream exhibits all of the data-oriented events and buffering concerns of any writable
    // and readable stream.
    function ETagStream() {

        // Call the super constructor.
        stream.Writable.call( this );

        this._hasher = crypto.createHash( "md5" );

        // Listen for the "finish" event, which will indicate that we have all the data that
        // we need in order to generate the MD5 hash of the stream content.
        this.once( "finish", this._handleFinish.bind( this ) );

    }

    util.inherits( ETagStream, stream.Writable );


    // ---
    // PRIVATE METHODS.
    // ---


    // I handle the finish event, which, in turn, emits an "etag" event.
    ETagStream.prototype._handleFinish = function() {

        // Create a closed-over reference to "this".
        var etagStream = this;

        // I hold the chunks of data that can be read out of the hash stream. Once the stream
        // has been fully consumed, we can concatenate this buffer to get the MD5 digest.
        var chunks = [];

        // Now that we are treating the hash as a generic stream, we have to explicitly end
        // the stream and listen for data events. We can't assume that the data will be
        // available immediately, or even in one pass. As such, we have to listen for the
        // "readable" and "end" events so that we know when the "etag" event can be emitted.
        this._hasher
            .on(
                "readable",
                function handleReadableEvent() {

                    var chunk = null;

                    // Keep reading data until the read() returns null. This will indicate
                    // that we have fully consumed the internal buffers and we'll need to
                    // wait for another "readable" event before reading more.
                    while ( ( chunk = this.read() ) !== null ) {

                        chunks.push( chunk );

                    }

                }
            )
            .on(
                "end",
                function handleEndEvent() {

                    // Now that we have extracted all of the chunks that represent the MD5
                    // hash, we can flatten them down into a single buffer and export them
                    // as a hex-encoded string.
                    etagStream.emit( "etag", Buffer.concat( chunks ).toString( "hex" ) );

                }
            )

            // Close the writable hash stream so that it can calculate the digest internally.
            .end()
        ;

    };


    // I write data to the etag stream.
    ETagStream.prototype._write = function( chunk, encoding, writeComplete ) {

        // Now that we are treating the hash as a generic stream, we have to worry about
        // back-pressure. If we write to the hash stream and it returns false, this is an
        // advisory response that tells us we need to stop writing until we have a
        // subsequent drain event (which indicates that the internal buffers of the hash
        // stream have been flushed and are ready to receive more data).
        if ( this._hasher.write( chunk, encoding ) === false ) {

            this._hasher.once( "drain", writeComplete );

        } else {

            writeComplete();

        }

    };

As you can see, things get a lot more complex. Once we take the leap of faith that the Hash object is a Transform stream, we have to abandon all other assumptions about the way hashing works, especially its synchronous nature. This means that we can't assume that there won't be any back-pressure on the hash stream. We can't assume that all of the data will be read out of the Transform stream in a single .read() operation (or even in a single loop). And, we can't assume that the data will be available immediately. Instead, we have to pay attention to return values and listen for emitted events.
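For comparison, here is a sketch of a different arrangement than the ETagStream above: if the source is piped directly into the Hash, then .pipe() takes care of the write() return values, the "drain" events, and the final .end() call, leaving only the readable side to handle. The hashReadable() helper is hypothetical, and stream.Readable.from() and Buffer.from() assume a more recent Node.js release than the one this post was written against.

```javascript
// Require the core node modules.
var crypto = require( "crypto" );
var stream = require( "stream" );

// I pipe the given readable stream into an MD5 hash stream and invoke the callback
// with the hex-encoded digest once the hash stream has been fully consumed.
// --
// NOTE: This is a hypothetical helper used only for illustration.
function hashReadable( readable, callback ) {

	var hasher = crypto.createHash( "md5" );
	var digest = "";

	// Ask the readable side of the hash stream to emit hex-encoded strings.
	hasher.setEncoding( "hex" );

	// We still can't assume that the digest arrives in a single chunk, so we
	// aggregate the data until the "end" event fires.
	hasher.on(
		"data",
		function handleData( chunk ) {

			digest += chunk;

		}
	);
	hasher.on(
		"end",
		function handleEnd() {

			callback( null, digest );

		}
	);

	// The .pipe() method does not forward errors, so listen on both streams.
	hasher.on( "error", callback );
	readable.on( "error", callback );

	// The .pipe() method handles back-pressure and ends the hash stream for us.
	readable.pipe( hasher );

}

// Use an in-memory stream in place of the file read stream.
var source = stream.Readable.from( [ Buffer.from( "hello " ), Buffer.from( "world" ) ] );

hashReadable(
	source,
	function handleResult( error, etag ) {

		console.log( "ETag:", error || etag );

	}
);
```

This trades the custom Writable for a plain callback, so it is not a drop-in replacement for the ETagStream; it only illustrates how much of the bookkeeping belongs to the writable side of the stream.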

It's possible that I am reading too much into the term, "legacy." But given the state of the Crypto module (Unstable - 2), I think it's reasonable to assume that legacy may one day lead to "deprecated" which may one day lead to "breaking change." As such, I think it's good to see how the Crypto Hash object can be consumed as a stream. Unfortunately, as you can see from the code above, consuming the Hash object as a stream is quite a bit more complex than the legacy approach.




Reader Comments

I almost didn't even click on this one. As a programmer and do-everything IT person, I am sometimes tasked with reverse engineering viruses that come across our network. As such, one of them is called a "crypto" (I'm not even sure if it is spelled the same, but maybe. I'm not 100% on the spelling of it, I just know I interpreted the language it was developed in and identified what it was and what it was doing), which has pretty severe and far-reaching consequences if it gets out on a network. It's probably not even the same thing at all, but the title got me a little jumpy. :)


@Anna,

Ha ha, sounds like you do some intense stuff. I wouldn't even know where to begin with something like that.


@Ben,

Thanks! It is extremely interesting and I enjoy the challenges of working in a diverse division like that. I love programming and still do plenty of just pure programming, but it's nice having a variety of different things I do at work also. I enjoy it both ways. :-)

