Ben Nadel at the jQuery Conference 2010 (Boston, MA) with: Blain Smith

Retrying Bulk Updates In PouchDB Using A Recursive Promise Chain

Over the weekend, I posted a PouchDB plugin that performs a bulk update operation in PouchDB. This plugin was meant to encapsulate the Get-Modify-Put workflow for updating multiple documents in this NoSQL database. The plugin didn't inspect the results in any way - it just passed them back to the calling context (via a Promise). After reading through Nolan Lawson's upsert plugin, however, I was inspired to try and add some retry logic to my bulk-update plugin. This turned out to require quite a bit more noodling than I had anticipated.

Run this demo in my JavaScript Demos project on GitHub.

In theory, retrying a .bulkDocs() operation in PouchDB should be relatively straightforward:

  • Execute the .bulkDocs() request.
  • Extract 409-conflict documents.
  • Retry 409-conflict documents.
  • Merge retry results back into original results.
  • Repeat, if necessary.

Of course, dealing with asynchronous code - especially asynchronous code that contains loops - is not exactly simple. But, the real complexity of the retry wasn't the asynchronous, Promise-based nature of the workflow; it was the fact that a Conflict result doesn't tell you which document it was for. For example, here's what a .bulkDocs() operation might return when conflicts are present:

[
	{
		"ok":true,
		"id":"apple:applecrisp",
		"rev":"2-3c1b7e99d662f4b770e90ea5ce654d0d"
	},
	{
		"status":409,
		"name":"conflict",
		"message":"Document update conflict",
		"error":true
	},
	{
		"status":409,
		"name":"conflict",
		"message":"Document update conflict",
		"error":true
	}
]

Notice that while a successful operation reports the doc "id", the conflicts do not. As such, it's not possible to perform a retry based on the results alone; you'd have no idea which documents to reprocess. Solving this problem, in the context of an iterative retry, took me several days. In fact, I started working on this right after I finished my last post (3 days ago), and only just finished it last night.

My solution was to keep track of both the "primary documents" and the "primary results". Since we know that the .bulkDocs() method returns results in the same order in which it received documents, we can match conflict results to their source documents based on array indices:

PouchDB returns results in the same order as inputs.
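To make the index matching concrete, here's a minimal sketch that runs without PouchDB. The `docs` and `results` arrays are hypothetical sample data shaped like the response above, and the two helpers mirror the ones in the full plugin listing further down. It relies on the assumption stated above: results come back in the same order as the input documents.

```javascript
// Hypothetical input documents, in the order they were handed to .bulkDocs().
var docs = [
	{ _id: "apple:applecrisp", name: "Apple Crisp" },
	{ _id: "apple:fuji", name: "Fuji" },
	{ _id: "apple:goldendelicious", name: "Golden Delicious" }
];

// Hypothetical results, shaped like the sample response: conflicts carry no "id".
var results = [
	{ ok: true, id: "apple:applecrisp", rev: "2-3c1b7e99d662f4b770e90ea5ce654d0d" },
	{ status: 409, name: "conflict", message: "Document update conflict", error: true },
	{ status: 409, name: "conflict", message: "Document update conflict", error: true }
];

// I determine if the given result object is a conflict result.
function isConflictResult( result ) {

	return( ( result.error === true ) && ( result.status === 409 ) );

}

// I return the _id of every doc whose result (at the same index) is a conflict.
function getConflictKeys( docs, results ) {

	var keys = [];

	for ( var i = 0 ; i < results.length ; i++ ) {

		if ( isConflictResult( results[ i ] ) ) {

			keys.push( docs[ i ]._id );

		}

	}

	return( keys );

}

var conflictKeys = getConflictKeys( docs, results );

// Logs the two conflicting ids: apple:fuji and apple:goldendelicious.
console.log( conflictKeys );
```

Because the lookup is purely positional, it only holds as long as the documents array is not reordered between the request and the response.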

Now, this parallel ordering of the documents and results becomes important for both the execution of the retry operation and the processing of the results of the retry operation. Since we're only retrying a subset of the original documents, the length of the retry results won't be the same as the length of the original documents. But, the order of the documents is maintained. As such, we can associate subsequent retry results with the original documents based on relative order, not on index:

PouchDB result ordering relative to input documents in bulkDocs().
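Here's a small sketch of that relative-order merge, again with hypothetical sample data and no PouchDB dependency. The `mergeRetryResults()` name is my own for illustration; unlike the plugin listing below, which swaps values in place, this version returns a new array so the inputs stay untouched.

```javascript
// I determine if the given result object is a conflict result.
function isConflictResult( result ) {

	return( ( result.error === true ) && ( result.status === 409 ) );

}

// I replace each conflict in the primary results with the NEXT AVAILABLE retry
// result. The retry results are consumed in order, which is what lines them up
// with the conflicts (relative order, not matching indices).
function mergeRetryResults( primaryResults, retryResults ) {

	// Copy the retry results so that we don't mutate the caller's array.
	var queue = retryResults.slice();

	return primaryResults.map(
		function( result ) {

			if ( isConflictResult( result ) && queue.length ) {

				return( queue.shift() );

			}

			return( result );

		}
	);

}

// Hypothetical data: four primary results, two of which were conflicts.
var conflict = { status: 409, name: "conflict", error: true };
var primary = [
	{ ok: true, id: "a", rev: "2-x" },
	conflict,
	{ ok: true, id: "c", rev: "2-x" },
	conflict
];

// The retry only covered docs "b" and "d", so it only has two results.
var retry = [
	{ ok: true, id: "b", rev: "2-x" },
	conflict
];

var merged = mergeRetryResults( primary, retry );

// merged is now: [ ok "a", ok "b", ok "c", conflict ].
console.log( merged );
```

Note that retry index 0 lands at primary index 1, and retry index 1 lands at primary index 3. That's the "relative order" part.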

Once we have this mental model established, we can start to perform retry requests on bulk operations. However, since we're relying on an out-of-context operator (callback) to perform the document mutation, there's a small possibility that conflicts will never be resolved. For example, if the operator continually deletes the "_rev" property from each document, no amount of retry will help. As such, we want to put a limit on the number of retries we perform.

To do this, I implemented a recursive Promise chain that calls itself with an ever-decreasing number of possible retries:

.then(
	function recursivelyProcessResults( remainingRetries ) {

		// ...

		var retryPromise = retryConflicts().then(
			function() {

				// ...

				return( recursivelyProcessResults( remainingRetries - 1 ) );

			}
		)

		return( retryPromise );

	}
)

Using this approach, we essentially create a segment in the Promise chain that iteratively binds its own results back to itself. Recursion is fun! In order to understand recursion, you must first understand recursion.

The key to recursion is that there must be an end to the iteration - a point at which a known result is returned instead of a call back to the same function. In this case, our recursion ends when we either run out of retry attempts or the results come back without any conflicts:

.then(
	function recursivelyProcessResults( remainingRetries ) {

		if ( isConflictFree( primaryResults ) || ! remainingRetries ) {

			return( primaryResults );

		}

		// ...

		var retryPromise = retryConflicts().then(
			function() {

				// ...

				return( recursivelyProcessResults( remainingRetries - 1 ) );

			}
		)

		return( retryPromise );

	}
)

Here, you can see that the top of our recursive Promise segment will halt the recursion if the results are conflict-free or if we've iterated too many times. In that case, it resolves the current Promise segment with the current state of the primary results, which will then be passed on to the next segment of the Promise chain (which is outside of the scope of the plugin).
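If you want to see the halting behavior in isolation, here's a runnable sketch that swaps the database out for a stub. Everything in it is an assumption for illustration: `pending` stands in for the unresolved conflicts, and `retryConflicts()` pretends that each pass resolves exactly one of them (much like the artificial conflicts in the demo below).

```javascript
// Hypothetical stand-in for the conflicts that still need to be resolved.
var pending = [ "apple:fuji", "apple:goldendelicious", "apple:applecrisp" ];
var attempts = 0;

// STUB: I pretend to perform one retry pass, resolving only the first item.
function retryConflicts() {

	attempts++;
	pending.shift();

	return( Promise.resolve() );

}

// STUB: I determine if there is nothing left to retry.
function isConflictFree() {

	return( pending.length === 0 );

}

// I run the recursive Promise segment with the given retry budget.
function runWithRetries( maxRetries ) {

	var promise = Promise.resolve( maxRetries ).then(
		function recursivelyProcessResults( remainingRetries ) {

			// Base case: nothing left to do OR we ran out of attempts.
			if ( isConflictFree() || ! remainingRetries ) {

				return( pending.slice() );

			}

			// Recursive case: retry, then re-examine with one less attempt.
			var retryPromise = retryConflicts().then(
				function() {

					return( recursivelyProcessResults( remainingRetries - 1 ) );

				}
			);

			return( retryPromise );

		}
	);

	return( promise );

}

// With 3 pending items and only 2 retries, one item is left unresolved.
var demo = runWithRetries( 2 ).then(
	function( leftover ) {

		console.log( "Attempts:", attempts, "Unresolved:", leftover );

		return( leftover );

	}
);
```

With a budget of 2, the stub runs twice and then the base case returns whatever is still unresolved. Bump the budget to 3 (or more) and the chain stops early once nothing is pending.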

NOTE: This works because a named function expression can always refer to itself by name from inside its own body, even when that name is not stored in a variable in the enclosing scope. This is a fundamental feature of JavaScript and is not specific to Promises or PouchDB.
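As a quick, synchronous illustration of that note, consider this sketch (the `countdown` and `tick` names are made up for the example). The same rule applies when the named function is passed directly to .then() as a callback.

```javascript
// The function expression is NAMED "tick", but it's only stored in "countdown".
var countdown = function tick( n ) {

	// Inside its own body, the function can still call itself as "tick".
	return( ( n === 0 ) ? [ 0 ] : [ n ].concat( tick( n - 1 ) ) );

};

var steps = countdown( 3 );

// The name "tick" is NOT bound in the enclosing scope - only inside the function.
var outerTypeOfTick = ( typeof tick );

console.log( steps ); // Logs the countdown: 3, 2, 1, 0.
console.log( outerTypeOfTick ); // Logs: undefined
```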

To bring this all together, let's revisit the fruit-demo in which we update each Apple document. Only this time, in order to create and resolve artificial conflicts, my update operator() is going to delete the "_rev" property from each document after the first index. This means that if N documents are being updated, indices 1..(N-1) will return as Conflict results. Essentially, on each .bulkDocs() call, only the first document will be processed successfully - the rest will have to be retried. Of course, on each subsequent retry, the "next first document" will succeed.

<!doctype html>
<html>
<head>
	<meta charset="utf-8" />

	<title>
		Retrying Bulk Updates In PouchDB Using A Recursive Promise Chain
	</title>
</head>
<body>

	<h1>
		Retrying Bulk Updates In PouchDB Using A Recursive Promise Chain
	</h1>

	<p>
		<em>Look at console &mdash; things being logged, yo!</em>
	</p>

	<script type="text/javascript" src="../../vendor/pouchdb/6.0.7/pouchdb-6.0.7.min.js"></script>
	<script type="text/javascript">

		// I provide an API for updating many documents (encapsulating the fetch and
		// subsequent .bulkDocs() call). This method will use either the .allDocs() method
		// or the .query() method for fetching, depending on the invocation signature:
		// --
		// .updateMany( options, operator ) ==> Uses .allDocs()
		// .updateMany( viewName, options, operator ) ==> Uses .query()
		// --
		// In each case, the "options" object is passed to the underlying fetch method.
		// Each document in the resultant collection is then passed to the given operator
		// function - operator( doc ) - to perform the update transformation.
		PouchDB.plugin({
			updateMany: function( /* [ viewName, ] options, operator */ ) {

				var pouch = this;

				// CAUTION: Top-level errors MAY NOT be caught in a Promise.

				// .allDocs() invocation signature: ( options, operator ).
				if ( arguments.length === 2 ) {

					var options = arguments[ 0 ];
					var operator = arguments[ 1 ];
					var promise = pouch.allDocs( ensureIncludeDocs( options ) );

				// .query() invocation signature: ( viewName, options, operator ).
				} else {

					var viewName = arguments[ 0 ];
					var options = arguments[ 1 ];
					var operator = arguments[ 2 ];
					var promise = pouch.query( viewName, ensureIncludeDocs( options ) );

				}

				// This plugin will retry [portions of] the bulk update operation if some
				// of the results come back as 409-Conflicts. Since conflict documents do
				// not report which document ID they are referring to, we have to keep
				// track of the primary documents and results objects so that we can
				// merge subsequent results back into the primary results using the order
				// in which various things are returned.
				var primaryDocs = null;
				var primaryResults = null;

				// Even though the results are potentially coming back from two different
				// search methods - .allDocs() or .query() - the result structure from
				// both methods is the same. As such, we can count on the following keys
				// to exist in the results:
				// --
				// * offset
				// * total_rows
				// * rows : [{ doc }]
				// --
				promise = promise
					.then(
						function( results ) {

							// Pass the source docs through the operator.
							var docsToUpdate = results.rows.map(
								function iterator( row, index, rows ) {

									return( operator( row.doc, index, rows ) || row.doc );

								}
							);

							// Keep track of the primary documents so that we can match
							// results to document IDs using the index order.
							primaryDocs = docsToUpdate;

							return( pouch.bulkDocs( docsToUpdate ) );

						}
					)
					.then(
						function( results ) {

							// Keep track of the primary results so that we can check for
							// conflicts and merge subsequent results back into the fold.
							// At this point, ( primaryDocs[ i ] ==> primaryResults[ i ] ).
							primaryResults = results;

							return( 3 /* Potential retries */ );

						}
					)
					// CAUTION: This portion of the Promise chain is the "retry" portion
					// and will be called recursively. This method assumes access to the
					// primary Docs and Results collection, and will recursively extract
					// conflicts from the primary results, reprocess those documents, and
					// merge them back into the primary results.
					.then(
						function recursivelyProcessResults( remainingRetries ) {

							// Return the results if we're done processing (either because
							// there are no conflicts or we ran out of retry attempts).
							if ( isConflictFree( primaryResults ) || ! remainingRetries ) {

								return( primaryResults );

							}

							console.warn( "Retrying primary results (%s).", remainingRetries );

							// If we made it this far, we have Conflict results to try
							// and reprocess.
							var retryPromise = pouch
								// Regardless of how we fetched the documents originally,
								// we now have document _id values to work with. Which
								// means, all retries can be done using .allDocs(keys).
								// --
								// NOTE: We need to re-fetch in order to get the source
								// document's most current revision. If we don't re-fetch,
								// we'll just keep getting conflicts over and over again.
								.allDocs({
									keys: getConflictKeys( primaryDocs, primaryResults ),
									include_docs: true
								})
								.then(
									function( results ) {

										// Pass the RETRY docs through the operator.
										var docsToUpdate = results.rows.map(
											function iterator( row, index, rows ) {

												return( operator( row.doc, index, rows ) || row.doc );

											}
										);

										return( pouch.bulkDocs( docsToUpdate ) );

									}
								)
								.then(
									function( results ) {

										// Merge the retry results back into primary
										// results collection. This way, any successful
										// operations in the retry will now replace the
										// Conflict operations in the primary results.
										// Since we know that the retry docs and results
										// were processed in the same order as the
										// Conflicts in the primary results, we should be
										// able to zipper the results together, in order.
										for ( var i = 0 ; i < primaryResults.length ; i++ ) {

											// If we've found a conflict result in the
											// primary results, swap it with the NEXT
											// AVAILABLE retry result.
											if ( isConflictResult( primaryResults[ i ] ) ) {

												primaryResults[ i ] = results.shift();

											}

										}

										// At this point, we may have resolved all of
										// the conflicts; or, we may still have
										// conflicts after the merge. Call retry function
										// recursively (with one less possible retry) so
										// that we can reexamine the results.
										return( recursivelyProcessResults( remainingRetries - 1 ) );

									}
								)
								// If anything in the retry-operation fails catastrophically,
								// just bail out and return the primary results in their last
								// known state.
								.catch(
									function( error ) {

										console.warn( "Aborting out of retry due to critical failure." );
										console.log( error );
										return( primaryResults );

									}
								)
							;

							return( retryPromise );

						}
					)
				;

				return( promise );


				// -- Utility methods for my PouchDB plugin. Thar be hoistin'! -- //


				// I determine if any of the results contain a conflict.
				function containsConflicts( results ) {

					for ( var i = 0 ; i < results.length ; i++ ) {

						if ( isConflictResult( results[ i ] ) ) {

							return( true );

						}

					}

					return( false );

				}


				// I ensure that the given search options has the "include_docs" set to
				// true. Since we are working on updating documents, it is important
				// that we actually fetch the docs being updated. Returns options.
				function ensureIncludeDocs( options ) {

					options.include_docs = true;

					return( options );

				}


				// I return an array of doc._id values for any doc that has a conflict
				// in the given results. This assumes that the given results were a
				// product of a bulk operation on the given docs.
				function getConflictKeys( docs, results ) {

					var keys = [];

					for ( var i = 0 ; i < results.length ; i++ ) {

						if ( isConflictResult( results[ i ] ) ) {

							keys.push( docs[ i ]._id );

						}

					}

					return( keys );

				}


				// I determine if the given results are free from any conflicts.
				function isConflictFree( results ) {

					return( ! containsConflicts( results ) );

				}


				// I determine if the given result object is a conflict result.
				function isConflictResult( result ) {

					return( ( result.error === true ) && ( result.status === 409 ) );

				}

			}
		});


		// --------------------------------------------------------------------------- //
		// --------------------------------------------------------------------------- //


		getPristineDatabase( window, "db" ).then(
			function() {

				// To experiment with the bulk update PLUGIN, we need to have documents
				// on which to experiment. Let's create some food products with names
				// and prices that we'll update with the bulk update plugin.
				var promise = db.bulkDocs([
					{
						_id: "apple:fuji",
						name: "Fuji",
						price: 1.05
					},
					{
						_id: "apple:applecrisp",
						name: "Apple Crisp",
						price: 1.33
					},
					{
						_id: "pear:bosc",
						name: "Bosc",
						price: 1.95
					},
					{
						_id: "apple:goldendelicious",
						name: "Golden Delicious",
						price: 1.27
					},
					{
						_id: "pear:bartlett",
						name: "Bartlett",
						price: 1.02
					}
				]);

				return( promise );

			}
		)
		.then(
			function() {

				// Now that we've inserted the documents, let's fetch all the Apples
				// and output them so we can see the pre-update values.
				var promise = db
					.allDocs({
						startkey: "apple:",
						endkey: "apple:\uffff",
						include_docs: true
					})
					.then( renderResultsToConsole )
				;

				return( promise );

			}
		)
		.then(
			function() {

				// Now, let's update the Apples, appending " ( UPDATED )" to each name.
				// This will run an .allDocs() and a .bulkDocs() under the hood.
				var promise = db.updateMany(
					{
						startkey: "apple:",
						endkey: "apple:\uffff"
					},
					function operator( doc, i ) {

						doc.name += " ( UPDATED )";

						// ... HOWEVER ... since we want to demonstrate the retry ability
						// of the plugin, we are going to delete the _rev key for each
						// document AFTER THE FIRST INDEX. This way, each retry will only
						// be able to process the first document - the rest will result
						// in conflicts. Given 3 apples, we would expect the following
						// retry results:
						// --
						// Pass 1: [ ok, conflict, conflict ]
						// Pass 2: [ ok, conflict ]
						// Pass 3: [ ok ]
						// --
						// ==> results: [ ok, ok, ok ]
						if ( i > 0 ) delete( doc._rev );

					}
				);

				return( promise );

			}
		)
		.then(
			function() {

				// Now that we've updated the Apples (with retries under the hood) let's
				// re-fetch the Apples to see how the values have changed.
				var promise = db
					.allDocs({
						startkey: "apple:",
						endkey: "apple:\uffff",
						include_docs: true
					})
					.then( renderResultsToConsole )
				;

				return( promise );

			}
		)
		.catch(
			function( error ) {

				console.warn( "An error occurred:" );
				console.error( error );

			}
		);


		// --------------------------------------------------------------------------- //
		// --------------------------------------------------------------------------- //


		// I ensure that a new database is created and stored in the given scope.
		function getPristineDatabase( scope, handle ) {

			var dbName = "javascript-demos-pouchdb-playground";

			var promise = new PouchDB( dbName )
				.destroy()
				.then(
					function() {

						// Store new, pristine database in to the given scope.
						return( scope[ handle ] = new PouchDB( dbName ) );

					}
				)
			;

			return( promise );

		}


		// I use the console.table() method to log the documents in the given results
		// collection to the console.
		function renderResultsToConsole( results ) {

			var docs = results.rows.map(
				function( row ) {

					return( row.doc );

				}
			);
			console.table( docs );

		}

	</script>

</body>
</html>

As you can see, each update operator() attempts to append "( UPDATED )" to the name of each Apple. Of course, this will result in conflicts which will have to be retried. After several attempts, however, we can see that the full collection of documents has been updated:

Performing retry operations in the updateMany() bulk update plugin in PouchDB.

All of the documents were updated; but, it took 2 retries to process all three documents (due to the artificial conflicts I was creating). In the video, you can see how changing the number of retries affects the results.

Getting retries to work with bulk updates in PouchDB was quite a challenge. Not only was it difficult to match results to documents for the retry, but thinking about recursion is always a mind-trip. Still, I think I finally came up with an approach that works and won't fall into an infinite loop.
