Experimenting With Circuit Breakers In Decoupled Systems Using ColdFusion

By Ben Nadel on October 9, 2016

UPDATE / WARNING: So, I've been thinking a lot about this. And, after having some excellent conversations with some engineers at work (tip of the hat to David Bainbridge and Jesse Dearing), I realize that my implementation is flawed. This is the first time that I've ever thought about Circuit Breakers; and, at first, it seemed like it was a great feature to be able to transparently pass around a wrapped object (in a Circuit Breaker proxy). But, this goes against the whole concept of an object interface. See, the benefit of a Circuit Breaker is not just that it can fail-fast - it's that the calling context can react explicitly to fail-fast type errors. And, the only way that the calling context can do this is by knowing that the object it is sending messages to is capable of throwing fail-fast errors. As such, a transparent proxy is really only half a solution. A robust solution is one in which the Circuit Breaker is a first-class citizen in the message-passing workflow.
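
To make "react explicitly" a bit more concrete, here is a minimal sketch of a calling context that knows it is talking to a Circuit Breaker. It assumes the "CircuitBreaker.Open" error type thrown by the CircuitBreaker.cfc further down in this post. The loadGreeting() function, its argument name, and the fallback message are hypothetical and exist only for illustration:

```cfml
<cfscript>

	/**
	* HYPOTHETICAL: I call through a circuit-breaker proxy and fall back to a canned
	* message when the proxy fails fast.
	*
	* @gatewayCB I am a gateway wrapped in a CircuitBreaker (assumed to expose makeGoodCall()).
	*/
	string function loadGreeting( required any gatewayCB ) {

		try {

			return( gatewayCB.makeGoodCall( "Hello" ) );

		} catch ( any error ) {

			// Only handle the fail-fast error; let every other error bubble up.
			if ( error.type != "CircuitBreaker.Open" ) {

				rethrow;

			}

			// ASSUMPTION: A static fallback is acceptable for this caller.
			return( "Service temporarily unavailable." );

		}

	}

</cfscript>
```

The point of the sketch is only that the caller has to know a fail-fast error is possible in order to choose a fallback; a fully transparent proxy hides that fact.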

--- ORIGINAL POST ---

At InVision App, we're continually working to break our monolithic ColdFusion application down into a collection of independent microservices. With each microservice that we decouple, we introduce new opportunities for latency and new points of failure. This is simply the cost of making the application more robust and more available overall; at least when done correctly. And, part of doing it correctly means protecting each "seam" in the application from the stresses and failures of systems downstream from it. One way to protect a system is to wrap remote calls in a "Circuit Breaker" - a mechanism that will "fail fast" when the remote system becomes unhealthy.

I am very new to the concept of Circuit Breakers; but, it's a concept that has been around for a while. Popularized by Michael Nygard in the book Release It: Design and Deploy Production-Ready Software, the topic of Circuit Breakers has also been discussed by thought-leaders like Martin Fowler on his bliki and Sam Newman in Building Microservices (Designing Fine-Grained Systems). The way that I understand it, a Circuit Breaker is essentially a proxy to a remote client that marshals all calls going to that client. By marshaling outbound requests, the Circuit Breaker can monitor the performance of the request-response life-cycle and halt outbound calls when the remote system stops responding in a timely manner.

Since InVision's monolith is written in ColdFusion, I thought it would be a worthwhile experiment to try and create a Circuit Breaker in ColdFusion. Luckily, ColdFusion is a super powerful language and provides quite a bit of meta-programming functionality. As such, we can create a generic Circuit Breaker that can dynamically proxy any ColdFusion component that exposes public methods:

A circuit breaker proxies a component that communicates with a remote or untrusted system.

Dynamically routing "messages" from one component to an origin component can be done with something like ColdFusion's onMissingMethod() event handler. However, since we will know the landscape of the origin component at instantiation time, I think it will probably be more efficient to just stamp out "stubs" of the origin methods, and have those stubs all point to the same "proxy method" which can perform dynamic method routing using getFunctionCalledName().
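
For comparison, the onMissingMethod() alternative might look something like the following. This is only a sketch of the approach that I am NOT taking in this post: the component itself is hypothetical, and it forwards calls without any circuit-breaker logic.

```cfml
component
	output = false
	hint = "HYPOTHETICAL: I forward undefined method calls to an origin component."
	{

	/**
	* I store the origin component that will receive the forwarded calls.
	*
	* @origin I am the target component being proxied.
	*/
	public any function init( required any origin ) {

		variables.origin = arguments.origin;

		return( this );

	}


	/**
	* I am invoked by ColdFusion when a caller invokes a method that is not defined
	* on this component.
	*
	* @missingMethodName I am the name of the method that was called.
	* @missingMethodArguments I am the arguments that were passed to that method.
	*/
	public any function onMissingMethod(
		required string missingMethodName,
		required struct missingMethodArguments
		) {

		// NOTE: Any health checks would have to wrap this invocation.
		var result = invoke( origin, missingMethodName, missingMethodArguments );

		// The origin method may not return a defined value.
		if ( structKeyExists( local, "result" ) ) {

			return( result );

		}

		return; // void.

	}

}
```

With this approach, the routing decision happens on every call as a missing-method event. With the stub approach, the public keys are defined once, at instantiation time, and the proxy looks like the origin to anything that inspects its public methods.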

Internally, this proxy method can then marshal all calls to the origin component, implementing rules about failure thresholds and pending thresholds. If the origin component appears to be unhealthy, the proxy can "fail fast"; meaning, it will immediately throw an error rather than trying to communicate with the origin component. While this "fail fast" mentality may seem counterintuitive, it's actually much safer to "fail fast" than to "fail slow" - failing slow can be devastating to a distributed system.

Again, I should say that I am very new to the concept of Circuit Breakers; so, I have tried to keep this exploration fairly simple. That said, here's my generic CircuitBreaker.cfc ColdFusion component:

component
	output = false
	hint = "I proxy a ColdFusion component, providing circuit-breaker protection."
	{

	/**
	* I initialize the CircuitBreaker with the given origin and options.
	*
	* @origin I am the target component being proxied.
	* @failedRequestThreshold I am the number of requests that can fail before the circuit is opened.
	* @activeRequestThreshold I am the number of parallel requests that can be concurrently active before the circuit is opened.
	* @openStateTimeout I am the time (in milliseconds) that the circuit will remain open until the origin is tested.
	* @output false
	*/
	public any function init(
		required any origin,
		numeric failedRequestThreshold = 10,
		numeric activeRequestThreshold = 10,
		numeric openStateTimeout = ( 60 * 1000 )
		) {

		// Store the properties.
		variables.origin = arguments.origin;
		variables.failedRequestThreshold = arguments.failedRequestThreshold;
		variables.activeRequestThreshold = arguments.activeRequestThreshold;
		variables.openStateTimeout = arguments.openStateTimeout;

		originName = getOriginName();
		originMethodNames = getOriginMethodNames();

		// Generate all of the public methods on THIS component that will marshal requests
		// to the underlying origin component.
		generateProxyMethods();

		// NOTE: There is no "half-open" state. The half-open pseudo state will be entered
		// by a single request in which a true state change isn't necessary.
		states = {
			CLOSED: "CLOSED",
			OPENED: "OPENED"
		};

		// Default to a closed (ie, flowing) state.
		state = states.CLOSED;

		// Reset the counters.
		activeRequestCount = 0;
		failedRequestCount = 0;

		// Reset the timers - each of these store millisecond values.
		checkOriginHealthAtTick = 0;
		lastFailedRequestAtTick = 0;

		// All access to the shared state of the circuit breaker will be synchronized
		// using this lock name. Errors are aggregated across all method calls to the
		// origin. Meaning, unique methods are not tracked individually.
		lockName = "CircuitBreaker-#originName#-#createUUID()#";

		return( this );

	}


	// ---
	// PUBLIC METHODS.
	// ---


	// NOTE: The only PUBLIC methods on the circuit breaker are the proxy methods that
	// marshal requests to the origin component. Any attempt to define additional public
	// methods would run the risk of colliding with (ie, overwriting) existing public
	// methods on the origin component. As such, the state of the circuit breaker is
	// basically a black-box to any external context (unless errors are being thrown).


	// ---
	// PRIVATE METHODS.
	// ---


	/**
	* I proxy all the calls to the origin component, checking and updating the state of
	* the circuit breaker with each request.
	*
	* CAUTION: We are explicitly omitting any "output" setting for this method so as to
	* allow the output of the underlying methods to "bubble up" if need-be.
	*/
	private any function __proxy_method__() {

		var methodName = getFunctionCalledName();
		var methodArguments = arguments;

		// CAUTION: All reading-from and writing-to the shared state of the circuit
		// breaker is being SYNCHRONIZED with exclusive locking. While this does incur
		// some overhead, no heavy processing should be done inside the locks. As such
		// the duration of any lock should be negligible. Each request has two locks:
		// one before the origin invocation to test state and one after the origin
		// invocation to clean up state.

		lock
			name = lockName
			type = "exclusive"
			timeout = 1
			throwOnTimeout = true
			{

			var currentTick = getTickCount();

			// If the circuit breaker is currently closed (ie, flowing), check to see if
			// we're about to go over the active request threshold.
			if ( ( state == states.CLOSED ) && ( activeRequestCount == activeRequestThreshold ) ) {

				// There are too many concurrent requests still pending a response from
				// the origin - trip the breaker and open the circuit.
				state = states.OPENED;

				// Keep the breaker open until some time in the future.
				checkOriginHealthAtTick = ( currentTick + openStateTimeout );

			}

			// If the circuit breaker is currently open (ie, not flowing), then we either
			// want to fail-fast or perform a single test on the origin to see if we can
			// close (ie, allow flow) on the circuit.
			if ( state == states.OPENED ) {

				// If we are currently in the opened-timeout, fail fast.
				if ( currentTick < checkOriginHealthAtTick ) {

					throw(
						type = "CircuitBreaker.Open",
						message = "Method invocation failing fast due to open circuit breaker.",
						detail = "The circuit is open and therefore the origin method [#originName# :: #methodName#()] is not reachable.",
						extendedInfo = "Active request count: [#activeRequestCount#], Failed request count: [#failedRequestCount#], Testing health in [#( checkOriginHealthAtTick - currentTick )#]."
					);

				}

				// If we made it this far, the circuit breaker is open; but, we want to
				// allow a single test (the current request) to be run against the origin
				// in order to see if the origin has reached a healthy state (the circuit
				// can be closed again). To make sure that no parallel requests try to
				// perform the same test, push out the timeout.
				checkOriginHealthAtTick = ( currentTick + openStateTimeout );

			}

			activeRequestCount++;

		} // END: Lock.

		try {

			// Try to invoke the origin method on the downstream system.
			var result = invoke( origin, methodName, methodArguments );

			lock
				name = lockName
				type = "exclusive"
				timeout = 1
				throwOnTimeout = true
				{

				activeRequestCount--;

				// If we made it this far, it means that the origin method invocation has
				// completed successfully. As such, we can clean up any opened state as
				// long as the open state is not being held open by active requests.
				if ( ( state == states.OPENED ) && ( activeRequestCount <= activeRequestThreshold ) ) {

					state = states.CLOSED;

					// Now that we received a healthy response from the origin, let's
					// reset the failure count
					failedRequestCount = 0;

				}

			} // END: Lock.

			// The origin method may not return a defined value, even in a successful
			// invocation. As such, we have to check to see if the result exists before
			// we try to return the result upstream.
			if ( structKeyExists( local, "result" ) ) {

				return( result );

			} else {

				return; // void.

			}

		// Catch any errors thrown by origin invocation.
		} catch ( any error ) {

			lock
				name = lockName
				type = "exclusive"
				timeout = 1
				throwOnTimeout = true
				{

				activeRequestCount--;

				currentTick = getTickCount();

				// If the last error occurred in the distant past (ie, a time greater
				// than the open-state timeout), reset the error count before we record
				// the current failure.
				if ( ( currentTick - openStateTimeout ) > lastFailedRequestAtTick ) {

					failedRequestCount = 0;

				}

				lastFailedRequestAtTick = currentTick;

				// If we made it here, the invocation of the origin method failed (ie,
				// threw an error); as such, we need to check to see if this failure
				// pushed us past the failure capacity of the circuit breaker.
				if ( ++failedRequestCount > failedRequestThreshold ) {

					state = states.OPENED;

					// Keep the breaker open until some time in the future.
					checkOriginHealthAtTick = ( currentTick + openStateTimeout );

				}

			} // END: Lock.

			rethrow;

		}

	}


	/**
	* I generate the proxy methods for the circuit breaker based on the public methods
	* defined on the origin - each public method gets a proxy in the circuit breaker.
	*
	* @output false
	*/
	private void function generateProxyMethods() {

		for ( var methodName in originMethodNames ) {

			// Create a public method on the circuit breaker with the same name. All
			// stubs point to the same method since the method is generic and will route
			// the request to the origin based on the name of the method at invocation
			// time.
			this[ methodName ] = __proxy_method__;

		}

	}


	/**
	* I return the names of public methods exposed on the given origin.
	*
	* @output false
	*/
	private array function getOriginMethodNames() {

		var methodNames = [];

		for ( var key in origin ) {

			// CAUTION: Using structKeyExists() for edge-case in which the origin has
			// keys configured with undefined values.
			if ( structKeyExists( origin, key ) && isCustomFunction( origin[ key ] ) ) {

				arrayAppend( methodNames, key );

			}

		}

		return( methodNames );

	}


	/**
	* I get the name of the origin component.
	*
	* @output false
	*/
	private string function getOriginName() {

		return( getMetaData( origin ).name );

	}

}

As you can see, there are no public methods in the Circuit Breaker. Since the Circuit Breaker is intended to transparently proxy another component, any public method runs the risk of colliding with a public method on the origin component. In this Circuit Breaker, all public methods are references to the private "__proxy_method__()" method, which dynamically routes the request to the origin component based on the name of the method at invocation time.

Because the concept of the health of the origin component is aggregated across all public methods (on the origin), each call reads from and writes to shared state. In order to cut down on race conditions, I am synchronizing access to the shared state with the CFLock tag. This obviously has some overhead to it - all locking does. But, the contents of the lock deal exclusively with in-memory data structures; so, the performance overhead should be minimal.

To test my CircuitBreaker.cfc, I created a simple test Gateway that has methods that succeed and fail, some with an optional "sleep()" command to allow the testing of concurrent requests:

component
	output = false
	hint = "I provide a gateway for interfacing with static test data."
	{

	/**
	* I initialize the test gateway with the given per-method sleep duration.
	*
	* @sleepInMilliseconds I am the [optional] sleep duration to perform in each method.
	* @output false
	*/
	public any function init( numeric sleepInMilliseconds = 0 ) {

		sleepDuration = sleepInMilliseconds;

		return( this );

	}


	// ---
	// PUBLIC METHODS.
	// ---


	/**
	* I provide a test method that throws an error.
	*
	* @input I am the input being tested.
	* @output false
	*/
	public void function makeBadCall( required string input ) {

		throw( "BadCall" );

	}


	/**
	* I provide a test method that throws an error after a sleep.
	*
	* @input I am the input being tested.
	* @output false
	*/
	public void function makeBadCallWithSleep( required string input ) {

		if ( sleepDuration ) {

			sleep( sleepDuration );

		}

		throw( "BadCall" );

	}


	/**
	* I provide a test method that echoes back the request.
	*
	* @input I am the input being tested.
	* @output false
	*/
	public string function makeGoodCall( required string input ) {

		return( input );

	}


	/**
	* I provide a test method that echoes back the request after a sleep.
	*
	* @input I am the input being tested.
	* @output false
	*/
	public string function makeGoodCallWithSleep( required string input ) {

		if ( sleepDuration ) {

			sleep( sleepDuration );

		}

		return( input );

	}

}

To see the Circuit Breaker in action, we can wrap it around the TestGateway.cfc and then make a bunch of [potentially] concurrent calls to the proxy using CFThread. Notice that my proxied Gateway is exposing all of the same public methods that the TestGateway.cfc exposed.

<cfscript>

	// Create a test gateway with a sleep-option of 1,000ms.
	testGateway = new TestGateway( 1000 );

	// Wrap the test gateway in a circuit breaker.
	testGatewayCB = new CircuitBreaker(
		origin = testGateway,
		failedRequestThreshold = 8,
		activeRequestThreshold = 12,
		openStateTimeout = ( 2 * 1000 )
	);


	// Spawn a bunch of threads to test the concurrent access to the gateway.
	// --
	// CAUTION: Remember, threads are not necessarily executed in the order in which
	// they are initialized.
	for ( i = 1 ; i < 100 ; i++ ) {

		thread
			name = "thread-#i#"
			testIndex = i
			{

			// If we are less than 20, make a good call with a sleep. This should cause
			// the active requests to pile-up, surpassing the 12-active threshold.
			if ( testIndex < 20 ) {

				testGatewayCB.makeGoodCallWithSleep( "Called with Index #testIndex#." );

			// If we're at or below 70, make good calls (without a sleep). These will
			// fail fast while the circuit is open and succeed once it closes again.
			// --
			// NOTE: We are sleep()ing at i==35.
			} else if ( testIndex <= 70 ) {

				testGatewayCB.makeGoodCall( "Called with Index #testIndex#." );

			// If we're greater than 70, start making bad requests. This should cause the
			// failed requests to pile-up, surpassing the 8-failed threshold.
			} else if ( testIndex > 70 ) {

				testGatewayCB.makeBadCall( "Called with Index #testIndex#." );

			}

			// If we made it this far, the gateway call didn't fail.
			writeOutput( "Success." );

		}


		// Occasionally sleep the parent page to let some of the circuit breaker timers
		// get reset.
		if ( ( i == 35 ) || ( i == 85 ) ) {

			sleep( 3500 );

		}

	}


	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //


	// Wait for the threads to re-join and then output the states.
	thread action = "join";

	for ( i = 1 ; i < 100 ; i++ ) {

		targetThread = cfthread[ "thread-#i#" ];

		writeOutput( "#targetThread.name#:" );

		structKeyExists( targetThread, "error" )
			? writeOutput( "#targetThread.error.extendedInfo#<br />" )
			: writeOutput( "#targetThread.output#<br />" )
		;

	}

</cfscript>

The output of this test is kind of hard to decipher; so, I would just suggest watching the video (where I'm sure that I do a fairly mediocre job of explaining it - noob fail).

I find the idea of a Circuit Breaker completely fascinating. It is both simple and deceptively complex. For example, what is the right threshold for failures? And, once you hit that threshold, how long should you wait before you check the origin's health? I have zero gut instinct for this; and, I'm not sure how you would come up with meaningful numbers for this kind of stuff (that isn't just a whole bunch of trial-and-error). It's great to see how ColdFusion can implement it, from a technical standpoint; but, there's much more that I need to understand from a philosophical standpoint.

Short link: https://bennadel.com/3160

Reader Comments

Ben Nadel Oct 10, 2016 at 6:10 AM

@All,

Just a bit of clarification on the "activeRequestCount".

When I read about the circuit breaker pattern, there is often a "timeout" associated with each request. However, in ColdFusion, dealing with that level of asynchronicity is not exactly straightforward. As such, rather than trying to determine the health based on any single long-running request, I am using the number of parallel requests. The thinking here is that the longer requests take to run, the more requests will start to pile up in parallel. So, long-running requests will naturally lead to a larger number of parallel requests, which will indicate a potentially unhealthy system.

Baba Oct 10, 2016 at 9:29 AM

Ben,

Great approach. I would also recommend checking out RxJava to use with ColdFusion for this pattern implementation. I did play with it a little bit. I think that would work better for this approach. I might be totally wrong too.

Thanks

Ben Nadel Oct 12, 2016 at 12:14 PM

@All,

After some conversations at work, I've changed my mind a bit on the implementation here. I had some flaws in my thinking and what I saw as a feature. I added this to the top of the post as a caveat / warning.

Ben Nadel Oct 12, 2016 at 12:16 PM

@Baba,

I'm fairly new to RxJS; will take a look at RxJava - all of that stuff seems very interesting!

Ben Nadel Nov 26, 2016 at 11:43 AM

@All,

I finally took all of my noodling on the concept of Circuit Breakers and turned it into a GitHub project. While it's not the end of the journey, this forced me to clean it up and add unit tests:

www.bennadel.com/blog/3190-coldfusion-circuit-breaker-project-on-github.htm

Now, I'll have a more directed way to continue evolving my understanding of the concept.
