
Extracting State Management From A Circuit Breaker In ColdFusion


In my previous exploration of Circuit Breakers in ColdFusion, the Circuit Breaker itself controlled both the state of the circuit and the marshaling of requests. In subsequent reading, however, I've seen a number of approaches in which the state of the Circuit Breaker was factored-out into its own entity; then, that entity was injected into the Circuit Breaker as a sort of "behavior" or "strategy" for state management. At first, this seemed like an artificial separation. But, the more I've played with it, the more interesting it becomes. And, of course, it makes me question my ability to think correctly about Object design.

When I started to try and tease state management out of the Circuit Breaker, I didn't really know what I was trying to do. I found it very frustrating to decouple the concept of state management from the concept of state consumption. After all, the two concepts seemed inextricably combined under a single goal.

But, as I continued to pick things apart, I started to like what I was seeing. Part of factoring the state management out of the Circuit Breaker meant that I had to create meaningful method names for state consumption. These method names had to describe what the state represented rather than the raw data that the state aggregated.

For example, as I was refactoring, I didn't want the Circuit Breaker to think about the number of active requests; I wanted it to think about whether or not the circuit was "at capacity". And, I didn't want the Circuit Breaker to think about incrementing and decrementing counters; rather, I wanted the Circuit Breaker to think about tracking successful and failed actions.

I tried to boil the state management API down into methods that told the "story" of the Circuit Breaker. Here's what I came up with:

  • isOpened()
  • isClosed()
  • canPerformHealthCheck()
  • trackRequestStart()
  • trackRequestSuccess()
  • trackRequestFailure()

As you can see, this list of methods doesn't concern itself with implementation details. It doesn't know anything about error rates or timeouts or windows within which to collect request statistics or when a circuit should open or close. The Circuit Breaker simply consumes the state in a way that helps it make decisions about marshaling requested actions. And, when we do this, the Circuit Breaker method that handles action fulfillment becomes relatively simple:

/**
* I proxy the execution / invocation of the given action.
*
* @target I am the function or component being executed.
* @methodName I am the message being sent to the target (if it's a component).
* @methodArguments I am the message arguments being sent to the target (if it's a component).
* @output false
*/
public any function run(
	required any target,
	string methodName,
	any methodArguments
	) {

	// CAUTION: Since the Circuit Breaker is expecting to handle many concurrent
	// requests, all reading-from and writing-to the shared state of the Circuit
	// Breaker is being SYNCHRONIZED with exclusive locking. The state object
	// itself does not perform any inherent locking.

	lock attributeCollection = lockAttributes {

		if ( state.isOpened() ) {

			// If the Circuit Breaker is open, the general idea is to "fail fast."
			// However, if the circuit has been open for some period of time, it
			// might be ready to send a health check request to the target to see
			// if the target has become healthy.
			if ( ! state.canPerformHealthCheck() ) {

				throw(
					type = "CircuitBreakerOpen",
					message = "Target invocation failing fast due to open circuit breaker.",
					detail = "The circuit is open and therefore the requested action could not be executed.",
					extendedInfo = state.getSummary()
				);

			}

		}

		state.trackRequestStart();

	} // END: Lock.

	try {

		// Try to execute the requested action.
		var result = ( isClosure( target ) || isCustomFunction( target ) )
			? target()
			: invoke( target, methodName, methodArguments )
		;

		lock attributeCollection = lockAttributes {

			state.trackRequestSuccess();

		} // END: Lock.

		// The target method may not return a defined value, even in a successful
		// invocation. As such, we have to check to see if the result exists before
		// we try to return the result upstream.
		if ( structKeyExists( local, "result" ) ) {

			return( result );

		} else {

			return; // void.

		}

	} catch ( any error ) {

		lock attributeCollection = lockAttributes {

			state.trackRequestFailure();

		} // END: Lock.

		rethrow;

	} // END: Catch.

}

As you can see, the Circuit Breaker does little more than ask the state "strategy" if a request can be fulfilled; then, if it can, it reports the fulfillment as either having completed in success or in error.
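To make the calling side concrete, here is a minimal usage sketch. It assumes the CircuitBreaker and CircuitBreakerState components listed later in this post, with the state left at its default thresholds, and it uses the execute() method with closures. The error type "Demo.Failure" is made up for illustration.

```cfml
<cfscript>

	// Create a Circuit Breaker with a default state strategy.
	state = new CircuitBreakerState();
	breaker = new CircuitBreaker( state );

	// A healthy action: the closure is invoked and its return value is passed
	// back through the Circuit Breaker.
	result = breaker.execute(
		function() {

			return( "Meep!" );

		}
	);

	writeOutput( "Healthy result: #result# <br />" );

	// A failing action: the error is tracked as a failed request by the state
	// strategy; then, since a fallback was provided, the fallback is evaluated
	// instead of letting the error propagate.
	result = breaker.execute(
		function() {

			throw( type = "Demo.Failure", message = "Simulated failure." );

		},
		function() {

			return( "Fallback value!" );

		}
	);

	writeOutput( "Failing result: #result# <br />" );

</cfscript>
```

Note that a single failure does not trip the circuit here because the default failedRequestThreshold is 10.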

The interesting side-effect of this kind of separation of concerns is that you can now use the same Circuit Breaker with alternate implementations of the state strategy. For example, my implementation (shown below) takes into account both errors and active requests. But, maybe you don't care about active requests. In such a case, you can create a state strategy that only tracks errors. Or, maybe you want to report state changes to some external monitoring system - no problem, just create a state strategy that logs state transitions. You could even create a state strategy in which the state was stored in Redis and consumed by multiple servers (although that would seem like an idea very much counter to the concept of a Circuit Breaker as different machines are necessarily on different "circuits").
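As a rough illustration of the "errors only" idea, here is a minimal sketch of an alternate state strategy. It is not the implementation used in this post: the component name ErrorOnlyCircuitBreakerState is hypothetical, and, to keep it short, it counts consecutive failures (any success resets the count) rather than tracking a rolling error window. Like the state shown below, it performs no locking of its own.

```cfml
// ErrorOnlyCircuitBreakerState.cfc (hypothetical, for illustration only).
component
	output = false
	hint = "I provide a NON-SYNCHRONIZED, errors-only state for a Circuit Breaker."
	{

	public any function init(
		numeric failedRequestThreshold = 10,
		numeric openStateTimeout = ( 60 * 1000 )
		) {

		variables.failedRequestThreshold = arguments.failedRequestThreshold;
		variables.openStateTimeout = arguments.openStateTimeout;

		variables.tripped = false;
		variables.failedRequestCount = 0;
		variables.checkTargetHealthAtTick = 0;

		return( this );

	}

	public boolean function isOpened() {

		return( variables.tripped );

	}

	public boolean function isClosed() {

		return( ! variables.tripped );

	}

	public boolean function canPerformHealthCheck() {

		return( variables.checkTargetHealthAtTick <= getTickCount() );

	}

	public string function getSummary() {

		return(
			( isOpened() ? "State: OPENED, " : "State: CLOSED, " ) &
			"Failed request count: [#variables.failedRequestCount#]."
		);

	}

	public void function trackRequestStart() {

		// A request starting in an open circuit must be a health check. Bump the
		// timer so that parallel requests don't also initiate a health check.
		if ( isOpened() ) {

			variables.checkTargetHealthAtTick = ( getTickCount() + variables.openStateTimeout );

		}

	}

	public void function trackRequestSuccess() {

		// ASSUMPTION: Any success closes the circuit and clears the failure count.
		variables.tripped = false;
		variables.failedRequestCount = 0;
		variables.checkTargetHealthAtTick = 0;

	}

	public void function trackRequestFailure() {

		variables.failedRequestCount++;

		if ( isClosed() && ( variables.failedRequestCount >= variables.failedRequestThreshold ) ) {

			variables.tripped = true;
			variables.checkTargetHealthAtTick = ( getTickCount() + variables.openStateTimeout );

		}

	}

}
```

Since it exposes the same methods that the Circuit Breaker consumes, it should be usable as a drop-in replacement, eg, new CircuitBreaker( new ErrorOnlyCircuitBreakerState( failedRequestThreshold = 3 ) ).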

Anyway, here's my current state strategy implementation:

component
	output="false"
	hint = "I provide the NON-SYNCHRONIZED state for a Circuit Breaker instance."
	{

	/**
	* I initialize the Circuit Breaker State strategy. This state component is meant
	* to help drive the control flow of a Circuit Breaker.
	*
	* @failedRequestThreshold I am the number of requests that can fail before the circuit is opened.
	* @activeRequestThreshold I am the number of parallel requests that can be concurrently active before the circuit is opened.
	* @openStateTimeout I am the time (in milliseconds) that the circuit will remain open until the target is tested for health.
	* @output false
	*/
	public any function init(
		numeric failedRequestThreshold = 10,
		numeric activeRequestThreshold = 10,
		numeric openStateTimeout = ( 60 * 1000 )
		) {

		// Store the properties.
		variables.failedRequestThreshold = arguments.failedRequestThreshold;
		variables.activeRequestThreshold = arguments.activeRequestThreshold;
		variables.openStateTimeout = arguments.openStateTimeout;

		// NOTE: There is no "half-open" state. The half-open pseudo-state will be
		// entered into by a single request in which a full state change isn't necessary.
		states = {
			CLOSED: "CLOSED",
			OPENED: "OPENED"
		};

		// Default to a closed (ie, flowing) state.
		state = states.CLOSED;

		// Initialize the counters.
		activeRequestCount = 0;
		failedRequestCount = 0;

		// Initialize the timers - each of these store UTC millisecond values.
		checkTargetHealthAtTick = 0;
		lastFailedRequestAtTick = 0;

	}


	// ---
	// PUBLIC METHODS.
	// ---


	/**
	* I determine if a health check can be initiated against the target.
	*
	* @output false
	*/
	public boolean function canPerformHealthCheck() {

		return( ! isAtCapacity() && ! isWaitingForTargetToRecover() );

	}


	/**
	* I return a summary of the state of the Circuit Breaker. This can be used for logging
	* and debugging purposes.
	*
	* @output false
	*/
	public string function getSummary() {

		return(
			( isOpened() ? "State: OPENED, " : "State: CLOSED, " ) &
			"Active request count: [#activeRequestCount#], " &
			"Failed request count: [#failedRequestCount#]."
		);

	}


	/**
	* I determine if the Circuit Breaker is currently closed and can accept requests.
	*
	* @output false
	*/
	public boolean function isClosed() {

		return( state != states.OPENED );

	}


	/**
	* I determine if the Circuit Breaker is open and cannot currently accept any requests.
	*
	* @output false
	*/
	public boolean function isOpened() {

		return( state == states.OPENED );

	}


	/**
	* I reset the Circuit Breaker State, rolling back all counters and timers to a
	* healthy state.
	*
	* @output false
	*/
	public void function reset() {

		// Revert to a closed (ie, flowing) state.
		state = states.CLOSED;

		// Reset the counters.
		activeRequestCount = 0;
		failedRequestCount = 0;

		// Reset the timers.
		checkTargetHealthAtTick = 0;
		lastFailedRequestAtTick = 0;

	}


	/**
	* I track a failed action in the Circuit Breaker.
	*
	* @output false
	*/
	public void function trackRequestFailure() {

		activeRequestCount--;

		// Check to see if the current failure count is still relevant. Since we are
		// tracking errors in a rolling window, it might be time to reset the count
		// before we track the current failure.
		if ( isClosed() && isNewErrorWindow() ) {

			failedRequestCount = 0;

		}

		failedRequestCount++;
		lastFailedRequestAtTick = getTickCount();

		// Check to see if the current failure exceeded the allowable failure rate for
		// the Circuit Breaker. If so, we'll have to trip it open.
		if ( isClosed() && isFailing() ) {

			state = states.OPENED;
			checkTargetHealthAtTick = ( getTickCount() + openStateTimeout );

		}

	}


	/**
	* I track the start of an action in the Circuit Breaker. Every "start" should be
	* followed by either a completion in "success" or in "failure".
	*
	* @output false
	*/
	public void function trackRequestStart() {

		// If a request is being initiated while the circuit is tripped open, it must be
		// a health check. Since the ability to accept a health check is, in part, driven
		// by the open-state timeout, in order to prevent parallel requests from also
		// initiating a health check request, let's bump out the timer. This will also
		// implicitly "reset" the timeout, for all intents and purposes, if the health
		// check fails.
		if ( isOpened() ) {

			checkTargetHealthAtTick = ( getTickCount() + openStateTimeout );

		}

		activeRequestCount++;

		// If the current request just exhausted the request pool, open the circuit so
		// no more requests can be initiated.
		if ( isClosed() && isAtCapacity() ) {

			state = states.OPENED;

			// NOTE: Since this "trip" is based on capacity and not on error rate, there
			// is no need to adjust the health-timer. We want the circuit to re-close as
			// pending requests complete.

		}

	}


	/**
	* I track a successful action in the Circuit Breaker.
	*
	* @output false
	*/
	public void function trackRequestSuccess() {

		activeRequestCount--;

		// Any successful request that returns while the Circuit Breaker is open will
		// move the circuit back into a closed, flowing state. This may be the "health
		// check" request; or, it may be a previously long-running request that finally
		// returned some time after the circuit was tripped open; or, it may be an
		// "at capacity" request that has completed, releasing a slot in the request
		// pool. At this point, there is no differentiating between the various types
		// of successful returns.
		if ( isOpened() && ! isAtCapacity() ) {

			state = states.CLOSED;

			// Reset failure tracking.
			failedRequestCount = 0;
			lastFailedRequestAtTick = 0;
			checkTargetHealthAtTick = 0;

		}

	}


	// ---
	// PRIVATE METHODS.
	// ---


	/**
	* I determine if the Circuit Breaker has exhausted its request pool and should no
	* longer accept any requests until pending requests have completed.
	*
	* @output false
	*/
	private boolean function isAtCapacity() {

		return( activeRequestCount >= activeRequestThreshold );

	}


	/**
	* I determine if the Circuit Breaker is failing based on the failed request threshold.
	*
	* @output false
	*/
	private boolean function isFailing() {

		return( failedRequestCount >= failedRequestThreshold );

	}


	/**
	* I determine if a new error-tracking window should be initiated. Errors are tracked
	* in a rolling window so that infrequent errors don't eventually trip the Circuit
	* Breaker unnecessarily.
	*
	* @output false
	*/
	private boolean function isNewErrorWindow() {

		return( lastFailedRequestAtTick < ( getTickCount() - openStateTimeout ) );

	}


	/**
	* I determine if the OPEN Circuit Breaker is currently waiting before attempting to
	* check the health of the target (ie, whether or not it is yet appropriate to check
	* the health of the target).
	*
	* @output false
	*/
	private boolean function isWaitingForTargetToRecover() {

		return( checkTargetHealthAtTick > getTickCount() );

	}

}

And, another very interesting byproduct of this separation of concerns is that the state management itself can now be Unit Tested directly. This makes it much easier to reason about the state of the state since we don't have to worry about mocking-out actions that succeed or fail or hold requests open:

<cfscript>

	// Create our Circuit Breaker State strategy.
	state = new CircuitBreakerState(
		failedRequestThreshold = 2,
		activeRequestThreshold = 3,
		openStateTimeout = ( 1 * 1000 )
	);

	// Create a Circuit Breaker with the given strategy.
	breaker = new CircuitBreaker( state );
	testGateway = new TestGateway();


	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //


	// Before we extracted the state, the only way to test the Circuit Breaker was to
	// make calls through it.
	result = breaker.executeMethod( testGateway, "makeGoodCall", [ "Meep!" ] );
	result = breaker.executeMethod( testGateway, "makeBadCall", [ "Meh" ], "Fallback value!" );


	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //


	// Now that we have the state extracted from the Circuit Breaker, we can actually
	// test the two completely independently. For example, we can adjust and monitor the
	// state directly.

	writeOutput( "Is state OPEN before making requests? #state.isOpened()# <br />" );

	// Fill up the request pool (can only have 3).
	state.trackRequestStart();
	state.trackRequestStart();
	state.trackRequestStart();

	writeOutput( "Is state OPEN now that pool is exhausted? #state.isOpened()# <br />" );

	// Now, close one of the requests, thereby closing the circuit.
	state.trackRequestSuccess();

	writeOutput( "Is state OPEN now request has returned? #state.isOpened()# <br />" );

	// Now, close remaining two requests in ERROR, pushing us over threshold, thereby
	// opening the circuit.
	state.trackRequestFailure();
	state.trackRequestFailure();

	writeOutput( "Is state OPEN due to error threshold? #state.isOpened()# <br />" );

	// Check to see if we can perform a health check immediately after the trip.
	writeOutput( "Can we perform health check? #state.canPerformHealthCheck()# <br />" );
	writeOutput( "#state.getSummary()# <br />" );

	// Sleep past the open-state timeout.
	sleep( 1100 );

	writeOutput( "Can we perform health check after sleeping? #state.canPerformHealthCheck()# <br />" );

	// Send a successful request - in an OPENED state, this will CLOSE the circuit.
	state.trackRequestStart();
	state.trackRequestSuccess();

	writeOutput( "Is state OPEN after health check request? #state.isOpened()# <br />" );

	// We can also test the Circuit Breaker after manipulating the state. Let's accumulate
	// a number of errors to push us past the threshold.
	state.trackRequestStart();
	state.trackRequestStart();
	state.trackRequestFailure();
	state.trackRequestFailure();

	writeOutput( "Is state OPEN after too many errors? #state.isOpened()# <br />" );

	result = breaker.executeMethod( testGateway, "makeGoodCall", [ "Meep!" ], "Fallback!" );

	writeOutput( "Was result a fallback value? #( result eq 'Fallback!' )# <br />" );

</cfscript>

As you can see, we can now test the Circuit Breaker with integration testing; or, we can monkey directly with the state strategy implementation and bypass the concept of actions and methods.

Now, ColdFusion is a "blocking" language: if you make a method call, the parent thread will block until that method returns. As such, it's non-trivial for a Circuit Breaker, implemented in ColdFusion, to become concerned with "timeouts". That's why I track concurrent active requests as a way to combat hanging requests (the thought being that concurrently hanging requests will exhaust the "request pool").

That said, if this were a non-blocking language (like Node.js), I am unsure as to how timeout control would be implemented. Since the Circuit Breaker is the entity marshaling requests, it would seem that the Circuit Breaker would have to be the entity that knows about timeouts. Perhaps it could read the timeout settings from the state strategy? I am not sure. Right now, I can only hope that some reasonable timeout usage is applied within the actions themselves (such as a timeout on any underlying HTTP request).

Here's the full implementation of my Circuit Breaker:

component
	output = false
	hint = "I marshal the invocation of actions, providing circuit-breaker protection with the given state strategy."
	{

	/**
	* I initialize the Circuit Breaker with the given state strategy.
	*
	* @circuitBreakerState I am the Circuit Breaker State with which the Circuit Breaker is making decisions.
	* @output false
	*/
	public any function init( required any circuitBreakerState ) {

		// Store the properties.
		state = circuitBreakerState;

		// All access to the shared state of the circuit breaker will be synchronized
		// using these locking properties. The state itself is not inherently synchronized.
		lockAttributes = {
			name: "CircuitBreaker-#createUUID()#",
			type: "exclusive",
			timeout: 1,
			throwOnTimeout: true
		};

		return( this );

	}


	// ---
	// PUBLIC METHODS.
	// ---


	/**
	* I marshal the given action inside the Circuit Breaker.
	*
	* @target I am the function or closure to be invoked.
	* @fallback I am the value to be evaluated if the action fails to complete successfully.
	* @output false
	*/
	public any function execute(
		required any target,
		any fallback
		) {

		try {

			return( run( target ) );

		} catch ( any error ) {

			// If a fallback has been provided, return the fallback instead of letting
			// the error propagate to the calling context.
			if ( structKeyExists( arguments, "fallback" ) ) {

				return( evaluateFallback( fallback ) );

			}

			rethrow;

		}

	}


	/**
	* I marshal the given action inside the Circuit Breaker.
	*
	* @target I am the component receiving the message.
	* @methodName I am the message being sent to the target.
	* @methodArguments I am the message arguments being sent to the target.
	* @fallback I am the value to be evaluated if the action fails to complete successfully.
	* @output false
	*/
	public any function executeMethod(
		required any target,
		required string methodName,
		any methodArguments = [],
		any fallback
		) {

		try {

			return( run( target, methodName, methodArguments ) );

		} catch ( any error ) {

			// If a fallback has been provided, return the fallback instead of letting
			// the error propagate to the calling context.
			if ( structKeyExists( arguments, "fallback" ) ) {

				return( evaluateFallback( fallback ) );

			}

			rethrow;

		}

	}


	/**
	* I determine if the Circuit Breaker is in a closed state.
	*
	* @output false
	*/
	public boolean function isClosed() {

		return( state.isClosed() );

	}


	/**
	* I determine if the Circuit Breaker is in an open state.
	*
	* @output false
	*/
	public boolean function isOpen() {

		return( state.isOpened() );

	}


	// ---
	// PRIVATE METHODS.
	// ---


	/**
	* I evaluate the given fallback input to produce an output. If the fallback is a
	* function or closure, it will be invoked; otherwise, it will be returned as-is.
	*
	* @fallback I am the fallback producer being evaluated.
	* @output false
	*/
	private any function evaluateFallback( required any fallback ) {

		if ( isCustomFunction( fallback ) || isClosure( fallback ) ) {

			return( fallback() );

		} else {

			return( fallback );

		}

	}


	/**
	* I proxy the execution / invocation of the given action.
	*
	* @target I am the function or component being executed.
	* @methodName I am the message being sent to the target (if it's a component).
	* @methodArguments I am the message arguments being sent to the target (if it's a component).
	* @output false
	*/
	public any function run(
		required any target,
		string methodName,
		any methodArguments
		) {

		// CAUTION: Since the Circuit Breaker is expecting to handle many concurrent
		// requests, all reading-from and writing-to the shared state of the Circuit
		// Breaker is being SYNCHRONIZED with exclusive locking. The state object
		// itself does not perform any inherent locking.

		lock attributeCollection = lockAttributes {

			if ( state.isOpened() ) {

				// If the Circuit Breaker is open, the general idea is to "fail fast."
				// However, if the circuit has been open for some period of time, it
				// might be ready to send a health check request to the target to see
				// if the target has become healthy.
				if ( ! state.canPerformHealthCheck() ) {

					throw(
						type = "CircuitBreakerOpen",
						message = "Target invocation failing fast due to open circuit breaker.",
						detail = "The circuit is open and therefore the requested action could not be executed.",
						extendedInfo = state.getSummary()
					);

				}

			}

			state.trackRequestStart();

		} // END: Lock.

		try {

			// Try to execute the requested action.
			var result = ( isClosure( target ) || isCustomFunction( target ) )
				? target()
				: invoke( target, methodName, methodArguments )
			;

			lock attributeCollection = lockAttributes {

				state.trackRequestSuccess();

			} // END: Lock.

			// The target method may not return a defined value, even in a successful
			// invocation. As such, we have to check to see if the result exists before
			// we try to return the result upstream.
			if ( structKeyExists( local, "result" ) ) {

				return( result );

			} else {

				return; // void.

			}

		} catch ( any error ) {

			lock attributeCollection = lockAttributes {

				state.trackRequestFailure();

			} // END: Lock.

			rethrow;

		} // END: Catch.

	}

}

Circuit Breakers are immensely fascinating. Not only is this helping me to think about decoupling systems safely, it's also helping me think about Object Design. If I hadn't seen other implementations of Circuit Breakers in other languages, I am not sure that it would have ever occurred to me to break state management out into its own entity. But, now that I've seen it done, I think it has a lot of benefits from both a clarity and unit testing standpoint. Not to mention that it makes the overall concept a lot more flexible, capable of being driven by various implementations.

