Ben Nadel at the New York ColdFusion User Group (May. 2009) with: Gert Franz ( @gert_railo )

Tracking Request Resolution Within A Circuit Breaker In ColdFusion


In the last few weeks, I've had a lot of fun noodling on Circuit Breaker architectures in ColdFusion. And, while I've thought about monitoring the state of a Circuit Breaker, one thing that never sat right in my mind was the disconnection between a request initiation and its subsequent resolution. In my previous approaches, I tracked the start of a request and the end of a request; but, due to the parallel nature of requests, I couldn't link a failed or successful outcome to any particular initiation. So, I wanted to quickly revisit the Circuit Breaker and try to connect the two phases of tracking.

The way I've been approaching Circuit Breakers in ColdFusion is to split the request routing and the state management. The benefit of this approach is that various "state implementations" - or "strategies" - could be plugged into the same Circuit Breaker as long as the implementations all adhered to a standard State interface. The tradeoff is that the state management knows nothing about how the requests are being routed; and, as such, it can't easily track meta-data about the request.

To bridge this gap, while keeping the Circuit Breaker generic, I decided to force the state management implementation to provide some sort of request Token. The Circuit Breaker won't care what kind of token it is - string, number, UUID, struct, etc. - only that it is a non-void value. When a request is routed, the Circuit Breaker will then pass this token back into the resolution handlers (pseudo code):

public any function run( /* ... */ ) {

	var requestToken = state.trackRequestStart();

	try {

		// ... invoke action ...

		state.trackRequestSuccess( requestToken );

	} catch ( any error ) {

		state.trackRequestFailure( requestToken, error );

		// Let the error propagate so that the caller can apply a fallback.
		rethrow;

	}

}

As you can see, when the Circuit Breaker tells the state implementation to start tracking the request, the state implementation returns a token to associate with that request. Then, when the action is evaluated, that tracking token is passed back into either the trackRequestSuccess() or the trackRequestFailure() methods. With this data being pulled-through the life of the request, the state implementation can now track any arbitrary meta-data that it wants without coupling the Circuit Breaker to any particular state implementation.
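As a rough sketch, the token-related part of the contract implied by this pseudo code might look like the following. The ICircuitBreakerState interface itself is not shown in this post, so these signatures are inferred from how the Circuit Breaker calls the state and should be treated as assumptions. Any other members of the interface are left out:

```cfml
// ICircuitBreakerState.cfc (hypothetical sketch, inferred from usage in this post).
interface
	hint = "I define the tracking contract between the Circuit Breaker and its state strategy."
	{

	// I track the start of a request and return a token of ANY type. The Circuit
	// Breaker never inspects the token; it only hands it back on resolution.
	public any function trackRequestStart();

	// I receive the token that was returned from trackRequestStart().
	public void function trackRequestSuccess( required any requestToken );

	// I receive the token that was returned from trackRequestStart(), along with
	// the error that was thrown by the action.
	public void function trackRequestFailure(
		required any requestToken,
		required any error
		);

	// NOTE: Other state methods are omitted from this sketch.

}
```

Typing the token as "any" on both sides is what keeps the Circuit Breaker decoupled: only the state implementation needs to know what is inside the token.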

Now, when I went to test this approach, I started to struggle with how I wanted to record something like the "duration" of each action. The problem was, once I had the data about the duration, I had to pass it to the "Monitor". But, my previous implementation of the state monitor didn't accept any method arguments. So, I stared at the screen for half-an-hour trying to think about a way to keep the Monitor "generic" while allowing it to track new data.

Then, it occurred to me! The Monitor doesn't need to be generic. The state monitor is tied to a state implementation. And, the state implementation just needs to adhere to an interface. So, if the particular state implementation needs to track new information, its monitor implementation can change along with it. The two are not abstract companions; but, rather, co-evolving interfaces.

With that mental battle concluded, I updated my state management implementation to provide the current tick-count as the request token. This way, when each request was resolved, I could calculate the delta in time:

public any function trackRequestStart() {

	// ...

	// Each request needs to be associated with some type of token (that will be
	// passed back into the success / failure tracking methods). In this case,
	// we're just going to use the current UTC milliseconds so that we can roughly
	// correlate each request with a duration.
	return( getTickCount() );

}

public void function trackRequestFailure(
	required any requestToken,
	required any error
	) {

	// ...
	logFailure( getTickCount() - requestToken );
	// ...

}

public void function trackRequestSuccess( required any requestToken ) {

	// ...
	logSuccess( getTickCount() - requestToken );
	// ...

}

In this case, I'm using the tick-count as the request token; but, this could be anything.
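To illustrate the "this could be anything" point, here is a minimal sketch of a state strategy that hands back a struct as the request token instead of a number. The component name, the "id" and "startedAt" keys, and the getLastDuration() helper are all hypothetical and exist only for this illustration. A Circuit Breaker that treats the token as opaque would pass the struct through untouched:

```cfml
// StructTokenState.cfc (hypothetical; only the token-related methods are sketched).
component
	output = false
	hint = "I sketch a state strategy that uses a struct as the request token."
	{

	public any function init() {

		// Assumption: the most recent duration is stored only to make this sketch
		// observable from the outside.
		variables.lastDuration = -1;

		return( this );

	}

	public any function trackRequestStart() {

		// The token can carry any meta-data that the state wants to see again when
		// the request resolves.
		var requestToken = {
			id: createUUID(),
			startedAt: getTickCount()
		};

		return( requestToken );

	}

	public void function trackRequestSuccess( required any requestToken ) {

		variables.lastDuration = ( getTickCount() - requestToken.startedAt );

	}

	public void function trackRequestFailure(
		required any requestToken,
		required any error
		) {

		variables.lastDuration = ( getTickCount() - requestToken.startedAt );

	}

	public numeric function getLastDuration() {

		return( variables.lastDuration );

	}

}
```

With a struct token, the state could later add more keys (a request label, for example) without any change to the Circuit Breaker.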

To test this, I updated my in-memory Circuit Breaker monitor to accept durations (in milliseconds). Then, I threw some actions at it:

<cfscript>

	// Create an instance of our in-memory monitor. This will keep track of the Circuit
	// Breaker events in an in-memory event log.
	monitor = new InMemoryCircuitBreakerMonitor();

	// Pass the in-memory monitor instance to our Circuit Breaker state.
	state = new DefaultCircuitBreakerState(
		failedRequestThreshold = 2,
		activeRequestThreshold = 2,
		openStateTimeout = 1000,
		monitor = monitor
	);

	breaker = new CircuitBreaker( state );

	// ------------------------------------------------------------------------------- //
	// ------------------------------------------------------------------------------- //

	// Execute quick success.
	breaker.execute(
		function() {

			return( "Woot!" );

		}
	);

	// Execute slow success.
	breaker.execute(
		function() {

			sleep( 52 );
			return( "Wooty!" );

		}
	);

	// Execute quick failure.
	breaker.execute(
		function() {

			throw( "meh" );

		},
		"I am a fallback value"
	);

	// Execute slow failure.
	breaker.execute(
		function() {

			sleep( 38 );
			throw( "meh" );

		},
		"I am a fallback value"
	);

	// Log-out the events monitored by the Circuit Breaker state.
	writeDump( var = monitor.getEvents(), format = "text" );

</cfscript>

As you can see, some of these actions have sleep() calls in them to make sure that they exhibit a non-zero duration. And, when we run the above test code, we get the following output:

array - Top 5 of 5 rows

1) Action executed in success [in 0 ms].
2) Action executed in success [in 57 ms].
3) Action executed in failure [in 1 ms].
4) Action executed in failure [in 39 ms].
5) Circuit breaker moved to OPENED state.

As you can see, the durations of all the requests flowing through the Circuit Breaker were properly recorded.
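The final OPENED entry also implies that the next request should fail fast. As a hedged sketch, the snippet below continues the test script above: it reuses the "breaker" variable and assumes that it runs within the 1,000 ms openStateTimeout. Under those assumptions, a fifth call should return the fallback without the action ever being invoked. The "actionInvoked" flag is a hypothetical name introduced for this illustration:

```cfml
<cfscript>

	// Assumption: this runs right after the four calls above, while the circuit
	// is still inside its open-state timeout.
	variables.actionInvoked = false;

	result = breaker.execute(
		function() {

			// This should never run while the circuit is failing fast.
			variables.actionInvoked = true;
			return( "Woot!" );

		},
		"I am a fallback value"
	);

	writeOutput( "Circuit open: #breaker.isOpen()#<br />" );
	writeOutput( "Action invoked: #variables.actionInvoked#<br />" );
	writeOutput( "Result: #result#<br />" );

</cfscript>
```

Since the CircuitBreakerOpen error is thrown before the request start is tracked, this call should not add a success or failure entry to the monitor.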

Here's my current implementation of the CircuitBreaker.cfc:

component
	output = false
	hint = "I marshal the invocation of actions, providing circuit-breaker protection with the given state strategy."
	{

	/**
	* I initialize the Circuit Breaker with the given state strategy.
	*
	* @circuitBreakerState I am the Circuit Breaker State with which the Circuit Breaker is making decisions.
	* @output false
	*/
	public any function init( required ICircuitBreakerState circuitBreakerState ) {

		// Store the properties.
		state = circuitBreakerState;

		// All access to the shared state of the circuit breaker will be synchronized
		// using these locking properties. The state itself is not inherently synchronized.
		lockAttributes = {
			name: "CircuitBreaker-#createUUID()#",
			type: "exclusive",
			timeout: 1,
			throwOnTimeout: true
		};

		return( this );

	}


	// ---
	// PUBLIC METHODS.
	// ---


	/**
	* I marshal the given action inside the Circuit Breaker.
	*
	* @target I am the function or closure to be invoked.
	* @fallback I am the value to be evaluated if the action fails to complete successfully.
	* @output false
	*/
	public any function execute(
		required any target,
		any fallback
		) {

		try {

			return( run( target ) );

		} catch ( any error ) {

			// If a fallback has been provided, return the fallback instead of letting
			// the error propagate to the calling context.
			if ( structKeyExists( arguments, "fallback" ) ) {

				return( evaluateFallback( fallback ) );

			}

			rethrow;

		}

	}


	/**
	* I marshal the given action inside the Circuit Breaker.
	*
	* @target I am the component receiving the message.
	* @methodName I am the message being sent to the target.
	* @methodArguments I am the message arguments being sent to the target.
	* @fallback I am the value to be evaluated if the action fails to complete successfully.
	* @output false
	*/
	public any function executeMethod(
		required any target,
		required string methodName,
		any methodArguments = [],
		any fallback
		) {

		try {

			return( run( target, methodName, methodArguments ) );

		} catch ( any error ) {

			// If a fallback has been provided, return the fallback instead of letting
			// the error propagate to the calling context.
			if ( structKeyExists( arguments, "fallback" ) ) {

				return( evaluateFallback( fallback ) );

			}

			rethrow;

		}

	}


	/**
	* I determine if the Circuit Breaker is in a closed state.
	*
	* @output false
	*/
	public boolean function isClosed() {

		return( state.isClosed() );

	}


	/**
	* I determine if the Circuit Breaker is in an open state.
	*
	* @output false
	*/
	public boolean function isOpen() {

		return( state.isOpened() );

	}


	// ---
	// PRIVATE METHODS.
	// ---


	/**
	* I evaluate the given fallback input to produce an output. If the fallback is a
	* function or closure, it will be invoked; otherwise, it will be returned as-is.
	*
	* @fallback I am the fallback producer being evaluated.
	* @output false
	*/
	private any function evaluateFallback( required any fallback ) {

		if ( isCustomFunction( fallback ) || isClosure( fallback ) ) {

			return( fallback() );

		} else {

			return( fallback );

		}

	}


	/**
	* I proxy the execution / invocation of the given action.
	*
	* @target I am the function or component being executed.
	* @methodName I am the message being sent to the target (if it's a component).
	* @methodArguments I am the message arguments being sent to the target (if it's a component).
	* @output false
	*/
	private any function run(
		required any target,
		string methodName,
		any methodArguments
		) {

		// CAUTION: Since the Circuit Breaker is expecting to handle many concurrent
		// requests, all reading-from and writing-to the shared state of the Circuit
		// Breaker is being SYNCHRONIZED with exclusive locking. The state object
		// itself does not perform any inherent locking.

		lock attributeCollection = lockAttributes {

			if ( state.isOpened() ) {

				// If the Circuit Breaker is open, the general idea is to "fail fast."
				// However, if the circuit has been open for some period of time, it
				// might be ready to send a health check request to the target to see
				// if the target has become healthy.
				if ( ! state.canPerformHealthCheck() ) {

					throw(
						type = "CircuitBreakerOpen",
						message = "Target invocation failing fast due to open circuit breaker.",
						detail = "The circuit is open and therefore the requested action could not be executed.",
						extendedInfo = state.getSummary()
					);

				}

			}

			var requestToken = state.trackRequestStart();

		} // END: Lock.

		try {

			// Try to execute the requested action.
			var result = ( isClosure( target ) || isCustomFunction( target ) )
				? target()
				: invoke( target, methodName, methodArguments )
			;

			lock attributeCollection = lockAttributes {

				state.trackRequestSuccess( requestToken );

			} // END: Lock.

			// The target method may not return a defined value, even in a successful
			// invocation. As such, we have to check to see if the result exists before
			// we try to return the result upstream.
			if ( structKeyExists( local, "result" ) ) {

				return( result );

			} else {

				return; // void.

			}

		} catch ( any error ) {

			lock attributeCollection = lockAttributes {

				state.trackRequestFailure( requestToken, error );

			} // END: Lock.

			rethrow;

		} // END: Catch.

	}

}

And, here's my current implementation of the DefaultCircuitBreakerState.cfc:

component
	implements = "ICircuitBreakerState"
	output = false
	hint = "I provide a default implementation of the Circuit Breaker State."
	{

	/**
	* I initialize the Circuit Breaker State strategy. This state component is meant
	* to help drive the control flow of a Circuit Breaker.
	*
	* @failedRequestThreshold I am the number of requests that can fail before the circuit is opened.
	* @activeRequestThreshold I am the number of parallel requests that can be concurrently active before the circuit is opened.
	* @openStateTimeout I am the time (in milliseconds) that the circuit will remain open until the target is tested for health.
	* @monitor I am the optional state change monitor.
	* @output false
	*/
	public any function init(
		numeric failedRequestThreshold = 10,
		numeric activeRequestThreshold = 10,
		numeric openStateTimeout = ( 60 * 1000 ),
		ICircuitBreakerMonitor monitor
		) {

		// Store the properties.
		variables.failedRequestThreshold = arguments.failedRequestThreshold;
		variables.activeRequestThreshold = arguments.activeRequestThreshold;
		variables.openStateTimeout = arguments.openStateTimeout;
		variables.monitor = structKeyExists( arguments, "monitor" )
			? arguments.monitor
			: ""
		;

		// NOTE: There is no "half-open" state. The half-open pseudo-state will be
		// entered into by a single request in which a full state change isn't necessary.
		states = {
			CLOSED: "CLOSED",
			OPENED: "OPENED"
		};

		// Default to a closed (ie, flowing) state.
		state = states.CLOSED;

		// Initialize the counters.
		activeRequestCount = 0;
		failedRequestCount = 0;

		// Initialize the timers - each of these store UTC millisecond values.
		checkTargetHealthAtTick = 0;
		lastFailedRequestAtTick = 0;

	}


	// ---
	// PUBLIC METHODS.
	// ---


	/**
	* I determine if a health check can be initiated against the target.
	*
	* @output false
	*/
	public boolean function canPerformHealthCheck() {

		return( ! isAtCapacity() && ! isWaitingForTargetToRecover() );

	}


	/**
	* I return a summary of the state of the Circuit Breaker. This can be used for logging
	* and debugging purposes.
	*
	* @output false
	*/
	public string function getSummary() {

		return(
			( isOpened() ? "State: OPENED, " : "State: CLOSED, " ) &
			"Active request count: [#activeRequestCount#], " &
			"Failed request count: [#failedRequestCount#]."
		);

	}


	/**
	* I determine if the Circuit Breaker is currently closed and can accept requests.
	*
	* @output false
	*/
	public boolean function isClosed() {

		return( state != states.OPENED );

	}


	/**
	* I determine if the Circuit Breaker is opened and cannot currently accept any requests.
	*
	* @output false
	*/
	public boolean function isOpened() {

		return( state == states.OPENED );

	}


	/**
	* I reset the Circuit Breaker State, rolling back all counters and timers to a
	* healthy state.
	*
	* @output false
	*/
	public void function reset() {

		// Revert to a closed (ie, flowing) state.
		state = states.CLOSED;

		// Reset the counters.
		activeRequestCount = 0;
		failedRequestCount = 0;

		// Reset the timers.
		checkTargetHealthAtTick = 0;
		lastFailedRequestAtTick = 0;

		// Even though we are not sure if the original state (pre-reset) was Opened,
		// let's log this reset as a state-change.
		logClosed();

	}


	/**
	* I track a failed action in the Circuit Breaker.
	*
	* NOTE: The associated error is being passed into the failure method in case any
	* additional logic needs to be implemented based on error type.
	*
	* @requestToken I am the token returned from "start" method.
	* @error I am the error that was thrown during the request execution.
	* @output false
	*/
	public void function trackRequestFailure(
		required any requestToken,
		required any error
		) {

		activeRequestCount--;

		// Check to see if the current failure count is still relevant. Since we are
		// tracking errors in a rolling window, it might be time to reset the count
		// before we track the current failure.
		if ( isClosed() && isNewErrorWindow() ) {

			failedRequestCount = 0;

		}

		failedRequestCount++;
		lastFailedRequestAtTick = getTickCount();

		// NOTE: In this implementation, the requestToken was the start-tick of the
		// Circuit Breaker request.
		logFailure( getTickCount() - requestToken );

		// Check to see if the current failure exceeded the allowable failure rate for
		// the Circuit Breaker. If so, we'll have to trip it open.
		if ( isClosed() && isFailing() ) {

			state = states.OPENED;
			checkTargetHealthAtTick = ( getTickCount() + openStateTimeout );

			logOpened();

		}

	}


	/**
	* I track the start of an action in the Circuit Breaker. Every "start" should be
	* followed by either a completion in "success" or in "failure".
	*
	* This method is expected to return some sort of value. It doesn't matter what type
	* of value (numeric, struct, etc.); but, that value will be passed back-in using
	* either the trackRequestSuccess() or trackRequestFailure() methods.
	*
	* @output false
	*/
	public any function trackRequestStart() {

		// If a request is being initiated while the circuit is tripped open, it must be
		// a health check. Since the ability to accept a health check is, in part, driven
		// by the open-state timeout, in order to prevent parallel requests from also
		// initiating a health check request, let's bump out the timer. This will also
		// implicitly "reset" the timeout, for all intents and purposes, if the health
		// check fails.
		if ( isOpened() ) {

			checkTargetHealthAtTick = ( getTickCount() + openStateTimeout );

		}

		activeRequestCount++;

		// If the current request just exhausted the request pool, open the circuit so
		// no more requests can be initiated.
		if ( isClosed() && isAtCapacity() ) {

			state = states.OPENED;

			// NOTE: Since this "trip" is based on capacity and not on error rate, there
			// is no need to adjust the health-timer. We want the circuit to re-close as
			// pending requests complete.

			logOpened();

		}

		// Each request needs to be associated with some type of token (that will be
		// passed back into the success / failure tracking methods). In this case,
		// we're just going to use the current UTC milliseconds so that we can roughly
		// correlate each request with a duration.
		return( getTickCount() );

	}


	/**
	* I track a successful action in the Circuit Breaker.
	*
	* @requestToken I am the token returned from "start" method.
	* @output false
	*/
	public void function trackRequestSuccess( required any requestToken ) {

		activeRequestCount--;

		// NOTE: In this implementation, the requestToken was the start-tick of the
		// Circuit Breaker request.
		logSuccess( getTickCount() - requestToken );

		// Any successful request that returns while the Circuit Breaker is open will
		// move the circuit back into a closed, flowing state. This may be the "health
		// check" request; or, it may be a previously long-running request that finally
		// returned some time after the circuit was tripped open; or, it may be an
		// "at capacity" request that has completed, releasing a slot in the request
		// pool. At this point, there is no differentiating between the various types
		// of successful returns.
		if ( isOpened() && ! isAtCapacity() ) {

			state = states.CLOSED;

			// Reset failure tracking.
			failedRequestCount = 0;
			lastFailedRequestAtTick = 0;
			checkTargetHealthAtTick = 0;

			logClosed();

		}

	}


	// ---
	// PRIVATE METHODS.
	// ---


	/**
	* I determine if the Circuit Breaker has exhausted its request pool and should no
	* longer accept any requests until pending requests have completed.
	*
	* @output false
	*/
	private boolean function isAtCapacity() {

		return( activeRequestCount >= activeRequestThreshold );

	}


	/**
	* I determine if the Circuit Breaker is failing based on the failed request threshold.
	*
	* @output false
	*/
	private boolean function isFailing() {

		return( failedRequestCount >= failedRequestThreshold );

	}


	/**
	* I determine if a new error-tracking window should be initiated. Errors are tracked
	* in a rolling window so that infrequent errors don't eventually trip the Circuit
	* Breaker unnecessarily.
	*
	* @output false
	*/
	private boolean function isNewErrorWindow() {

		return( lastFailedRequestAtTick < ( getTickCount() - openStateTimeout ) );

	}


	/**
	* I determine if the OPEN Circuit Breaker is currently waiting before attempting to
	* check the health of the target (ie, whether or not it is yet appropriate to check
	* the health of the target).
	*
	* @output false
	*/
	private boolean function isWaitingForTargetToRecover() {

		return( checkTargetHealthAtTick > getTickCount() );

	}


	/**
	* I log the changing of the state from Opened to Closed.
	*
	* @output false
	*/
	private void function logClosed() {

		if ( ! isSimpleValue( monitor ) ) {

			monitor.logClosed();

		}

	}


	/**
	* I log a request failure.
	*
	* @durationInMilliseconds I am the duration of the failed request.
	* @output false
	*/
	private void function logFailure( required numeric durationInMilliseconds ) {

		if ( ! isSimpleValue( monitor ) ) {

			monitor.logFailure( durationInMilliseconds );

		}

	}


	/**
	* I log the changing of the state from Closed to Opened.
	*
	* @output false
	*/
	private void function logOpened() {

		if ( ! isSimpleValue( monitor ) ) {

			monitor.logOpened();

		}

	}


	/**
	* I log a request success.
	*
	* @durationInMilliseconds I am the duration of the successful request.
	* @output false
	*/
	private void function logSuccess( required numeric durationInMilliseconds ) {

		if ( ! isSimpleValue( monitor ) ) {

			monitor.logSuccess( durationInMilliseconds );

		}

	}

}

The monitor itself is not worth sharing - it basically just writes the log messages to an in-memory array and then makes that array available for debugging.
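For readers who want something to run the test script against, here is a minimal sketch of what such an in-memory monitor might look like. This is a hypothetical reconstruction, not the actual component: the method names come from the calls made by DefaultCircuitBreakerState, the success, failure, and OPENED messages are copied from the output above, and the CLOSED message is an assumption. A real version would also need to declare implements = "ICircuitBreakerMonitor" (an interface that is not shown here) in order to satisfy the typed "monitor" argument:

```cfml
// InMemoryCircuitBreakerMonitor.cfc (hypothetical reconstruction).
component
	output = false
	hint = "I record Circuit Breaker events in an in-memory array."
	{

	public any function init() {

		variables.events = [];

		return( this );

	}

	public array function getEvents() {

		return( variables.events );

	}

	public void function logClosed() {

		// Assumption: this wording mirrors the OPENED message seen in the output.
		arrayAppend( variables.events, "Circuit breaker moved to CLOSED state." );

	}

	public void function logFailure( required numeric durationInMilliseconds ) {

		arrayAppend( variables.events, "Action executed in failure [in #durationInMilliseconds# ms]." );

	}

	public void function logOpened() {

		arrayAppend( variables.events, "Circuit breaker moved to OPENED state." );

	}

	public void function logSuccess( required numeric durationInMilliseconds ) {

		arrayAppend( variables.events, "Action executed in success [in #durationInMilliseconds# ms]." );

	}

}
```

This sketch performs no locking of its own. It relies on the Circuit Breaker calling the state (and, through it, the monitor) from inside its exclusive lock.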

Circuit Breakers are really interesting to think about. It's been a fun challenge, trying to figure out how to break the functionality apart, making things generic while, at the same time, keeping the concept effective. I don't think I have the "right" solution just yet. But, I think I'm moving in the right direction.

