
My Personal Best Practices For Using LaunchDarkly Feature Flags

By Ben Nadel on

It's hard to believe that I've been using LaunchDarkly for over four years now. In October 2015, when Christopher Andersson suggested that we try something called "feature flags" here at InVision, I couldn't even begin to understand how truly revolutionary they were going to be. It took us a little while to figure out how feature flags worked; and, it took us a little longer to figure out how they could best be leveraged. But, at this point, using LaunchDarkly for feature flags is akin to using Git for source control - I just can't imagine doing it any other way.

Ben Nadel rocking out a LaunchDarkly tee-shirt, like a boss.

Prompted by a conversation that Adam Tuttle and I were having the other day in my post about ColdFusion application performance tuning, I thought it would be fun to share my personal best practices for using feature flags in our Lucee CFML and Angular application. These best practices work for me and the type of work that my team does - your mileage may vary.

Take the Time to Understand How LaunchDarkly Targeting Works

One of the biggest hurdles for me when adopting LaunchDarkly was understanding how user identification and targeting worked. I kept treating LaunchDarkly like a "database of users" that I could query with arbitrary filters (targeting). But LaunchDarkly doesn't aggregate user data, it aggregates rule data that pertains to users.

The best metaphor that I can come up with is that LaunchDarkly targeting works like a "pure function": you pass user data into the function, the LaunchDarkly Client runs that data through its locally-cached rule-set and then, near instantly, it spits out a result. The more complicated your rules get, the more inputs you need to provide when calling the "pure function".

For whatever reason, it took me a really long time to feel comfortable with this notion. But, once I finally had the right mental model, it made all the difference.

ASIDE: This "pure function" implementation is part of the reason that LaunchDarkly is so insanely fast! It doesn't block the request to make any HTTP calls in order to check on feature flag status - it just takes your inputs and tests them against its rules engine. And, to make matters even better, it uses streaming technology to keep the rules engine up-to-date in the background. This is why changes in the LaunchDarkly dashboard show up instantly in your LaunchDarkly Client.
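To make that mental model concrete, here's a tiny sketch of the "pure function" idea. To be clear, this is not the actual LaunchDarkly SDK or its API - the names and rule shapes here are entirely hypothetical - it just illustrates that flag evaluation is a local, deterministic computation over the inputs you provide:

```javascript
// CAVEAT: This is NOT the LaunchDarkly SDK - it's a hypothetical sketch of
// the "pure function" mental model. User attributes go in, a locally-cached
// rule-set is consulted, and a result comes out. No blocking HTTP calls.
var cachedRules = {
	"performance-RAIN-1337-project-list-load-time": function( user ) {
		// Example rule: internal users, plus roughly 10% of everyone else.
		return ( user.isInternal || ( ( user.id % 100 ) < 10 ) );
	}
};

function evaluateFlag( flagName, user, defaultValue ) {
	var rule = cachedRules[ flagName ];
	// Unknown flags fall back to the caller-provided default.
	return ( rule ? rule( user ) : defaultValue );
}

// User 5 falls into the 10% bucket ( 5 % 100 = 5 ).
console.log( evaluateFlag( "performance-RAIN-1337-project-list-load-time", { id: 5, isInternal: false }, false ) ); // true
```

Notice that the more complicated the rules get, the more user attributes the caller has to pass in - just like arguments to a pure function.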

When in Doubt, Put It Behind a Feature Flag

I don't use feature flags for everything. But, I do use feature flags for just about everything. They're light-weight, they're easy to add, and they're evaluated instantly (see "ASIDE" above). There's basically no downside to using them. And, for the reasonable cost of having to clean them up later, you get to deploy code with an intoxicating amount of confidence - confidence that you're not about to break production.

At first, this may feel like a point of friction. But, trust me - just start doing it. After a few feature deployments, you won't even think about it - it becomes second nature.

BOLD STATEMENT: Feature flags give me more confidence than Testing does. With a Test, all I can do is ensure that all "tested" facets work correctly. With a Feature Flag, however, I have the power to instantly deactivate any code that is throwing an error or behaving incorrectly, regardless of whether or not it was tested. That's true power!

Create Separate Tickets For the Three Phases of Feature Flags

When I am working on a task that is going to be put behind a feature flag (which is the majority of tasks), I try to create three separate tickets in JIRA:

  1. A ticket for writing and deploying the code.
  2. A ticket for rolling-out and evaluating the code.
  3. A ticket for removing the feature flag.

The real benefit that feature flags offer is the ability to decouple the "deployment" of the code from the "roll-out", or activation, of the code. In accordance with this decoupling, I like to create separate JIRA tickets that mirror these different phases of effort.

This separation allows me to deploy the code and close a JIRA ticket one day; and then, start rolling-out the feature flag, evaluating performance, and taking notes in a separate JIRA ticket the next day. For me, JIRA tickets relate more directly to "effort" than to specific features.

NOTE: I usually wait a week-or-so to work on the "remove" ticket. This gives the feature flag and the experimental code a week to "soak" in production, providing plenty of time for unexpected edge-cases to make themselves known.

Differentiate Short-Term and Long-Term Feature Flags

While the vast majority of my feature flags are short-lived (between several hours and a week), some feature flags are meant to be used long-term. I refer to these toggles as "operational" feature flags. And, I like to prefix them with a special notation:

OPERATIONS--

As an example:

  • OPERATIONS--debug-logging
  • OPERATIONS--fail-health-check

This helps both when I'm reading the code as well as when I'm scrolling through the LaunchDarkly dashboard. Whenever I see the OPERATIONS-- prefix, I know that the gated-code isn't an "experiment" - it's code that can be enabled and disabled in order to affect the runtime state of the system.

Some examples of operational feature flags include:

  • Enabling / disabling "maintenance mode".
  • Enabling / disabling rate-limiting.
  • Enabling / disabling system metrics gathering.
  • Enabling / disabling system reporting.
  • Enabling / disabling "debug" logging.

For short-term feature flags, I like to prefix them with an "intent" indicator and a JIRA ticket number. I only have two "intents":

  • Product - I'm changing the feature-set of the product in some way (a change seen by the user).

  • Performance - I'm trying to improve the performance or stability of the product, but I'm leaving the feature-set unchanged (a change that is transparent to the user).

So, for example, if I was running an experiment to try and improve the performance of the project-list page's load-time as part of the Rainbow (RAIN) team, I might have a feature flag named:

performance-RAIN-1337-project-list-load-time

Now, just from seeing this feature flag in the code, I immediately know:

  • What kind of feature flag it is (short-term vs. long-term).
  • What the flavor of the feature flag is (product vs. performance).
  • What area the feature flag relates to (project list load times).
  • Which team is responsible for it (Rainbow - the best team!).
  • Where to find more information (JIRA ticket RAIN-1337).
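Because all of that information is encoded right in the flag name, it can even be recovered mechanically. As a sketch - this helper is hypothetical, not something LaunchDarkly provides - the naming conventions above might be unpacked like so:

```javascript
// Hypothetical helper (not part of LaunchDarkly) that unpacks the naming
// conventions described above: "OPERATIONS--{description}" for long-term
// flags and "{intent}-{TEAM}-{ticket}-{description}" for short-term flags.
function parseFlagName( flagName ) {
	var operationsPrefix = "OPERATIONS--";
	if ( flagName.indexOf( operationsPrefix ) === 0 ) {
		return ({
			type: "long-term",
			description: flagName.slice( operationsPrefix.length )
		});
	}
	var match = flagName.match( /^(product|performance)-([A-Z]+)-(\d+)-(.+)$/ );
	if ( ! match ) {
		return ( null );
	}
	return ({
		type: "short-term",
		intent: match[ 1 ],
		team: match[ 2 ],
		ticket: ( match[ 2 ] + "-" + match[ 3 ] ),
		description: match[ 4 ]
	});
}

console.log( parseFlagName( "performance-RAIN-1337-project-list-load-time" ).ticket ); // RAIN-1337
console.log( parseFlagName( "OPERATIONS--debug-logging" ).type ); // long-term
```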

Wrap or Alias Feature Flags With Semantic References

As you can see above, my feature flag names tend to be a bit verbose because they encode a lot of information about what they are doing, how long they live, and which team owns them. As such, I like to either wrap or alias them in my code in order to make the code easier to read.

On the server-side, this usually means putting the feature flag reference behind a "should" method. As in:

shouldUseProjectLoadTimeExperiment( userID )

Granted, this method name is also long - I like long, descriptive names; but, it ultimately encapsulates something that is even longer and noisier (pseudo-code):

<cfscript>

	public void function someControllerMethod( rc ) {

		if ( shouldUseProjectLoadTimeExperiment( rc.user.id ) ) {

			// .... experimental code.
			return;

		}

		// .... normal code.

	}

	// ---
	// PRIVATE METHODS.
	// ---

	private boolean function shouldUseProjectLoadTimeExperiment( userID ) {

		return( featureFlagService.getFeatureByUserID( userID, "performance-RAIN-1337-project-list-load-time" ) );

	}

</cfscript>

For me, this approach cleanly separates the question, "should", from the implementation, "feature flag". I find this code easier to read and to reason about. It also allows me to bake-in more logic to the "should" method if it depends on more than just the feature flag.

On the client-side, I use a similar approach. But, rather than putting this information behind a method call, I usually just put it behind a property in the View-Model. For example, in AngularJS (pseudo-code):

app.controller(
	"MyController",
	function( $scope, config ) {

		// ....
		$scope.shouldShowHotNewFeature = !! config.featureFlags[ "product-RAIN-007-hot-new-feature" ];
		// ....

	}
);

NOTE: In the above code I'm using the double-bang / double-not (!!) pseudo-operator to coerce the feature flag response to a Boolean value. This allows me to work with the correct data-type even if the feature flag is undefined - something that can happen during a rolling deployment.

Now, in my HTML view, I use that View-Model property which asks the question, "should", without having to understand the implementation details, "feature flag":

<ul class="nav">
	<li>
		<a href="#/home">Home</a>
	</li>
	<li>
		<a href="#/about">About</a>
	</li>
	<li>
		<a href="#/contact">Contact</a>
	</li>
	<li ng-if="shouldShowHotNewFeature">
		<a href="#/hawtness">Hawtness</a>
	</li>
</ul>

Minimize the Life-Time of a Feature Flag

Feature flags, like all aspects of code, rot over time. In fact, feature flags can create even more toxic rot than your traditional code because they add an element of dynamic uncertainty in a sea of static information. I can't tell you how many times I've come across a feature flag in our code only to find out that the given experiment had either been completely abandoned or fully rolled-out years ago.

Years ... ago!

When this happens, it's unclear to every subsequent developer as to why the code is there; and, whether or not it is safe to remove. This makes it much harder to refactor code as you never know whether or not certain control-flows ever get used.

I recommend that you use lots of feature flags. But, delete them as soon as possible! I try to roll feature flags out within minutes of deployment (see notes below). And then, delete them about a week after that (see JIRA ticket notes above).

Don't Let a Product Manager Roll-Out Feature Flags

Adding feature flags and then deferring the roll-out of said feature flags to a Product Manager (PM) is an almost sure-fire way of causing code-rot. The developers move on to other work and the PMs rarely communicate when the feature flags have been rolled-out. PMs also tend to be overly cautious and want to keep feature flags around for a long time "just in case". This leaves unnecessary complexity and uncertainty in the code.

Furthermore, you - as the developer - are the one who understands how the feature flag will affect performance. As such, you are the one who knows which logs to look at and which metrics to observe as the feature flag is being rolled-out (see notes below). Only you can ensure that the feature flag doesn't have any negative, unexpected impact on the system or the user experience (UX).

Your PM doesn't know this stuff (and shouldn't be expected to). When you hand the responsibility of roll-out to your PM or to another department, you are doing a disservice to your users!

Roll Feature Flags Out as Quickly as Possible

While I love the fact that feature flags decouple deployment from activation, I like to roll my feature flags out as quickly as possible after they've been deployed. Typically, this means that I'm at 100% activation within minutes of deployment.

For cases that entail a little more uncertainty, my roll-out time is a little bit longer; but, it's almost never more than 24-hours. In such cases, I tend to use the following graduated timeline:

  • Activate for just me, test code in production.
  • Activate for 2% of users for 10-minutes.
  • Activate for 10% of users for 10-minutes.
  • Activate for 25% of users for 10-minutes.
  • Activate for 50% of users for several hours.
  • Activate for 100% of users.

A timeline like this usually accompanies some degree of uncertainty about the changes that I made (ie, are there any edge-cases that I missed in the logic). As this timeline progresses, I keep a trained eye on the logs, the APM dashboards, and any StatsD metrics that I have added to the code.

The goal of the slower roll-out is to limit the extent of any potential fallout, giving me time to disable the feature flag at the first sign of error. But, if no problems show up in an hour or so, the reality is such that things are probably fine. And, even at 100% roll-out, I still have the ability to disable the feature flag at a moment's notice if an issue does present itself.

ASIDE: This is why I tend to leave a feature flag in my code for about a week before deleting it even if it was rolled-out to 100% of users just after deployment. It always gives me that "safety valve" on the relatively long tail of user interaction.

I would go so far as to say that if you are taking multiple days or even multiple weeks to roll-out a feature flag, then your original thesis was poorly formulated. If it takes you that long to verify whether or not your changes were good or bad, then you're just shooting in the dark.

Now, sometimes, shooting in the dark is what needs to be done. But, it should happen extremely rarely; and it should leave you with a bad taste in your mouth.
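For what it's worth, the graduated timeline above can be captured as simple data. This is purely a hypothetical checklist structure - the actual percentage targeting is configured in the LaunchDarkly dashboard - but it shows how short the whole process really is:

```javascript
// The graduated roll-out timeline above as a hypothetical checklist. The
// percentage targeting happens in the LaunchDarkly dashboard; this is just
// the plan I'd keep next to me while watching the logs and graphs.
var rolloutSchedule = [
	{ percent: 0, soakMinutes: 10, note: "Just me - test code in production." },
	{ percent: 2, soakMinutes: 10 },
	{ percent: 10, soakMinutes: 10 },
	{ percent: 25, soakMinutes: 10 },
	{ percent: 50, soakMinutes: ( 3 * 60 ) },
	{ percent: 100, soakMinutes: 0 }
];

// Total soak time before full activation - comfortably under 24-hours.
var totalMinutes = rolloutSchedule.reduce(
	function( sum, step ) {
		return ( sum + step.soakMinutes );
	},
	0
);

console.log( totalMinutes ); // 220
```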

Always Monitor Logs and Graphs During Feature Flag Activation

Whenever I roll-out a new feature flag, whether it be related to performance improvements or a hawt new feature, I always have logs and graphs that I can monitor. Often times, these are the existing logs and graphs that automatically pick-up the new traffic patterns; and sometimes, these are the new graphs for metrics that I have explicitly added as part of the experiment.

Regardless of what I deployed, I am always looking to see if:

  • The feature is throwing errors (server-side and client-side).
  • The feature is receiving traffic.
  • The feature is running with good performance.
  • The feature is putting unexpected load on the database.

There's no degree of certainty, unit testing, or integration testing that obviates the need to check logs and graphs!

Consider the Effect of Rolling-Deployments

When developing locally, it's easy to think about the Client code and the Server code as being in lock-step state. However, the moment you deploy code to production - especially in a horizontally-scaled application - your Client code and your Server code will be out of sync. You can end up with:

  • Old client code calling old server code.
  • Old client code calling new server code.
  • New client code calling old server code.
  • New client code calling new server code.

When rolling out new - and especially when deleting old - feature flags, understanding the implications of these relative states is important. There's no approach that I know of that easily handles all of these edge-cases. Each context is different. The important point is just to stop and think about the potential state of the code during the Server and Client life-cycle.

Your best bet is to always code feature-flags defensively and to always make changes backwards compatible. But, that's just general "best practices" in software development.
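As a small sketch of that defensive coding, every client-side flag read can coerce the value to a Boolean and fall back to the "old" code path whenever the flag payload predates (or postdates) the running code. The flag name here is just the hypothetical one from earlier examples:

```javascript
// A sketch of defensive flag consumption during a rolling deployment. The
// flag payload may be missing entirely (old client) or missing this key
// (old server) - both cases coerce to false, ie, the "old" code path.
function readFlag( featureFlags, flagName ) {
	return ( !! ( featureFlags && featureFlags[ flagName ] ) );
}

console.log( readFlag( undefined, "product-RAIN-007-hot-new-feature" ) ); // false
console.log( readFlag( {}, "product-RAIN-007-hot-new-feature" ) ); // false
console.log( readFlag( { "product-RAIN-007-hot-new-feature": true }, "product-RAIN-007-hot-new-feature" ) ); // true
```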

Try Pushing as Much Feature Flag Behavior Up into the Server

I use feature flags in both the Server-side code and the Client-side code. But, as much as possible, I try to push the feature flag behavior onto the Server. This gives me much more control over how the feature flags are consumed.

When I deploy code, I know that the Server-side state is predictable and deterministic. However, deploying new code doesn't mean that the Client-side code will get updated in a timely manner. This is especially true in a SPA (Single-Page Application) context, where a user may go weeks without refreshing the page (think GMail). By pushing the feature flag interactions into the Server, the user's behavior becomes much less of a constraint.

NOTE: LaunchDarkly has a Client-side SDK; but, I've never been motivated enough to try it out. As I discuss below, JavaScript applications are already very complex, at least for my brain. As such, I am not too enthused at the idea of throwing dynamic state into the mix. But, again, your mileage may vary - this may be a limitation of my brain capacity.

Keep The Client-Side Code as Simple as Possible

In addition to moving as much of the feature flag logic as I can into the Server, I also try to keep the Client-side code as simple as possible. Generally, I limit the responsibilities of the Client-side feature flags to the following:

  • Hide a new feature's ingress (ie, the feature's call-to-action).
  • Change an AJAX end-point that is being consumed.
  • Tweak a CPU or network intensive task.

Today's web applications are already very complex. I have no desire to add a bunch of additional complexity just because I want to leverage feature flags. And, if I push a sufficient amount of feature flag logic into the Server, I end up being able to keep my Client-side code relatively simple.

Never Ever Share Feature Flags Across Team / Deployment Boundaries

Feature flags are an implementation detail for your team's services. If you allow another team to consume your feature flags, you are leaking implementation details. This is toxic for maintenance and flies in the face of all software development best practices!

If another team wants to use your feature flags, direct them to, instead, create their own feature flag that entails similar intent and can be deployed and rolled-out independently of your service. If the other team insists that the feature flags need to be kept in lock-step, assure them that they are absolutely wrong! They have a misunderstanding of how the world works - teach them about Eventual Consistency.

Isolate Feature Flags by "Project" in LaunchDarkly

Unfortunately, when we first started using LaunchDarkly, we didn't really understand how the LaunchDarkly dashboard worked. So, every one of our teams added their feature flags to the same "Project" - the "default" project. Now, that default project has over 600 feature flags in it that cross dozens of Team and Service boundaries. This makes it very difficult to separate the signal from the noise and determine where the most "rot" is coming from.

If I could go back and do it all over again, I would create a new LaunchDarkly "Project" for every Service / deployment boundary. This would make maintaining the feature flags easier. And, it would help enforce the idea that each feature flag should be owned and encapsulated within a single Team (see notes above).

Keep Feature Flag State Consistent Across All Environments

Under each "Project" in LaunchDarkly, you can create multiple "Environments". For example, you might create the following environments:

  • Local
  • Staging
  • Production

These environments allow you to roll-out feature flags independently and with different targeting rules. This is a powerful feature; but, it can also be the source of much confusion. At work, we've had several "incidents" opened because a developer didn't realize that a change in behavior was due to a feature flag - especially when the same behavior could not be replicated in other environments.

It's important that behavior seen in one environment is mostly indicative of the behavior that is seen in all environments. This won't be true all the time; but, if a feature is enabled in the Production environment, you better make sure it's also enabled in all Local and Staging environments as well. Otherwise, your teammates won't know the true behavior of the application until their code hits Production.
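One way to stay vigilant about this - purely a sketch, assuming you have exported each environment's flag states as plain key-value maps - is a small consistency check that reports any flag enabled in Production but disabled (or missing) elsewhere:

```javascript
// Hypothetical consistency check: given plain maps of flag states exported
// from each environment, report every flag that is ON in Production but
// OFF (or missing) in any other environment.
function findInconsistentFlags( production, otherEnvironments ) {
	return Object.keys( production ).filter( function( flagName ) {
		return (
			production[ flagName ] &&
			otherEnvironments.some( function( environment ) {
				return ( ! environment[ flagName ] );
			})
		);
	});
}

var production = { "product-RAIN-007-hot-new-feature": true, "OPERATIONS--debug-logging": false };
var staging = { "product-RAIN-007-hot-new-feature": false };
var local = {};

console.log( findInconsistentFlags( production, [ staging, local ] ) ); // [ "product-RAIN-007-hot-new-feature" ]
```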

Don't Nest Feature Flags In Your Code

Feature flags already add some degree of cognitive load to the application. Attempting to nest feature flags takes that overhead and potential confusion and ramps it up in severity. I would recommend avoiding it when possible.

If I ever have an urge to nest feature flags, I stop and try to do one of the following:

  • Commit to the current "experiment" by removing the old branch of code and making the gated code the "one true path" of execution. Then, once that is deployed, I go back into the code and add the new feature flag at the lower level.

  • Deactivate the current "experiment" so that no users are accessing it. Then, update the code at the lower level to include what would have been the new experiment. At this point, the new experiment can be subsumed by the old experiment; and both modifications to the code can be safely re-rolled-out under the existing feature flag.

Ideally, you want to keep your experiments small and short-lived. Attempting to nest feature flags ends up keeping complexity around longer than it needs to be. It also makes it unclear what exactly is being "tested" at any given time.

Don't Pass-In Feature Flag State As Component Props / Attributes

One of the most frustrating mistakes that I see developers making (in code that I then have to maintain) is passing-in feature flag state as attributes or "props" on a Client-side Component. To see what I mean, imagine coming across an Avatar component that is experimenting with lazy-loading the underlying src attribute. Passing in feature flag state as a prop might look like this (pseudo-code):

render() {

	return(
		<Badge>
			<Avatar
				imageUrl={ this.user.imageUrl }
				useLazyLoading={ this.featureFlags[ "performance-RAIN-123-lazy-loading-images" ] }
			/>
			{ this.user.name }
		</Badge>
	);

}

Here, the feature flag state is being passed into the Avatar component using the useLazyLoading prop. The fundamental flaw here is that the implementation details of the Avatar component are now leaking out into the calling context. The lazy-loading of the underlying image is an implementation detail of the Component - it isn't part of the component's API.

By passing the feature flag in as a prop, you've now coupled the calling context to the experimental implementation. Not only does this make both the Avatar and its calling code harder to change, you're probably also on your way to living in a "prop passing" nightmare.

Instead, feature flag state should be injected into the definition of the component so that individual instances of the component can be created without the calling context having to know anything about the experiment (pseudo-code):

class Avatar {

	private useLazyLoading: boolean;

	constructor( featureFlags: FeatureFlags ) {

		this.useLazyLoading = !! featureFlags[ "performance-RAIN-123-lazy-loading-images" ];

	}

	// ....

}

With this approach to inversion of control (IoC), the lazy-loading now becomes an implementation detail of the Avatar; and, the rest of the application can consume the Avatar without having to know that such experiments are taking place. This makes the code much easier to consume and maintain over time.

Avoid The Redis Caching Feature Of LaunchDarkly Until You Understand It

When we first implemented the LaunchDarkly SDK (Software Development Kit), we were enamored with the idea of the Redis cache. We thought it would save API calls or help keep the application in a more consistent state. To be honest, we didn't really know what it did. And, to be extra honest, I still don't really know what it does, even today.

All I know is that when we used the Redis cache, there was a 5-minute delay in between when we would interact with the LaunchDarkly dashboard and when those changes would propagate to our Application. In the end, this had to do with how we configured our Redis integration. But, after we finally removed the Redis cache, LaunchDarkly changes started showing up instantly within our application.

The LaunchDarkly SDK is built really well - don't try to over-optimize it until you actually have an issue. Events stream in the background; rules are cached in-memory; LaunchDarkly is not going to be your bottleneck. Embrace the awesomeness!

Don't Include Feature Flag State With Tracking Analytics

When we first started adding feature flags to our application, we thought it would be really cool and "data sciencey" to include a user's feature flag state with all of the events that were logged during the user's interactions. The hope was, this way, we could later correlate various funnel observations and behaviors with gated experiments.

The problem with this is two-fold:

  • First, we quickly went from adding our first feature flag to having dozens of feature flags. And, passing all of these key-value pairs along with each event payload started to break the analytics APIs (which had max-property limitations).

  • Second, any non-Operational feature flag should be relatively short-lived, getting rolled-out and then deleted quite soon after being deployed. As such, there's no real sense of the longitudinal nature of a feature flag.

To be clear, I am not a data scientist; and, I don't really understand how data science even works. So, maybe there is a sensible way to correlate feature flags with analytics, such as creating a new Event Type for a particular experiment. Just, whatever you do, remember to remain vigilant about deleting feature flags as soon as you can.

Consider What Your GitHub "diffs" Will Look Like

This might be a strange "best practice"; but, when I start working with feature flags in my code, I take time to consider what the diff will look like in the GitHub Pull Request that I have to send to one of my teammates. This is true for both the adding of feature flags and the eventual deleting of those feature flags.

ASIDE: Did I just invent GDD: Git-Diff Driven Development? I might be on to something here!

Thinking about the readability of the GitHub diff encourages me to do several important things:

  • Create code that is more isolated.
  • Create code that is easier to delete.
  • Create code that is more flexible.
  • Create code that favors clarity over normalization / de-duplication.

The key to clarity here is finding the right level of the call-stack at which to add the feature flag gating. There's no "rule" on how to do this - I just try to think about which level of the call-stack will lead to an easier-to-understand Pull-Request (PR).

The lowest-level in the call-stack is good for a single branch around a single call to something like a SQL query optimization (pseudo-code):

component {

	public struct function doSomething( userID ) {

		// ....

		if ( shouldUseSqlExperiment( userID ) ) {

			dataAccess.getDataWithOptimization( userID );

		} else {

			dataAccess.getData( userID );

		}

		// ....

	}

	// ---
	// PRIVATE METHODS.
	// ---

	private boolean function shouldUseSqlExperiment( userID ) {

		return( featureFlags.getFeatureByUserID( userID, "performance-RAIN-123-sql-optimization" ) );

	}
}

With an approach like this, the GitHub diff will end up being very clean with:

  • Just a single conditional statement.
  • A completely new data access method - getDataWithOptimization().
  • A completely new "should" method - shouldUseSqlExperiment().

And, eventually, when I commit to the experiment, making it the one true path, the GitHub diff will be equally clean:

  • Delete the conditional statement.
  • Delete the old data access method - getData().
  • Delete the "should" method - shouldUseSqlExperiment().

If the feature flag starts to affect more than just a single conditional statement within the parent context, I start thinking about whether or not the GitHub diff will be more clear if I elevate the toggle to the method level. Depending on how the control-flow has been architected, I might accomplish this with a duplicate method and a guard statement:

component {

	public struct function doSomething( userID ) {

		if ( shouldUseMoreRobustExperiment( userID ) ) {

			return( doSomethingWithOptimization() );

		}

		// ....

	}

	public struct function doSomethingWithOptimization() {

		// ....

	}

	// ---
	// PRIVATE METHODS.
	// ---

	private boolean function shouldUseMoreRobustExperiment( userID ) {

		return( featureFlags.getFeatureByUserID( userID, "performance-RAIN-123-more-robust-optimization" ) );

	}
}

Here, the doSomething() method is being short-circuited for the experiment, internally redirecting control-flow over to the experimental method. This is great because the calling context - that which consumes Service.doSomething() - doesn't need to know about the experiment, even though the entire implementation of .doSomething() is being changed.

This approach likely requires that I duplicate some of the logic from the .doSomething() method. But, this is a small price to pay for a GitHub diff that is very clean:

  • Just a single guard statement.
  • A completely new data access method - doSomethingWithOptimization().
  • A completely new "should" method - shouldUseMoreRobustExperiment().

And, eventually, when I commit to the experiment, making it the one true path, the GitHub diff will be equally clean:

  • Delete the old method - doSomething().
  • Delete the "should" method - shouldUseMoreRobustExperiment().
  • Rename the new method, doSomethingWithOptimization(), to be doSomething().

Note that in the above approach, I stated that I would likely have to duplicate some of the existing logic in order to author the experimental pathway. This is a common theme: the higher-up in the call-stack that I add a feature flag, the more likely I am to have to duplicate code in order to keep the eventual GitHub diff clean.

This is OK! I am sure that it has been drilled into you that code duplication is a supreme evil; and, for a long time, I felt the same way. But, as I have gotten more experience under my belt, I've come to understand that, for years, I've misunderstood the meaning of DRY (Don't Repeat Yourself). And that, sometimes, duplicating code is not only OK, it's the right way to construct your application.

In the case of feature flags, we are duplicating code in an effort to make the code easier to understand and, eventually, easier to delete. Remember, the goal of every non-Operational feature flag is to prove (or disprove) a hypothesis; and then, to very quickly be deleted.

Now, if I start to add a feature flag to a Service, and it starts to require a lot of helper methods or some fairly in-depth logic, I will likely elevate the feature flag up the call-stack even further. Depending on how the application is architected, I might put the feature flag right in the Controller, diverting control flow to a new Service (and potentially a new set of Data Access methods):

component {

	public void function api_end_point( requestContext ) {

		if ( shouldUseMoreRobustExperiment( requestContext.user.id ) ) {

			requestContext.data = optimizedService.doSomething();
			return;

		}

		requestContext.data = service.doSomething();

	}

	// ---
	// PRIVATE METHODS.
	// ---

	private boolean function shouldUseMoreRobustExperiment( userID ) {

		return( featureFlags.getFeatureByUserID( userID, "performance-RAIN-123-more-robust-optimization" ) );

	}
}

At this point, our GitHub diff is going to be relatively large (when compared to earlier refactoring). But, it will still be very clean - though almost certainly with more code duplication:

  • A single guard statement.
  • A new "should" method - shouldUseMoreRobustExperiment().
  • A new service - OptimizedService.

The OptimizedService will likely have a lot of logic that looks reminiscent of the logic in the old Service. But, there's also a good chance that much of that logic is being tweaked for the experiment. So, depending on how hard you squint, it isn't even really "duplication".

There's also a good chance that the new OptimizedService will have an entirely new set of data-access methods. But again, they will be completely isolated in the GitHub diff, making them clear and easy to understand.

And, when I eventually commit to this experiment, the GitHub diff to delete the old code is equally clean:

  • Delete the guard statement.
  • Delete the "should" method - shouldUseMoreRobustExperiment().
  • Delete the old service - Service.

In every one of these approaches, both the Pull-Request to add the experiment and the Pull-Request to commit to the experiment are clean and easy to understand because they encompass cohesive thoughts. They favor clarity over everything else. Which makes them easier to create, easier for your teammates to approve, and eventually, easier for you to delete.

I Use Feature Flags; I Also Eat and I Sleep and I Breathe

After 4 years of using LaunchDarkly and feature flags in my ColdFusion, Lucee CFML, Angular, and React applications, I no longer think about feature flags as something that I "add" to the code. For me, feature flags have just become the way code is written. It's how I do right by my users; and it's how I feel confident in the code that I deploy to production.

The above "best practices" are the ones that have worked out well for me personally; I hope that some of this pontification may help you as well in your web application development journey. And regardless of whether or not you agree with any of my points, I strongly believe that embracing feature flags in general will be one of the most powerful changes that you can make to your software development practices.



Reader Comments

Wow Ben, this is way more than I ever expected in response! Sorry it took me so long to get through it all; but in my defense it's pretty huge!

LaunchDarkly sounds pretty great, and I'm going to spend a bunch of time thinking about how it could benefit my team and our products. I'm sure it could, but experiencing cases where we could have used it to our advantage will surely help crystallize the ideas.

Something I don't think you touched on was any of the technical implementation details of how it is made available inside your application. Your code examples look mostly like FW/1 controllers to me, which is super familiar because that's what we use every day... And you've got a service that you're calling to get the feature flag determination based on user id. What happens in that service? An HTTP request to another app running on the local server? Something that checks system environment variables (which are updated in a streaming fashion by a LaunchDarkly background process)? Something else?

Also, you said that the feature flag determinations are pure functions, which means that they are deterministic: Every time you input userId 42 and flag name XYZ you're going to get the same response. That makes enough sense when you think about only exposing the feature to yourself, but how does it make the decision in a deterministic way if the rules say that 20% of users should get true?

Maybe it's because I'm thinking about this through the lens of A/B testing, which is a concept I already mostly understand, but that is more random-chance based, which is not deterministic (right?)...

Secondly RE: deterministic functions; I assume that the pureness of them resets if you update the rules, right? If the initial rule is me-only, and then you release it to 10% of users, their expected return value is changing. Not that this really has any technical consequences, I'm mostly asking just out of curiosity.

Anyways, thanks for a great write-up!


@Adam,

Ha ha, no worries on timing. As you have discovered about me, communication is not my strong suit.

Re: technical implementation, yes we use FW/1 at work. And we wrap the LaunchDarkly Java Client (from their Java SDK) in a ColdFusion component which we can then inject into the various other services + controllers.

The Java SDK works by doing event-streaming in the background. So, essentially, the SDK keeps all of the rules in memory; then, when you change a rule in the Dashboard, those rules immediately synchronize with the Java Client in the background and your application just automatically starts seeing the changes the next time you ask for a user's feature flag "variations".
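To sketch what that wrapper might look like - note that the component and method names here are hypothetical, and the LDUser builder usage is based on my understanding of the v4-era Java SDK:

```cfml
component {

	// Hypothetical wrapper around the LaunchDarkly Java Client. The injected
	// "ldClient" is assumed to be an instance of com.launchdarkly.client.LDClient,
	// which maintains the rule-set in memory via background event-streaming.
	public any function init( required any ldClient ) {

		variables.ldClient = arguments.ldClient;
		return( this );

	}

	// I get the boolean variation of the given feature flag for the given user.
	public boolean function getFeatureByUserID(
		required string userID,
		required string featureKey
		) {

		// The LDUser acts as the "input" to the locally-cached rules engine.
		// The more complicated your targeting rules get, the more attributes
		// you have to provide here (email, custom properties, etc).
		var user = createObject( "java", "com.launchdarkly.client.LDUser$Builder" )
			.init( userID )
			.build()
		;

		// The third argument is the fall-back value, returned if the SDK
		// cannot evaluate the flag (e.g. the flag key doesn't exist).
		return( ldClient.boolVariation( featureKey, user, false ) );

	}

}
```

Since the rules are evaluated against the in-memory rule-set, this call is essentially instantaneous - no network request is made at evaluation time.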

You are correct in that the "deterministic" nature of the rule-set changes when you change the rules. However, when you talk about a percentage-based roll-out, as in setting a value to true for 10%, then 20%, etc -- in that case, the users are consistently assigned to a particular bucket.

Under the hood, the LaunchDarkly client is generating some sort of Checksum based on the key used to identify the user. The Checksum is just a way to consistently translate a given key into an int value. Something like:

checksum( user.id ) => 0 ... 99

So, when you have a roll-out that you are gradually increasing, it's this checksum that determines what feature flag values a user sees. Imagine setting a roll-out percentage to 30%, then the rule evaluation is basically this:

isFlagOn = ( checksum( user.id ) < 30 )

Since the checksum() result is always the same for the given input (user.id), a user will consistently fall into the same cohort as the percentage roll-out increases.

^^ A lot of this is my assumption of how it is working based on what I've read about feature flags. Hopefully I haven't muddled things too much.
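To make that idea concrete, here's a rough sketch of deterministic bucketing. To be clear, this is not LaunchDarkly's actual algorithm (they presumably use a more robust hash under the hood); it just demonstrates that a stable hash of the user key yields a stable bucket:

```cfml
// Hypothetical sketch of deterministic percentage bucketing.
function getBucket( required string userKey ) {

	// Java's String.hashCode() is deterministic for a given input; taking
	// the absolute value modulo 100 maps it into the range 0-99.
	return( abs( userKey.hashCode() ) % 100 );

}

// A 30% roll-out: the same user always lands in the same bucket, so they
// consistently see the same flag variation as the percentage increases.
rolloutPercent = 30;
isFlagOn = ( getBucket( "user-42" ) < rolloutPercent );

// Re-evaluating for the same user always yields the same answer.
writeOutput( getBucket( "user-42" ) == getBucket( "user-42" ) ); // true
```

The key property is that raising the roll-out percentage only ever adds users to the "on" cohort - nobody who was already seeing the feature gets removed from it.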


@Ben,

Thanks for the clarification. Your last point was one I was ready to make myself. It seems like the %-based cohort isn't necessarily 10% of active users but rather 10% of potential users. I'm dubious about the whole % thing in general because, for example, if the 10% of users that get put into the experimental cohort aren't active during the test, then the experiment isn't actually being exercised.

I'm sure that's an uncommon edge case, and I'd bet that LaunchDarkly shows enough details to see that nobody is getting the experiment, at which point you could increase the % to at least get some traffic through it.

I guess what I'm trying to say is that I'm dubious about the math/details but ultimately it doesn't matter much. (Which is, I think, indicative of my personality a lot of the time. :))


@Adam,

Yeah, the percentage stuff is tricky. I just "trust" it; but, I also usually have some sort of metric being recorded. That said, in the LaunchDarkly dashboard, they have "Insights", which basically shows you a time-graph of all the True / False evaluations:

https://launchdarkly.com/blog/launched-flag-insights/

So, at the very least, it's not a "black box".
