Yesterday, I had the pleasure of representing InVision App in a Lunch-n-Learn panel discussion for LaunchDarkly (alongside Lena Krug of Meetup, Christopher Cosentino of Shutterstock, and Greg Ratner of Troops). As you may have seen from my blog, InVision has been a LaunchDarkly customer for almost 3-years; and, I'm happy to say that it has completely changed the way that our team thinks about and approaches the deployment and the release of features in our application. Since the lunch-n-learn was a cozy get-together of prospective customers, I thought it would be worthwhile to write-down some of the thoughts that I shared with the group.
The following is a best-effort summary of my recollection. We were given some of the questions ahead of time such that we had time to reflect on our experiences. Conversations, however, always end-up going off-book; so, this is all that I could remember.
How often does your team deploy software?
Across the entire organization, InVision releases well over 100-times a week. Our deployments are spread across a large number of application queues. Each individual application queue is capable of being deployed 7-8 times a day (within business hours, more if you have no work-life balance).
When a deployment goes out, it follows a tiered workflow. First, it goes to a Preview environment where it "bakes" for an hour while it is being tested. Then, it is auto-promoted to a Multi-Tenant environment where it bakes for another hour. And, finally, it is auto-promoted to a large set of Single-Tenant Enterprise environments where it bakes for one more hour.
Currently, the bake time in each tier blocks the next deployment. Meaning, once a deployment goes to the first tier - Preview - no other teams can start a new deployment (for that application queue) until the one-hour bake time has completed and the deployment is auto-promoted to the next tier.
In the near future, we're going to move to a more "stacked deployment" model where teams can literally deploy one-after-another in the same application queue. And this is why feature flagging with LaunchDarkly is so critical! If your deployment hits production and you find a bug that you need to roll-back, you don't just roll-back your code - you roll-back all of the code in the deployments that started going out after you. As such, a strict separation of "deployment" vs. "roll-out" is a huge productivity boost! It is far more efficient (and kind) to turn-off a buggy feature than it is to roll it back.
When did you first encounter Feature Flags?
I had heard of "feature flags" in the past, but LaunchDarkly was my first hands-on experience with feature flags in a production application. Prior to LaunchDarkly, if we needed to "dark launch" a feature, we would literally hard-code user-identifiers in our application logic. And, changes to this logic would, of course, have to be deployed, which meant it would - in a best case scenario - take several hours to be rolled-out to every production tier.
What do you use feature flags for within your application?
When we first started to experiment with LaunchDarkly, we weren't entirely sure what we wanted to use it for. Our first efforts were actually driven more by analytics and tracking than they were by engineering. In fact, we accidentally broke part of our application by trying to provide a user's feature flag settings as "traits" in one of our tracking systems (I think it was KissMetrics). We had ramped-up on feature flags pretty quickly and ended up getting outbound requests blocked due to HTTP Header size constraints (for the Cookie header).
Eventually, LaunchDarkly moved to the engineering side of the house and more specialized testing systems, like Optimizely, were used for our growth team's "experiments." Today, LaunchDarkly is primarily used as a way to make deployments safer. Or rather, to decouple the concept of "deployment" from the actual "rolling out" of features.
When putting in a feature flag, how much thought do you put into what kind of flag it will be? Do you have a formal process for flag clean-up?
At InVision, we have two different types of feature flags. First, we have Operational flags, which are long-lived flags that help manage the health and operation of the system. These operational flags do things like take machines out of rotation (ie, quarantine them); activate and deactivate background tasks (like migrations); manage the log-level settings in a particular environment; and, scale workflows up-and-down based on system performance. These flags are either long-lived or they are permanent fixtures that are woven into the fabric of the application.
The rest of the feature flags are short-lived flags intended for feature deployment. Ideally, all deployments are done behind a feature flag. This, however, is not always possible; nor is it always a worthwhile effort. Sometimes, you just have to trust the fact that the spelling-change you make in some copy isn't going to break the system.
Ideally, all of the short-lived feature flags are deleted soon after the feature roll-out is completed (taking into account a holistic sense of the application state and the possibility of future roll-backs). Teams are supposed to create additional "clean-up" tasks in JIRA for their feature flags such that we don't lose track of them. The reality, however, is far less sanitary. Our feature flags tend to pile up and we have to occasionally have a "purge" of flags that no longer seem relevant.
What's the best feature flag name you've used?
As a best practice, we include a JIRA ticket number in our feature flag name. This way, when you see a feature flag called, "RAIN-123-release-the-kraken", you know where to find background information on what the feature flag is being used for (JIRA Ticket RAIN-123); and, you know which team to go hassle about clean-up (the Rainbow team).
When we were a small company, we liked "fun" names; but, we have found that "fun" names don't scale well.
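To make the naming convention concrete, here is a minimal sketch of how a flag name like "RAIN-123-release-the-kraken" could be split back into its JIRA ticket and owning project. The pattern, the `FlagName` type, and the `parse_flag_name` helper are assumptions made for illustration; they are not part of LaunchDarkly or of InVision's tooling.

```python
import re
from typing import NamedTuple, Optional

# Assumed convention: "<PROJECT>-<NUMBER>-<description>",
# for example "RAIN-123-release-the-kraken".
FLAG_NAME_PATTERN = re.compile(
    r"^(?P<project>[A-Z]+)-(?P<number>\d+)-(?P<slug>[a-z0-9-]+)$"
)


class FlagName(NamedTuple):
    ticket: str   # e.g. "RAIN-123", where to find the background information
    project: str  # e.g. "RAIN", which team to ask about clean-up
    slug: str     # e.g. "release-the-kraken"


def parse_flag_name(name: str) -> Optional[FlagName]:
    """Return the parts of a flag name, or None if it ignores the convention."""
    match = FLAG_NAME_PATTERN.match(name)
    if match is None:
        return None
    project = match.group("project")
    ticket = f"{project}-{match.group('number')}"
    return FlagName(ticket=ticket, project=project, slug=match.group("slug"))
```

A check like this could run during a periodic flag "purge" to surface names that carry no ticket number at all.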
What was the worst thing you've ever released into production?
As a co-founder of InVision, I've been here a long time. Which means that I've had the privilege of crashing our production system more times than most people on the team. I love to write SQL - refactoring, optimizing, and decomposing complex queries is perhaps one of the more exciting things that a human can do. So, for me, most production crashes revolved around unexpected query-optimizer choices.
Last year, I deployed a query refactoring that looked fine in local development. It looked fine in our Preview Production environment. Then, when it hit our Multi-Tenant Production environment, the MySQL query optimizer started using a Multi-Range Read Optimization for some of the users. This immediately spiked the CPU from 8% to 100%. Luckily, this was behind a LaunchDarkly feature flag which I immediately turned off. Our DATA team had to go in and manually kill queries that had already been initiated. But, the application never went off-line and the database slowly started to recover!
I used to joke with people that you're not really part of the team until you crash production. I used this flippant comment as a way to underscore a blameless culture. At InVision, we're huge fans of RCA (Root Cause Analysis) documents and asking "The Five Whys." For us, it's more important to understand why something broke and how we can improve our process than it is to point the finger at human error.
The truth is, though, since we've started using LaunchDarkly, I've had to make that joke much less. By wrapping features in a feature flag, our team simply doesn't crash production nearly as much as it otherwise would (especially considering the way our team has ballooned in size). I believe that LaunchDarkly has completely changed the way that we think about deployment and about our feature release timelines. It has forced us all to develop a more holistic understanding of our application and our user experiences.
How do you manage the rollout of features? Is it delegated to business owners / operators?
As much as possible, we take a "you build it, you run it" approach to application development. Which means, the engineers who are working on a feature should also be the engineers that are responsible for rolling it out. After all, it's those engineers that know what logs to look for in our log aggregation software; which stats to look for in a time-series database; and, which database metrics to keep an eye on. In order to truly make "dark launching" a safe workflow, it has to be managed by people who have the most holistic understanding of the software and a strong sense of what impact a rollout may have on performance.
In fact, this is a two-way street. Meaning, it makes sense for engineers to manage the feature-flags; but, as a byproduct of having to manage the feature flags, our engineers are forced to think more deeply about the application structure and the deployment process. So, not only are engineers the right people for the job, the job makes them better engineers.
How do you decide when to use something like LaunchDarkly vs. something like Optimizely for hypothesis-driven development and A/B testing?
LaunchDarkly is hella simple. Optimizely is hella complicated. Now, I don't mean to be derogatory towards Optimizely; it's complicated for a reason - it has a much richer targeting and analytics scheme that ties in with other user tracking tools. To me, Optimizely is not an either/or choice - it's a different tool with a different purpose. It works well for the "Growth" team who needs to conduct A/B testing and target specific demographics. But, it's far too robust for feature flagging.
In short, the product engineers use LaunchDarkly; the growth engineers use Optimizely.
If you moved to a company that had an in-house feature flag system, what would you miss most about LaunchDarkly?
First and foremost, I love the fact that I don't have to own and operate LaunchDarkly. Like many of you, feature flagging is not our core product. It's not something that we dedicate time to improving. And, it's not a platform that we want to have to maintain and scale-up. The fact that we can just pay for someone else to provide it as a black-box is a huge time saver.
Second, LaunchDarkly just works really well. The way they've built it makes for an incredibly efficient client. When you ask your LaunchDarkly library for a user's feature flags, the client doesn't make an HTTP request to a remote LaunchDarkly API - it simply checks its in-memory cache. This is because changes in the LaunchDarkly dashboard are streamed to the clients using server-sent events (at least in the Java client).
What this means is that the LaunchDarkly client can be used with abandon. Check 50 flags. Check 1,000. It doesn't matter. As an engineer, you can be confident that you're not about to make a blocking, synchronous request to a potentially slow API that destroys the response time SLA for your application. It's obviously not a no-op (No Operation) to check for a feature flag; but the performance overhead is negligible.
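The general shape of that design can be sketched as follows: a background listener applies streamed changes to a local store, and flag checks only ever read from memory. This is a simplified illustration, not LaunchDarkly's actual implementation; `InMemoryFlagStore` and its method names are hypothetical, and a real client evaluates per-user targeting rules rather than storing one value per flag.

```python
import threading
from typing import Any, Dict


class InMemoryFlagStore:
    """Hypothetical local cache that a streaming listener keeps up to date."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._flags: Dict[str, Any] = {}

    def apply_update(self, key: str, value: Any) -> None:
        # Called from a background thread whenever a change event arrives.
        with self._lock:
            self._flags[key] = value

    def variation(self, key: str, default: Any) -> Any:
        # Reads only from local memory; no network request is made here.
        with self._lock:
            return self._flags.get(key, default)


store = InMemoryFlagStore()
store.apply_update("RAIN-123-release-the-kraken", True)

# Checking a flag many times costs a dictionary lookup each time.
for _ in range(1000):
    store.variation("RAIN-123-release-the-kraken", False)
```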
Event synchronization is generally instantaneous. The event propagation is (from what I'm told) built on top of a Fastly CDN distribution network which promises sub-200ms propagation times. And, this is what I see most of the time.
NOTE: In some rarer cases, I've seen feature flags take several minutes to be reflected in our application configuration. This, however, may have something to do with how we consume LaunchDarkly in our application life-cycle.
Now, to be clear, there is a dollars-and-cents cost to using LaunchDarkly. And, when you have a large number of monthly active users (MAU), you do think about the approach you want to take to "user identification". For example, using IP-address as a user identifier in a graduated roll-out would work; but, it may also jack up your cost of operation.
That said, for us, the dollars-and-cents cost of using LaunchDarkly is far outweighed by the benefits that it brings to the table.
What advice would you have for people looking to implement feature flags?
First, I would recommend that you create some sort of "abstraction" in your code around feature flags. Meaning, don't let your application logic consume the LaunchDarkly client directly; rather, wrap it in some sort of "feature flag service" that manages the intricacies of client interaction. Not only does this make the consuming code more straightforward, it also provides a single place in which to list the active feature flags and provide sane defaults.
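As a rough sketch of that kind of abstraction, the wrapper below keeps the list of known flags and their defaults in one place and hides the client behind a small interface. Everything here is hypothetical: the `FlagClient` protocol and its `bool_variation` method are stand-ins invented for this example and are not the real LaunchDarkly SDK signature.

```python
from typing import Dict, Protocol, Set


class FlagClient(Protocol):
    """Assumed minimal interface; adapt it to whichever SDK is actually in use."""

    def bool_variation(self, flag_key: str, user_key: str, default: bool) -> bool: ...


class FeatureFlagService:
    # The single place where active flags and their sane defaults are listed.
    DEFAULTS: Dict[str, bool] = {
        "RAIN-123-release-the-kraken": False,
    }

    def __init__(self, client: FlagClient) -> None:
        self._client = client

    def is_enabled(self, flag_key: str, user_key: str) -> bool:
        if flag_key not in self.DEFAULTS:
            raise KeyError(f"Unknown feature flag: {flag_key}")
        default = self.DEFAULTS[flag_key]
        try:
            return self._client.bool_variation(flag_key, user_key, default)
        except Exception:
            # If the client misbehaves, fall back to the documented default.
            return default


class StubClient:
    """Stand-in client used only to exercise the wrapper in this example."""

    def __init__(self, enabled_users: Set[str]) -> None:
        self._enabled_users = enabled_users

    def bool_variation(self, flag_key: str, user_key: str, default: bool) -> bool:
        return user_key in self._enabled_users


flags = FeatureFlagService(StubClient({"user-1"}))
```

Consuming code then calls `flags.is_enabled(...)` and never needs to know how the underlying client is configured.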
Second, I would create a strict boundary around the ownership of a feature flag. I have learned, from pain, that a feature flag should be owned by a single deployment. Don't share feature flags across applications, no matter how much they seem to overlap. Business requirements are never as strict as they are made out to be. "SSO Logins" don't really have to be enabled in every environment at the exact same time. Don't let your collection of feature flags become an "integration database."
Third, I would move feature-flag branching as high up in the request as possible (and as makes sense). The higher-up in the request, the more you know about the request and the user making it, which makes targeting easier. You are also more likely to create a non-breaking change for consumers of the old code pathway. This helps you think more effectively about the application structure.
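One way to picture branching "high up" is to make the decision once, at the routing layer, and leave the old code pathway untouched. The sketch below assumes a hypothetical checkout feature and an `is_enabled` callable supplied by the caller; none of these names come from a real application.

```python
from typing import Callable, Dict

Request = Dict[str, str]


def legacy_checkout(request: Request) -> str:
    # The old pathway stays exactly as it was.
    return f"legacy checkout for {request['user_key']}"


def new_checkout(request: Request) -> str:
    # The new pathway lives side by side with the old one.
    return f"new checkout for {request['user_key']}"


def route_checkout(request: Request, is_enabled: Callable[[str, str], bool]) -> str:
    # Branch once, at the top of the request, where the user is known.
    if is_enabled("RAIN-123-release-the-kraken", request["user_key"]):
        return new_checkout(request)
    return legacy_checkout(request)
```

Because neither handler checks the flag itself, deleting the flag later means deleting one branch rather than hunting for conditionals scattered through the code.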
Fourth, really make an effort to clean up old feature flags. Or, at least, ensure that they are turned on in all environments. I can't tell you how much time I've lost because some engineer turned on a feature flag in Production but didn't turn it on in the local development environment. When your various environments aren't using the same control-flow, debugging becomes a nightmare.
Fifth, really take time to understand how user identification works. The LaunchDarkly client doesn't maintain a "database of users" - it maintains a rule-set. This is very different than a database table where you can query on different columns. With a rule-set, your dashboard targeting and your client-side identification have to be the same or you won't get the expected feature flags. This is not a limitation of LaunchDarkly - this is how all rule-based feature flagging systems work.
Sixth, don't overthink user targeting. At first, it can be very alluring to get really complicated with your user targeting. But, the more complicated your targeting gets, the more complicated your user identification gets in the code. Start simple. See if you can use nothing but a user identifier. This will automatically give a ton of value, what with on/off toggling and graduated roll-outs. The longer you can keep it simple with your targeting, the easier it will be to consume in your application.
Seventh, try to avoid - if you can - using "non-user IDs" as your "user identifier" (ie, Key). Ultimately, a user identifier is just a String. Which means that it can be anything. This makes it easy to use something like an IP-Address or a UUID as a key. The LaunchDarkly client doesn't care about this; but, it will balloon your "Monthly Active Users", which has a dollars-and-cents cost to it. Unless cost has zero consideration, I would try to limit keys such that they scale directly with the number of users in your application.
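To illustrate why a plain user identifier is enough for a graduated roll-out, here is a generic sketch of percentage bucketing by hashing the key. This is a common technique shown under stated assumptions, not LaunchDarkly's actual bucketing algorithm; `rollout_bucket` and `is_in_rollout` are hypothetical names.

```python
import hashlib


def rollout_bucket(flag_key: str, user_key: str) -> float:
    """Map a (flag, user) pair to a stable number in the range [0, 1)."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode("utf-8")).hexdigest()
    # Use the first 32 bits of the hash as the bucket.
    return int(digest[:8], 16) / 0x100000000


def is_in_rollout(flag_key: str, user_key: str, percentage: float) -> bool:
    """True if this user falls inside the first `percentage` percent of buckets."""
    return rollout_bucket(flag_key, user_key) * 100 < percentage
```

Because the bucket depends only on the key, the same user gets the same answer on every request, and anyone included at 10% stays included when the roll-out is raised to 50%. It also shows why the choice of key matters: every distinct string passed in counts as a distinct "user".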
How do you manage "integration tests" when using feature flags?
Honestly, for me, the "feature flag" is the "integration test." Meaning, a huge value-add of having a feature flag is so that I can quickly enable and disable code in production while it is being exposed to "real human" load and interaction patterns. Turning on a feature flag, monitoring dashboards and alerts, and then proceeding as necessary is the "test." I don't find a need to do additional programmatic integration testing on top of that.
How do you manage LaunchDarkly across different environments?
Because LaunchDarkly operates on user "identification", you will run into issues if you have ID collisions across environments. To cope with this, LaunchDarkly provides a way to create "environments" inside the LaunchDarkly dashboard. These environments each have a unique API Key and an isolated set of users.
The unique API Key works well for our Testing and Production tiers; but, it falls down in the local development environment where we don't want to have to create hundreds of environments in LaunchDarkly, one for each developer. As such, for us, all developers share a single LaunchDarkly API Key; but, we use an Environmental variable to "prefix" the user identifier. This way, the users that I created in my local environment are all prefixed with "bennadel-" and never collide with the users created by another developer.
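The prefixing approach can be sketched in a few lines. The environment variable name `FLAG_USER_PREFIX` and the `scoped_user_key` helper are assumptions made for this example; the actual variable name used at InVision is not stated here.

```python
import os
from typing import Mapping


def scoped_user_key(user_id: str, environ: Mapping[str, str] = os.environ) -> str:
    """Prefix the user identifier so that local developers never collide."""
    # Hypothetical variable; set per developer machine, unset in other tiers.
    prefix = environ.get("FLAG_USER_PREFIX", "")
    return f"{prefix}-{user_id}" if prefix else user_id
```

With the variable set to "bennadel", user "42" is identified as "bennadel-42"; with it unset, the key passes through unchanged.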
Well, that's that. I had a lot of fun talking to people about my experiences. If you have any other questions, feel free to ask!
So, at the Lunch-n-Learn, I had theorized that you could even use LaunchDarkly to stream JSON payloads to your application, which could provide a way to layer some administrative features on top of the LaunchDarkly dashboard. This morning, when I went to experiment with that concept, however, I discovered that LaunchDarkly actually has native support for a "JSON Type"! Originally, I had thought you could shoe-horn a JSON payload into a String-based feature flag. But, LaunchDarkly actually supports complex Array and Object structures.
This means you can stream sophisticated data structures from the LaunchDarkly dashboard to your application. This kind of flexibility really opens up the possibility of creating light-weight administrative features with next-to-no effort.
I'm not suggesting that you start doing this, necessarily; but, depending on what your business' core product is, you could definitely leverage LaunchDarkly's JSON Type if you want to focus on your company's core value-add.
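As a hedged illustration of what consuming such a JSON payload might look like, the sketch below treats the flag value as an already-decoded object and merges it over a set of defaults. The "banner" feature, its fields, and the `read_banner_config` helper are all hypothetical; how the value is fetched from the client is deliberately left out.

```python
import copy
from collections.abc import Mapping
from typing import Any, Dict

# Hypothetical administrative setting driven by a JSON-typed flag.
DEFAULT_BANNER: Dict[str, Any] = {"enabled": False, "message": "", "audiences": []}


def read_banner_config(flag_value: Any) -> Dict[str, Any]:
    """Merge a JSON flag value over the defaults, ignoring unknown fields."""
    config = copy.deepcopy(DEFAULT_BANNER)
    if isinstance(flag_value, Mapping):
        for key in DEFAULT_BANNER:
            if key in flag_value:
                config[key] = flag_value[key]
    return config
```

Falling back to defaults when the payload is missing or malformed keeps a typo in the dashboard from breaking the application.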
I wanted to share the metaphor that has really helped me reason about the rules engine for user targeting: Pure Functions.
When you look at the rules evaluation as a Pure Function, it becomes much easier to see why variations evaluate to True or False, and what you need to provide in order to target the right users.
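The metaphor can be sketched as a function whose result depends only on its inputs: the rule-set and the user attributes supplied at call time. The rule shape below (an attribute name plus an "ends with" test) is invented for illustration and is not LaunchDarkly's rule format.

```python
from typing import Any, Mapping, Sequence

# Hypothetical rule shape:
# {"attribute": "email", "ends_with": "@example.com", "variation": True}
Rule = Mapping[str, Any]


def evaluate(rules: Sequence[Rule], user: Mapping[str, Any], fallthrough: bool) -> bool:
    """Pure function: same rules and same user always give the same variation."""
    for rule in rules:
        value = user.get(rule["attribute"])
        if isinstance(value, str) and value.endswith(rule["ends_with"]):
            return rule["variation"]
    return fallthrough


rules = [{"attribute": "email", "ends_with": "@example.com", "variation": True}]

# The rule targets "email", so the code has to send "email" for it to match.
with_email = evaluate(rules, {"key": "1", "email": "ben@example.com"}, False)
without_email = evaluate(rules, {"key": "1"}, False)
```

The second call falls through to the default because nothing is looked up from a stored user record; the function sees only what it is given.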