Using Step Functions to handle feature flags
We use feature flags to control how we release parts of a product. Instead of adding conditional statements in the code, we can use Step Functions to decide if the feature flag is enabled.
Published Mar 28, 2024
Bob's company has a popular application and wants to release a new feature. They want to test it thoroughly in the development environment first. But they are also required to push changes to production continually, so the unfinished feature must not block the deployment pipeline.
One way to manage this problem is to use feature flags. AWS AppConfig, part of the Systems Manager ecosystem, is a service that lets us use feature flags and configuration objects in our applications.
One way to incorporate them into the code is to use conditional statements that check whether we have enabled the feature flag for the given environment. But because Bob wants to minimize code changes and reduce code complexity (and because it's fun), he decided to use Step Functions instead of if statements.
He created a separate Lambda function with the new feature (NewFeature), which exists parallel to the existing code (ExistingFeature).
Let's see how this experiment worked out.
When getting a feature flag from AppConfig, we must provide three parameters.
An application is a namespace or a folder that contains configurations, feature flags, and environments for the given application.
The environment is the target for the feature flag. We can name it as we like. In this example, we'll have two environments, dev and prod. We enable the feature flag in dev, which runs the new code. We keep the existing code in prod.
The last element is the configuration profile, which can be a feature flag or a freeform configuration. This example will use a feature flag.
I won't describe how to create applications, environments, and configuration profiles in AppConfig. I'll provide a link that explains the process at the end of the post.
First, we fetch the feature flag state (enabled or disabled) for the given environment from AppConfig.
Luckily, we (and Bob) build serverless applications and use Lambda functions. AWS provides an extension that we can integrate with our function as a layer.
If we use SAM templates to create the resources, we can add the extension like this:
The layer ARN is different for each region and Lambda function architecture, so you need to find the right one for your scenario.
We can now call the extension from the GetFeatureFlag function. The code can look like this:
AWS_APPCONFIG_EXTENSION_HTTP_PORT defaults to 2772, which we can leave as is.
We can have an environment variable for each mandatory AppConfig parameter: application, environment (dev or prod in this case), and configuration profile (1). This way, when we deploy the resources to multiple environments, the function will know the feature flag state for the given environment.
The function's return value will be similar to the following:
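As a rough illustration of the handler and its return value, here is a minimal Python sketch. It assumes the extension's local HTTP endpoint; the environment variable names APPCONFIG_APPLICATION, APPCONFIG_ENVIRONMENT, and APPCONFIG_PROFILE are hypothetical names chosen for this example, and the optional fetch argument exists only so the function can be exercised without the extension running. The comments marked (1) and (2) correspond to the markers used in the text.

```python
import json
import os
import urllib.request


def build_config_url(env=os.environ):
    """Build the local extension URL from environment variables (1)."""
    # AWS_APPCONFIG_EXTENSION_HTTP_PORT is the extension's setting; 2772 is its default.
    port = env.get("AWS_APPCONFIG_EXTENSION_HTTP_PORT", "2772")
    # The three APPCONFIG_* names below are hypothetical, chosen for this sketch.
    return (
        f"http://localhost:{port}"
        f"/applications/{env['APPCONFIG_APPLICATION']}"
        f"/environments/{env['APPCONFIG_ENVIRONMENT']}"
        f"/configurations/{env['APPCONFIG_PROFILE']}"
    )


def handler(event, context, fetch=urllib.request.urlopen):
    """Fetch the flags from the extension and wrap them in `config` (2)."""
    with fetch(build_config_url()) as response:
        flags = json.loads(response.read())
    return {"config": flags}
```

With the isAllowed flag enabled in dev, a handler like this would return a value shaped like {"config": {"isAllowed": {"enabled": true}}}.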
As we can see, AppConfig returns an object of feature flag objects. isAllowed is the feature flag's very creative name. The presented value refers to the dev environment because the flag is enabled there. The value would be enabled: false in prod. We encapsulate the feature flag value in the config property of the returned object (2).
The function's execution role must allow the appconfig:StartConfigurationSession and appconfig:GetLatestConfiguration permissions.
GetFeatureFlag is part of the state machine, so its return value (the feature flag name and its state) will be the input of the next state.
In this case, it's a Choice state, where we decide if we call the existing function or the one with the new feature.
The state's definition can look like this:
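A minimal sketch of such a Choice state follows, written as a Python dictionary that serializes to Amazon States Language JSON. The state name IsFeatureEnabled is hypothetical, and the sketch assumes the task states are named after the NewFeature and ExistingFeature functions. The small next_state helper is not part of Step Functions; it only mimics this single rule locally so the routing can be checked.

```python
import json

# Choice state: route on the flag value returned by GetFeatureFlag.
choice_state = {
    "Type": "Choice",
    "Choices": [
        {
            "Variable": "$.config.isAllowed.enabled",
            "BooleanEquals": True,
            "Next": "NewFeature",
        }
    ],
    "Default": "ExistingFeature",
}


def next_state(state, state_input):
    """Local stand-in for the single rule above, for illustration only."""
    rule = state["Choices"][0]
    value = state_input
    # Walk the simple "$.a.b.c" path used by the rule.
    for key in rule["Variable"][2:].split("."):
        value = value[key]
    return rule["Next"] if value == rule["BooleanEquals"] else state["Default"]


# "IsFeatureEnabled" is a hypothetical state name.
print(json.dumps({"IsFeatureEnabled": choice_state}, indent=2))
```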
When the feature flag's value is enabled: true, Step Functions will call the NewFeature function. Otherwise, it will invoke ExistingFeature. From this point, the flow can continue as usual.
We have successfully eliminated the if block from the code!
What if we wanted to remove the GetFeatureFlag Lambda function and make Step Functions directly interact with AppConfig? We can do that, but there are some considerations to take into account.
With a few lines of code in the function handler (1), the Lambda AppConfig extension does a complex job in the background.
First, it calls the StartConfigurationSession API endpoint, which sends back an InitialConfigurationToken. Then, it invokes the GetLatestConfiguration endpoint, which returns the feature flag object seen above.
It then calls GetLatestConfiguration at a configured interval (defaults to 60 seconds) and caches the result.
We can remove this Lambda function from the architecture and delegate the AppConfig API calls to Step Functions. But in this case, we have to manage everything that the AppConfig extension does for us.
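As a sketch of that change, the GetFeatureFlag task could be replaced by two AWS SDK integration states, shown here as a Python dictionary that serializes to Amazon States Language JSON. The Resource values follow the aws-sdk integration naming pattern for the AppConfig Data API and are worth verifying against the Step Functions documentation; IsFeatureEnabled is a hypothetical name for the existing Choice state.

```python
import json

# Two AWS SDK integration states standing in for the GetFeatureFlag task.
appconfig_states = {
    "StartConfigurationSession": {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:appconfigdata:startConfigurationSession",
        "Parameters": {
            # The three identifiers are assumed to be present in the state input.
            "ApplicationIdentifier.$": "$.ApplicationIdentifier",
            "ConfigurationProfileIdentifier.$": "$.ConfigurationProfileIdentifier",
            "EnvironmentIdentifier.$": "$.EnvironmentIdentifier",
        },
        "Next": "GetLatestConfiguration",
    },
    "GetLatestConfiguration": {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:appconfigdata:getLatestConfiguration",
        "Parameters": {
            # The token comes from the previous state's output.
            "ConfigurationToken.$": "$.InitialConfigurationToken",
        },
        # Hypothetical name of the unchanged Choice state.
        "Next": "IsFeatureEnabled",
    },
}

print(json.dumps(appconfig_states, indent=2))
```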
The above workflow snippet shows the change only. The Choice state and everything after will remain the same.
Step Functions integrate with 10,000+ AWS APIs, including StartConfigurationSession and GetLatestConfiguration.
The StartConfigurationSession state requires the mandatory AppConfig parameters we used in the HTTP call inside the Lambda handler. The state's API parameters section can look like this:
We assume the state's input contains the ApplicationIdentifier, ConfigurationProfileIdentifier, and EnvironmentIdentifier properties.
The state's output (InitialConfigurationToken) will be the input of the following state, GetLatestConfiguration. This state needs one mandatory parameter called ConfigurationToken. The relevant part of the definition can look like this:
The output will be similar to this:
As we can see, the Configuration property contains the feature flag as expected.
But there's something else here.
The GetLatestConfiguration call returns a token in the NextPollConfigurationToken property. AWS recommends that clients use it for subsequent calls to the endpoint.
The documentation also recommends caching the feature flag instead of continually fetching it from AppConfig. We should take this advice because AWS charges for GetLatestConfiguration calls. So we want to reduce the number of invocations!
This means that the client that calls the state machine should also provide the current token in the input. The first state could check if the request contains the token. If it does, the state machine could jump to the GetLatestConfiguration state. If the client can't provide the token (for example, because it's the first call), the state machine could call StartConfigurationSession.
Alternatively, the state machine could store the token somewhere externally, for example, in a DynamoDB table. But this solution would add at least two extra API calls (read and update token) to the flow.
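The first option, where the caller passes the token in the input, could start with a Choice state along these lines. This is only a sketch as a Python dictionary that serializes to Amazon States Language JSON; it assumes the caller sends the token in a field named ConfigurationToken, which is a naming choice made for this example. The entry_state helper is a local stand-in for the rule, not a Step Functions feature.

```python
import json

# First state: skip StartConfigurationSession when the caller already has a token.
token_check_state = {
    "Type": "Choice",
    "Choices": [
        {
            # Assumed input field name for the token supplied by the caller.
            "Variable": "$.ConfigurationToken",
            "IsPresent": True,
            "Next": "GetLatestConfiguration",
        }
    ],
    "Default": "StartConfigurationSession",
}


def entry_state(state, state_input):
    """Local stand-in for the IsPresent rule above, for illustration only."""
    rule = state["Choices"][0]
    field = rule["Variable"][2:]  # strip the leading "$."
    return rule["Next"] if field in state_input else state["Default"]


print(json.dumps(token_check_state, indent=2))
```

With this branch in place, GetLatestConfiguration would have to read the token from the same field on both paths, for example by renaming the StartConfigurationSession output with a ResultSelector. The client would also need to keep the NextPollConfigurationToken from each execution's output for its next call.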
All of these would increase complexity. For this reason, I would keep the Lambda function with the AppConfig extension.
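To give a sense of what the extension handles for us, here is a rough Python sketch of the session, token, and caching logic a client would otherwise have to manage. The FlagCache class is hypothetical and simplified (no error handling or session expiry). It assumes a client object with the start_configuration_session and get_latest_configuration methods of the boto3 appconfigdata client, passed in from outside so the sketch has no hard dependency.

```python
import json
import time


class FlagCache:
    """Keeps the latest flags and the poll token between calls."""

    def __init__(self, client, application, environment, profile, clock=time.monotonic):
        self._client = client
        self._ids = {
            "ApplicationIdentifier": application,
            "EnvironmentIdentifier": environment,
            "ConfigurationProfileIdentifier": profile,
        }
        self._clock = clock
        self._token = None
        self._flags = None
        self._next_poll_at = 0.0

    def get_flags(self):
        now = self._clock()
        if self._flags is not None and now < self._next_poll_at:
            return self._flags  # Still fresh: no GetLatestConfiguration call.
        if self._token is None:
            session = self._client.start_configuration_session(**self._ids)
            self._token = session["InitialConfigurationToken"]
        latest = self._client.get_latest_configuration(ConfigurationToken=self._token)
        # Keep the token returned by this call for the next poll.
        self._token = latest["NextPollConfigurationToken"]
        self._next_poll_at = now + latest["NextPollIntervalInSeconds"]
        body = latest["Configuration"].read()
        if body:  # An empty body means the configuration has not changed.
            self._flags = json.loads(body)
        return self._flags
```

It could be created once outside the handler, for example as FlagCache(boto3.client("appconfigdata"), "my-app", "dev", "my-flags"), where the three identifiers are placeholders.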
It's not only feature flags that we can configure in AppConfig. It's possible to store more complex configuration objects, too.
As said above, we can have multiple feature flags for the same application and environment. If this is the case, we'll need a more complex Choice state configuration, which can lead to harder-to-manage states. Alternatively, Bob can write multiple if statements in the code, one for each feature flag.
AppConfig can store feature flags and other configurations we can use in our applications. With the help of the AppConfig Agent or the Lambda extension, we can fetch the feature flag from AppConfig. The extension follows the AWS-recommended flow of API calls and caches the feature flag.
We can use Step Functions and incorporate different code versions based on the feature flag value into our application.
Creating feature flags and free form configuration data in AWS AppConfig - Guide to create applications, environments, and configuration profiles
AWS AppConfig workshop - Get your hands dirty
Getting started with Lambda - How to create a Lambda function
Input and Output Processing in Step Functions - Data flow manipulation