Senior 3 min · March 17, 2026

Feature Flags: Stale Flag Causes 15-Minute Outage

A stale flag at 100% for 3 months caused a NullPointerException, taking down checkout for 10% of users.

Naren · Founder
Plain-English first. Then code. Then the interview question.
Quick Answer
  • A feature flag is a conditional in your code that controls whether a feature is active.
  • Deploy code with the feature off, then turn it on without a new deployment.
  • Flags enable canary releases, kill switches, A/B tests, and trunk-based development.
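  • Use percentage rollouts with deterministic hashing (flag name + user ID) so the same user always gets the same experience.
  • A flag check that makes a network call can add milliseconds per request; evaluate once per request and rely on SDK caching (in-memory evaluation) to keep the overhead small.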
  • Flag debt from unused conditionals is a real maintenance trap — set TTLs and schedule removals.

Basic Flag Implementation

The simplest feature flag is just an if statement controlled by an environment variable or a config value. This pattern works for small teams and simple rollouts. For a production-grade approach, you need consistent user bucketing: the same user must always see the same experience. A common way is to hash the flag name together with the user ID and take the result modulo 100 to assign a bucket.

Here's the minimal pattern in Python, using the io.thecodeforge namespace for all production packages.

PYTHON
# Package: io.thecodeforge.python.devops
import hashlib
import os

# ml_recommendations, rule_based_recommendations, new_checkout_flow and
# old_checkout_flow stand for your application's own implementations.

# Simplest possible feature flag: an environment variable
def get_recommendations(user_id: int):
    if os.getenv('ENABLE_ML_RECOMMENDATIONS', 'false') == 'true':
        return ml_recommendations(user_id)        # new ML-based system
    return rule_based_recommendations(user_id)    # old system

# Better: percentage rollout, testing on a fraction of users
def is_flag_enabled(flag_name: str, user_id: int, percentage: float) -> bool:
    """Consistently assign users to buckets using a hash: the same user always gets the same result."""
    hash_input = f'{flag_name}:{user_id}'.encode()
    hash_val = int(hashlib.md5(hash_input).hexdigest(), 16)
    bucket = (hash_val % 100) + 1  # 1-100
    return bucket <= percentage

# Roll out to 5% of users
def get_checkout_flow(user_id: int):
    if is_flag_enabled('new_checkout', user_id, 5.0):
        return new_checkout_flow(user_id)
    return old_checkout_flow(user_id)
Environment Variable Flags Are Fragile
Using environment variables per flag works for a handful of toggles, but as you scale to hundreds of flags, you need a dedicated flag service with targeting rules and audit trails. Environment variables are also hard to change at runtime without a restart.
Production Insight
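Bucketing maps many users to each of the 100 buckets by design, so the real risk is not collisions but uneven distribution. Use a well-mixed hash (MD5 or SHA-256 are fine here, since bucketing is not a security use) and validate the bucket distribution on a sample of users.
A common mistake is hashing the user ID alone, without the flag name. Every flag then puts the same users in the lowest buckets, so the same 5% of users receive every early rollout and experiments running in parallel become correlated.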
Rule: always include the flag name in the hash input.
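To act on the two points above, you can measure both properties offline. The sketch below is an illustration under stated assumptions: it reuses the MD5-mod-100 bucketing from is_flag_enabled, and the flag names `new_checkout` and `new_search` and the sample of 100,000 sequential user IDs are hypothetical. It reports the smallest and largest bucket sizes (each should be close to 1% of the sample) and the share of users enabled for one flag who are also enabled for another, with and without the flag name in the hash input.

```python
import hashlib
from collections import Counter


def bucket(hash_key: str) -> int:
    """Map a string to a stable bucket in 1..100 (same MD5 scheme as is_flag_enabled)."""
    return int(hashlib.md5(hash_key.encode()).hexdigest(), 16) % 100 + 1


def enabled_users(flag_name: str, user_ids, percentage: float, include_flag_name: bool = True) -> set:
    """Return the user IDs whose bucket falls inside the rollout percentage."""
    def key(uid) -> str:
        return f'{flag_name}:{uid}' if include_flag_name else str(uid)
    return {uid for uid in user_ids if bucket(key(uid)) <= percentage}


def overlap(flag_a: str, flag_b: str, user_ids, percentage: float, include_flag_name: bool = True) -> float:
    """Fraction of flag_a's enabled users who are also enabled for flag_b."""
    a = enabled_users(flag_a, user_ids, percentage, include_flag_name)
    b = enabled_users(flag_b, user_ids, percentage, include_flag_name)
    return len(a & b) / len(a) if a else 0.0


if __name__ == '__main__':
    users = range(100_000)
    sizes = Counter(bucket(f'new_checkout:{uid}') for uid in users)
    print('bucket sizes: min', min(sizes.values()), 'max', max(sizes.values()))
    print('overlap, flag name in hash:', overlap('new_checkout', 'new_search', users, 5.0))
    print('overlap, user ID only:', overlap('new_checkout', 'new_search', users, 5.0, include_flag_name=False))
```

Without the flag name the overlap is exactly 1.0, because both flags select the identical set of users; with it, the overlap should sit near the rollout percentage (about 0.05 for a 5% rollout).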
Key Takeaway
Start simple, but plan to migrate to a flag service once you pass roughly 5 flags.
Deterministic hashing is non-negotiable for percentage rollouts.
Test bucket distribution — a biased hash can ruin A/B tests.
When to use a simple flag vs. a flag service
  • If you have fewer than 5 flags, a single environment, and a small team: environment variable flags are fine. Keep a checklist to track removal.
  • If you have more than 5 flags or multiple environments (staging, prod): use a dedicated flag service (LaunchDarkly, Unleash) for targeting, audit, and easy management.
  • If you need real-time changes (e.g., a kill switch): use a flag service with streaming evaluation. Polling every 30 seconds is too slow for a kill switch.

Feature Flag Service — LaunchDarkly SDK Pattern

When your team needs targeting by user attributes (plan, country, beta group), a dedicated flag service is the way to go. The SDK handles evaluation, caching, and streaming updates. This example shows how to use the LaunchDarkly SDK in Python, evaluating a flag with a rich user context.

PYTHON
# Package: io.thecodeforge.python.devops

# Using a feature flag service (LaunchDarkly, Unleash, Flagsmith)
import ldclient
from ldclient import Context
from ldclient.config import Config

ldclient.set_config(Config(sdk_key='your-sdk-key'))
client = ldclient.get()

# Evaluate a flag for a specific user
def get_dashboard(user):
    # Build an evaluation context; custom attributes become targetable in the dashboard
    context = (
        Context.builder(str(user.id))
        .name(user.name)
        .set('email', user.email)
        .set('plan', user.subscription_plan)   # target premium users
        .set('country', user.country)          # GDPR rollout by country
        .build()
    )

    # Flag evaluated with user context; targeting rules live in the dashboard
    if client.variation('new-dashboard-v2', context, default=False):
        return render_new_dashboard(user)
    return render_old_dashboard(user)
Use Safe Defaults
The default parameter in client.variation() is critical. If the flag service is unreachable (network partition, service down), the SDK falls back to this default. Always default to the old/safe behavior — never default to enabling a new feature.
Production Insight
An SDK in polling mode can keep serving a stale flag value until its next poll (30 seconds is a common interval). If you need near-instant rollback, use streaming mode or a flag proxy/relay that streams updates to your instances.
Evaluation context size matters: large user objects (100+ attributes) can add 5-10ms to evaluation time.
Rule: keep context attributes under 20 and use streaming for time-sensitive flags.
Key Takeaway
Always provide a safe fallback default.
Streaming beats polling for kill switches.
Evaluate flags early, pass results down — don't eval inside loops.
Flag SDK evaluation strategy
  • If network latency to the flag service is above 10ms: use a local cache with a short TTL (5-10 seconds) to avoid a synchronous network call on every request.
  • If flag changes need to propagate within seconds: enable streaming (WebSocket or Server-Sent Events) so changes are pushed, not polled.
  • If a single flag is evaluated hundreds of times per request (e.g., in a loop): evaluate it once at the start of the request and pass the result down as a parameter. Avoid repeated evaluations.
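The last case above is easiest to follow when flag values become plain data. Below is a minimal sketch of evaluate-once-per-request; the `Evaluator` callable stands in for your SDK call (for LaunchDarkly it could wrap `client.variation(...)`), and the `new-pricing` flag and its 10% discount are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical evaluator signature: (flag_key, user_key, default) -> value.
Evaluator = Callable[[str, str, Any], Any]


@dataclass(frozen=True)
class RequestFlags:
    """Flag values resolved once per request and passed down as plain data."""
    new_dashboard: bool
    new_pricing: bool


def resolve_flags(evaluate: Evaluator, user_key: str) -> RequestFlags:
    # One evaluation per flag per request, each with a safe (old-behaviour) default.
    return RequestFlags(
        new_dashboard=bool(evaluate('new-dashboard-v2', user_key, False)),
        new_pricing=bool(evaluate('new-pricing', user_key, False)),
    )


def price_items(prices: list[float], flags: RequestFlags) -> list[float]:
    # The loop reads a resolved attribute instead of calling the flag SDK per item.
    multiplier = 0.9 if flags.new_pricing else 1.0
    return [round(p * multiplier, 2) for p in prices]


if __name__ == '__main__':
    calls = []

    def fake_evaluate(key, user_key, default):
        calls.append(key)              # count evaluations
        return key == 'new-pricing'    # pretend only new-pricing is on

    flags = resolve_flags(fake_evaluate, 'user-42')
    priced = price_items([10.0] * 1000, flags)
    print(len(priced), 'items priced with', len(calls), 'flag evaluations')
```

Pricing 1,000 items here triggers two flag evaluations in total, one per flag, instead of one per item.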

Types of Feature Flags

Not all feature flags are the same. Pete Hodgson's taxonomy (from his article "Feature Toggles (aka Feature Flags)" on martinfowler.com) defines four types: release toggles, experiment toggles, ops toggles, and permission toggles. Release toggles are short-lived: they control the rollout of a new feature. Experiment toggles are for A/B tests and should be removed after the experiment ends. Ops toggles are kill switches and circuit breakers; they must be fast and reliable. Permission toggles (entitlement flags) enable features for specific user segments (e.g., premium plan users) and can live long-term.

Mixing these types leads to confusion. Use naming conventions to distinguish: release_, exp_, ops_, perm_.

PYTHON
# Package: io.thecodeforge.python.devops

# Naming convention for flag types (client and context as defined in the previous example)
release_flag_variation = client.variation('release_new_checkout_v3', context, default=False)
experiment_flag_variation = client.variation('exp_checkout_button_color', context, default='blue')
ops_flag_variation = client.variation('ops_disable_payment_gateway', context, default=False)
perm_flag_variation = client.variation('perm_premium_dashboard', context, default=False)
Flag Types as Lifecycle Stages
  • Release flags: live 1 day – 2 weeks. Remove once rollout reaches 100%.
  • Experiment flags: live for the duration of the experiment (days to months). Remove after analysis.
  • Ops flags: live indefinitely but must be easy to toggle and have monitoring.
  • Permission flags: live indefinitely, but should be managed by a product config system, not a feature flag tool.
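Both the naming convention and the release-flag lifetime can be checked mechanically, for example in a pre-merge hook. A minimal sketch, assuming flag keys follow the `release_`/`exp_`/`ops_`/`perm_` prefixes with lowercase snake_case after them, and using the flag's creation date as a stand-in for its age (a hypothetical simplification; your flag service's metadata may expose richer dates).

```python
import re
from datetime import date, timedelta

# Convention from this section: a type prefix, then lowercase snake_case.
FLAG_KEY = re.compile(r'^(release|exp|ops|perm)_[a-z0-9_]+$')
RELEASE_MAX_AGE = timedelta(weeks=2)  # release flags: 1 day to 2 weeks


def flag_type(flag_key: str) -> str:
    """Return 'release', 'exp', 'ops' or 'perm', or raise ValueError if the key breaks the convention."""
    match = FLAG_KEY.match(flag_key)
    if match is None:
        raise ValueError(f'{flag_key!r} must start with release_, exp_, ops_ or perm_')
    return match.group(1)


def release_flag_overdue(flag_key: str, created: date, today: date) -> bool:
    """True for release flags older than two weeks; other types are reviewed on a schedule instead."""
    return flag_type(flag_key) == 'release' and today - created > RELEASE_MAX_AGE


if __name__ == '__main__':
    for key in ('release_new_checkout_v3', 'ops_disable_payment_gateway', 'new-dashboard-v2'):
        try:
            print(key, '->', flag_type(key))
        except ValueError as err:
            print('invalid:', err)
```

Only release flags get a hard deadline here; experiment, ops, and permission flags are left to the analysis-driven removal and periodic reviews described in this section.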
Production Insight
Permission flags in a feature flag service create a hidden dependency — if the service goes down, all premium users lose access.
Ops flags must have a dashboard button for emergency toggling, not a CLI command that takes 5 minutes to find.
Rule: never use a feature flag service for permanent permissions — use a role-based access control (RBAC) system instead.
Key Takeaway
Name flags by type to avoid confusion.
Permanent permissions don't belong in feature flag tools.
Ops flags need monitoring and a dashboard toggle.
Which flag type should you use?
  • If you are rolling out a new feature to all users gradually: use a release flag. Plan to remove it within 2 weeks of reaching 100% rollout.
  • If you are testing two versions of a UI element to measure engagement: use an experiment flag. Calculate the required sample size up front and apply proper statistical tests.
  • If you need to instantly disable a misbehaving API call: use an ops flag. Make sure flag evaluation is fast (<1ms) and the toggle is available in a dashboard.
  • If you want to show a feature only to paying users: use a permission flag, but implement it via a user attribute lookup (database or auth token) rather than a feature flag SDK.

Canary Releases and Gradual Rollout with Flags

Canary releases are about routing a percentage of traffic to a new version of the service at the infrastructure level (e.g., Kubernetes canary deployments). But feature flags can enhance canaries by allowing you to target specific user segments within the canary pod. For example, you deploy the new version to 5% of pods, then use a feature flag to only enable the new feature for 10% of users hitting those pods. This gives you fine-grained control.

This pattern is common at large scale: you canary the deployment at the pod level, and inside the pod, use a flag to limit exposure further. This reduces blast radius if the new version has a bug — only a subset of the canary group sees the broken code.

PYTHON
# Package: io.thecodeforge.python.devops

# Canary with feature flag: even if the pod receives traffic, only a fraction of users get the new feature
import hashlib

def compute_bucket(user_id, flag_name, total_percent):
    """Return True if this user's stable bucket (1-100) falls within total_percent."""
    hash_val = int(hashlib.md5(f'{flag_name}:{user_id}'.encode()).hexdigest(), 16)
    return (hash_val % 100) + 1 <= total_percent

# Canary: 5% of pods run new code, but only 20% of users on those pods get the feature.
# With traffic spread evenly across pods, that is 0.05 * 0.20 = roughly 1% of requests.
# This function only exists in the new build, so it only runs on canary pods.
# new_recommendation_system and old_system stand for your own implementations.
def get_recommendations(user_id):
    if compute_bucket(user_id, 'new_recommendation_v2', 20):
        return new_recommendation_system(user_id)
    return old_system(user_id)
Hybrid Canary vs Pure Flag Canary
Pure flag canary: deploy the new code to all pods but turn the flag off. Then gradually increase the flag percentage. This is simpler but uses more resources (both old and new code paths are always loaded). Hybrid is safer for risky changes because the new code is only present in a subset of pods.
Production Insight
If you only use flags for canary, you must ensure the flag evaluation does not add noticeable latency. Use a local cache or a fast evaluation path.
Monitoring the canary: you need separate dashboards for the canary group vs the control group. Use the flag context to tag traces and metrics.
Rule: always run a canary for at least 10 minutes before ramping up. Watch error rates, latency, and business metrics.
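The 10-minute rule above can be encoded as a small ramp gate that a deployment script consults before increasing the rollout percentage. A minimal sketch: the ramp steps (1, 5, 25, 50, 100 percent) are an assumption, and `healthy` is a hypothetical input you would derive by comparing the canary group's error rate, latency, and business metrics with the control group.

```python
RAMP_STEPS = (1.0, 5.0, 25.0, 50.0, 100.0)  # hypothetical ramp schedule, in percent
MIN_SOAK_MINUTES = 10                       # "at least 10 minutes" before ramping up


def next_rollout_percent(current: float, minutes_at_current: float, healthy: bool) -> float:
    """Decide the next rollout percentage for a canary.

    - unhealthy canary: return 0.0 (turn the feature off)
    - healthy but soaked for less than MIN_SOAK_MINUTES: stay at `current`
    - otherwise: advance to the next step in RAMP_STEPS (100 stays at 100)
    """
    if not healthy:
        return 0.0
    if minutes_at_current < MIN_SOAK_MINUTES:
        return float(current)
    for step in RAMP_STEPS:
        if step > current:
            return step
    return float(current)


if __name__ == '__main__':
    print(next_rollout_percent(1.0, 12, healthy=True))
```

Returning 0.0 on an unhealthy signal treats the rollout percentage itself as the rollback lever; a dedicated ops kill switch remains the faster option in an emergency.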
Key Takeaway
Hybrid canary = pod-level + flag-level control for maximum safety.
Monitor the canary group separately — don't mix metrics with the control group.
Have a kill switch ops flag ready before starting the canary.
Canary strategy: flag only vs hybrid
  • If the change is low-risk (UI change, non-critical path): use a pure flag canary. Deploy to all pods, then enable the flag for 1% of users first.
  • If the change is high-risk (database schema migration, payment logic): use a hybrid canary. Deploy to 5% of pods, then enable the flag for 10% of users within those pods.
  • If you need to roll back instantly for a critical bug: use an ops flag alongside the canary. Turning the ops flag on immediately disables the new code path, even if the rollout percentage is high.
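The third case comes down to evaluation order: check the kill switch before the rollout percentage. A minimal sketch, assuming `kill_switch_on` is the already-evaluated value of an ops flag (evaluated with a default of False) and reusing the MD5 bucketing from compute_bucket; `use_new_path` and `effective_exposure` are hypothetical helpers, and the exposure estimate assumes traffic is spread evenly across pods.

```python
import hashlib


def in_rollout(flag_name: str, user_id: int, percent: float) -> bool:
    """Stable 1-100 bucket per flag/user, the same scheme as compute_bucket."""
    digest = hashlib.md5(f'{flag_name}:{user_id}'.encode()).hexdigest()
    return int(digest, 16) % 100 + 1 <= percent


def use_new_path(user_id: int, rollout_percent: float, kill_switch_on: bool) -> bool:
    """The ops kill switch takes precedence: when it is on, nobody gets the new path."""
    if kill_switch_on:
        return False
    return in_rollout('new_recommendation_v2', user_id, rollout_percent)


def effective_exposure(canary_pod_fraction: float, flag_fraction: float) -> float:
    """Approximate share of requests reaching the new path, assuming even traffic across pods."""
    return canary_pod_fraction * flag_fraction


if __name__ == '__main__':
    users = range(10_000)
    print('20% rollout, kill switch off:', sum(use_new_path(u, 20, False) for u in users))
    print('20% rollout, kill switch on:', sum(use_new_path(u, 20, True) for u in users))
    print('hybrid canary exposure:', effective_exposure(0.05, 0.20))
```

For the hybrid example earlier in this section, `effective_exposure(0.05, 0.20)` gives about 0.01, the roughly 1% figure quoted in the canary code.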

Managing Flag Debt and Cleanup

Flag debt is the accumulation of stale conditionals in your code. Every flag that is no longer needed but still present forces your team to maintain two paths. Over time, the old path can break silently because it's rarely tested. The solution is to make flags ephemeral: set a removal date when you create the flag, automate reminders, and schedule cleanup as part of your sprint cycle.

A good rule: if a release flag has been at 100% for more than two weeks, it must be removed. For experiment flags, remove after the experiment analysis is complete — don't keep them 'just in case'. Ops flags and permission flags are exceptions, but they should be reviewed quarterly.

PYTHON
# Package: io.thecodeforge.python.devops

# Example: find flag references in CI so stale ones can be flagged for cleanup.
# Pure-Python scan of the source tree (no grep dependency) for client.variation('<key>', ...) calls.
import re
from pathlib import Path

FLAG_PATTERN = re.compile(r'client\.variation\(\s*[\'"]([\w-]+)[\'"]')

def find_flag_references(root: str = 'src') -> dict[str, list[str]]:
    """Map each flag key referenced under `root` to the Python files that use it."""
    refs: dict[str, list[str]] = {}
    for path in sorted(Path(root).rglob('*.py')):
        text = path.read_text(encoding='utf-8', errors='ignore')
        for flag in sorted(set(FLAG_PATTERN.findall(text))):
            refs.setdefault(flag, []).append(str(path))
    return refs

# Next step in CI (would use the flag service's API in real life): look up each key's
# rollout state and warn if a release flag has been at 100% for more than 2 weeks.
# This helps reduce flag debt.
Flag Debt Causes Real Outages
A stale flag with a code path that is never exercised can break when a refactoring touches the old code. The outage described in the production incident below happened exactly this way. Treat flag cleanup as a security practice.
Production Insight
Automation is key: add a lint rule that flags any client.variation() call for a flag that is > 2 weeks at 100% rollout.
Manual audits every quarter are better than nothing but often get skipped.
Rule: when you create a flag, create a corresponding JIRA ticket with a due date for removal.
Key Takeaway
Create flags with an expiry date.
Automate flag debt detection in CI.
If a flag is at 100% for more than 2 weeks, schedule its removal now.
Flag cleanup priority
  • If a release flag has been at 100% for more than 2 weeks: high priority. Remove it within the next sprint; the old code path is dead and should be deleted.
  • If an experiment flag ended more than 1 month ago: medium priority. Remove it once the analysis report is finalized; keep the winning variant and delete the rest.
  • If an ops flag has not been toggled in 6 months: low priority, but review it. If the ops flag is no longer needed, remove it to simplify the codebase.
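The priority rules above map directly to code once you can export flag metadata. A minimal sketch, assuming a hypothetical `FlagRecord` shape (key, type prefix, and the relevant dates) that you would populate from your flag service's API; "1 month" and "6 months" are approximated as 30 and 182 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class FlagRecord:
    """Hypothetical flag metadata exported from your flag service."""
    key: str
    kind: str                                  # 'release', 'exp', 'ops' or 'perm'
    full_rollout_since: Optional[date] = None  # release flags: day the flag reached 100%
    experiment_ended: Optional[date] = None    # experiment flags: day the experiment stopped
    last_toggled: Optional[date] = None        # ops flags: most recent on/off change


def cleanup_priority(flag: FlagRecord, today: date) -> Optional[str]:
    """Return 'high', 'medium', 'low' or None (no action yet), following the cleanup-priority rules."""
    if flag.kind == 'release' and flag.full_rollout_since is not None:
        if today - flag.full_rollout_since > timedelta(weeks=2):
            return 'high'    # remove within the next sprint
    if flag.kind == 'exp' and flag.experiment_ended is not None:
        if today - flag.experiment_ended > timedelta(days=30):
            return 'medium'  # remove once the analysis report is final
    if flag.kind == 'ops' and flag.last_toggled is not None:
        if today - flag.last_toggled > timedelta(days=182):
            return 'low'     # review whether the ops flag is still needed
    return None


if __name__ == '__main__':
    today = date(2026, 3, 17)
    flags = [
        FlagRecord('release_new_checkout_v3', 'release', full_rollout_since=date(2026, 2, 1)),
        FlagRecord('exp_checkout_button_color', 'exp', experiment_ended=date(2026, 3, 1)),
        FlagRecord('ops_disable_payment_gateway', 'ops', last_toggled=date(2025, 6, 1)),
    ]
    for f in flags:
        print(f.key, cleanup_priority(f, today))
```

Permission flags always return None here, matching the advice to manage them outside the flag tool and review them on a schedule.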
● Production incident · POST-MORTEM · severity: high

Flag That Never Died: A Stale Flag Causes a 15-Minute Production Outage

Symptom
Starting at 14:32 UTC, 10% of users received HTTP 500 errors on the checkout page. The error rate climbed to 50% within 4 minutes. The team was not deploying — the system degraded on its own.
Assumption
The on-call engineer assumed a recent deployment caused the issue and initiated a rollback. The rollback did not fix the problem because the underlying flag evaluation code was months old.
Root cause
A feature flag from a previous quarter's experiment was never removed. After marking the flag as '100% rollout', the team stopped tracking it. Three months later, a refactor of the recommendation service broke the flag's evaluation path — the flag was still evaluated for every request, and the missing method threw a NullPointerException.
Fix
1. Identified the failing flag via exception stack traces pointing to flag evaluation code.
2. Turned the flag OFF globally to restore the stable fallback path.
3. Removed the flag from the codebase permanently.
4. Added a monitoring alert for any flag that remains at 100% rollout for more than 2 weeks.
Key lesson
  • Short-lived flags must die on a schedule — never let a rollout flag live past 2 weeks at 100%.
  • Flag evaluation should never throw: always provide a safe default and catch evaluation errors gracefully.
  • Monitor flag usage: alert when a flag has been at 100% for more than 14 days.
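The second lesson can be enforced with a thin wrapper around every evaluation call. A minimal sketch, assuming `evaluate` is a hypothetical callable wrapping your SDK (for example `lambda key, default: client.variation(key, context, default)`); it guards the evaluation call itself, not the code inside the branch the flag selects.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def safe_variation(evaluate: Callable[[str, Any], Any], flag_key: str, default: Any) -> Any:
    """Evaluate a flag but never let evaluation raise: fall back to `default` (the old, safe behaviour).

    Also falls back when the returned value's type does not match the default's type,
    e.g. a string variation served where a boolean was expected.
    """
    try:
        value = evaluate(flag_key, default)
    except Exception:
        logger.exception('flag evaluation failed for %r; serving default %r', flag_key, default)
        return default
    if default is not None and not isinstance(value, type(default)):
        logger.warning('flag %r returned %r, expected %s; serving default',
                       flag_key, value, type(default).__name__)
        return default
    return value
```

Catching every exception is deliberate: a broken flag should degrade to the old behaviour and leave a log line, not fail the request.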
Production debug guide: common symptoms and actions to identify and fix flag-related problems in production
Symptom · 01
A/B test shows no significant difference — or worse, both groups show the same behaviour.
Fix
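Verify flag assignment is consistent: check the hash function and user identifier. Use a deterministic hash (e.g., MD5 of flag_name + user_id); in Python, avoid the built-in hash() for strings, which is salted per process unless PYTHONHASHSEED is fixed. Test that the same user always gets the same variant across restarts.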
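A frequent cause of non-sticky assignment in Python is bucketing with the built-in `hash()`: since Python 3.3, `str` hashes are randomized per process unless `PYTHONHASHSEED` is set, so buckets can change on every restart. The sketch below contrasts an MD5-based bucket with `hash()` computed in fresh interpreters under two fixed seeds, which stand in for two restarts; the flag name and user IDs are illustrative.

```python
import hashlib
import os
import subprocess
import sys


def stable_bucket(flag_name: str, user_id: str) -> int:
    """MD5-based bucket in 1..100: identical in every process, on every machine."""
    digest = hashlib.md5(f'{flag_name}:{user_id}'.encode()).hexdigest()
    return int(digest, 16) % 100 + 1


def builtin_hash_bucket(flag_name: str, user_id: str, hash_seed: str) -> int:
    """Bucket computed with built-in hash() in a fresh interpreter, as after a restart."""
    key = f'{flag_name}:{user_id}'
    code = f'print(hash({key!r}) % 100 + 1)'
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)  # a different seed per "restart"
    result = subprocess.run([sys.executable, '-c', code], env=env,
                            capture_output=True, text=True, check=True)
    return int(result.stdout)


if __name__ == '__main__':
    for uid in ('user1', 'user2', 'user3'):
        print(uid,
              'md5:', stable_bucket('new_checkout', uid),
              'hash() seed 1:', builtin_hash_bucket('new_checkout', uid, '1'),
              'hash() seed 2:', builtin_hash_bucket('new_checkout', uid, '2'))
```

The MD5 column never changes between runs; the two `hash()` columns generally disagree, which is exactly what a user would see as a flipping variant after a deploy.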
Symptom · 02
New feature suddenly visible to all users, even though rollout percentage is set to 5%.
Fix
Check the default value passed to the evaluation call. The SDK returns that default whenever it cannot evaluate the flag (service unreachable, SDK not yet initialized, unknown flag key), so if the default was inadvertently set to true, every user sees the feature. Inspect the SDK configuration and fallback logic.
Symptom · 03
Flag evaluation is slow (~50ms+ per check), causing API latency spikes.
Fix
Check if flag evaluation is making a network call per request. Most SDKs cache flag results locally. Ensure caching is enabled and the TTL is appropriate (1-30 seconds). If using a custom service, add a local cache with a short TTL to absorb load.
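If you run a custom flag service, the local cache can be as small as the sketch below. It assumes a hypothetical `fetch(flag_key)` function that calls your service and returns a flag's global state (per-user targeting would need the user in the cache key). Values are reused for `ttl_seconds`, and the last known value is served if a refresh fails. It does no locking, so concurrent misses may fetch the same key twice.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLFlagCache:
    """Local cache for flag values fetched from a (hypothetical) custom flag service."""

    def __init__(self, fetch: Callable[[str], Any], ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # network call to your flag service
        self._ttl = ttl_seconds
        self._clock = clock          # injectable for tests
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (fetched_at, value)

    def get(self, flag_key: str) -> Any:
        now = self._clock()
        cached: Optional[Tuple[float, Any]] = self._entries.get(flag_key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]                     # fresh: no network call
        try:
            value = self._fetch(flag_key)
        except Exception:
            if cached is not None:
                return cached[1]                 # service unreachable: serve last known value
            raise                                # nothing cached: let the caller apply its default
        self._entries[flag_key] = (now, value)
        return value
```

Serving the stale value on failure trades freshness for availability; combine it with a safe default at the call site for flags that have never been fetched.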
Symptom · 04
Rolling back a flag does not immediately fix a production issue — users still see the broken behaviour.
Fix
Verify that the flag's state change propagated to all application instances. Some SDKs poll the flag service with a delay (e.g., 30 seconds). Use streaming or webhooks for near-instant propagation. Check the application logs to confirm the new flag value was fetched.
★ Feature Flag Debug Cheat Sheet: quick commands and checks to diagnose flag-related problems in production.
User sees wrong experience
Immediate action
Check the flag evaluation result for that user in the SDK logs or dashboard.
Commands
`curl -X GET "https://flags.example.com/eval?flag=my-feature&user=user123"`
`kubectl logs pod/my-app-pod | grep "flag_eval" | tail -50`
Fix now
If the flag service is down and your application reads an environment-variable override, set it on the deployment, which triggers a rolling restart of the pods: `kubectl set env deployment/my-app MY_FEATURE_FLAG=false`.
Rollback not taking effect
Immediate action
Force a flag refresh by bouncing the pod or hitting the SDK's refresh endpoint.
Commands
`kill -HUP $(pgrep my-app)` (if the app reloads flags on SIGHUP)
`kubectl rollout restart deployment/my-app`
Fix now
As a fallback, remove the flag code path from the repository and redeploy. This is heavy but guarantees the broken code is gone.
Latency spikes on page load
Immediate action
Check if flag evaluations are blocking the main thread (synchronous SDK calls).
Commands
`curl -w "%{time_total}\n" -o /dev/null -s "https://myapp.com/api/checkout"`
`jstack $(pgrep -f my-app) | grep "FlagClient"`
Fix now
Switch to an async SDK or batch flag evaluations. Set a cache with a short TTL (e.g., 5 seconds) and use a local fallback.
Feature Flag Types Comparison
| Type | Lifecycle | Example | Removal policy |
|---|---|---|---|
| Release Toggle | Short-lived (days to weeks) | Deploy new checkout flow | Remove 2 weeks after reaching 100% rollout |
| Experiment Toggle | Medium-lived (days to months) | A/B test button color | Remove after experiment analysis |
| Ops Toggle | Long-lived (indefinite) | Kill switch for payment gateway | Review quarterly, monitor usage |
| Permission Toggle | Long-lived (indefinite) | Show premium feature | Use RBAC instead if possible |

Key takeaways

1. Feature flags decouple deployment from release: ship code dark, turn it on when ready.
2. Use deterministic hashing for percentage rollouts: the same user always gets the same experience.
3. Kill switches are flags with an immediate-off capability, essential for production safety.
4. Use short-lived flags for releases and experiments (remove experiment flags once the analysis is done); keep long-lived flags for operational controls and permissions.
5. Flag debt is real: remove flags after rollout is complete or the experiment ends.
6. Always set a safe default in the evaluation call (the old behavior).
7. Plan flag removal at creation time: set a TTL and schedule a cleanup ticket.

Common mistakes to avoid


Using environment variables for hundreds of flags

Symptom
Flag management becomes a nightmare: no audit trail, no targeting, no gradual rollout control. A stale env var can linger forever.
Fix
Migrate to a dedicated feature flag service once you have more than 5 flags. Start with a SaaS tool like LaunchDarkly or open-source Unleash.

Not providing a safe default in the evaluation call

Symptom
If the flag service goes down, the SDK returns the default. If the default is True, your new feature becomes enabled for everyone — potentially exposing unstable code or causing a crash.
Fix
Always set the default parameter to the old/stable behavior. In LaunchDarkly: client.variation('flag-key', context, default=False) where False means old code path.

Evaluating flags inside loops or hot code paths

Symptom
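Each evaluation still does work (rule matching and, in many SDKs, analytics bookkeeping), so evaluating a flag 1,000 times in a single request adds avoidable latency and CPU usage. If evaluations miss the local cache, the cost multiplies and page load times suffer.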
Fix
Evaluate the flag once per request at the entry point (e.g., middleware or controller) and pass the result as a parameter to downstream functions.

Keeping experiment flags after analysis is complete

Symptom
Dead code branches accumulate, making the codebase harder to navigate and increasing the risk of bugs in untested paths.
Fix
Set a TTL on the flag in the dashboard. When the experiment ends, schedule a cleanup ticket. Run a regular (e.g., monthly) flag audit.
INTERVIEW PREP · PRACTICE MODE

Interview Questions on This Topic

Q01 · JUNIOR
What is a feature flag and what problems does it solve?
Q02 · SENIOR
How do you ensure a user consistently gets the same experience with a pe...
Q03 · SENIOR
What is flag debt?
Q04 · SENIOR
Describe the four types of feature flags and when to use each.
Q01 of 04 · JUNIOR

What is a feature flag and what problems does it solve?

ANSWER
A feature flag is a conditional toggle that controls whether a feature is active at runtime. It decouples deployment from release, so you can ship code to production without making it visible to users. This solves: 1) risk — you can roll back instantly by flipping a flag instead of redeploying, 2) gradual rollout — you can expose a feature to 1% of users first, 3) A/B testing — you can run experiments, and 4) trunk-based development — developers can merge incomplete features without breaking main.

Frequently Asked Questions

01
What is the difference between a canary release and a feature flag?
02
What is flag debt and how do you manage it?
03
Do feature flags add latency to requests?
04
Should I use environment variables or a dedicated service for feature flags?