What Are Feature Flags?
Feature flags (also called feature toggles) are conditional statements in code that control whether a feature is active. They decouple code deployment from feature release — you can deploy code to production with new features hidden, then enable them gradually.
if (featureFlags.isEnabled('new-checkout-flow')) {
  renderNewCheckout();
} else {
  renderLegacyCheckout();
}
For QA engineers, feature flags add testing complexity but also provide powerful capabilities: you can test features in production safely, control A/B experiments, and roll back problematic features instantly without redeploying.
Types of Feature Flags
| Type | Lifespan | Purpose | Testing Focus |
|---|---|---|---|
| Release flags | Short (days-weeks) | Hide incomplete features | Both on/off states |
| Experiment flags | Medium (weeks-months) | A/B testing, metrics | Variant behavior |
| Ops flags | Long-lived | Circuit breakers, kill switches | Failure modes |
| Permission flags | Long-lived | User-specific features (premium, beta) | Per-user-group behavior |
Testing Strategy for Feature Flags
The Combinatorial Challenge
With N independent boolean flags, there are 2^N possible combinations. With just 10 flags, that is 1,024 combinations, which is impractical to test exhaustively.
Pragmatic approach:
- Test each flag independently in both on and off states (2N tests)
- Identify dependent flags that interact and test those combinations
- Test transitions — toggling a flag from off to on (and back) during a user session
- Test the default state — what users see when the flag service is down
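The first two points of the pragmatic approach can be sketched as a small test-plan generator. This is an illustration only: the flag names, the `TestCase` shape, and the choice of which flags count as dependent are assumptions, not part of any flag tool.

```typescript
// A flag configuration maps each flag name to its on/off state.
type FlagConfig = Record<string, boolean>;

interface TestCase {
  label: string;
  config: FlagConfig;
}

// Each flag is forced ON and then OFF while every other flag keeps its
// default value: 2 cases per flag, 2N in total. (The cases that match a
// flag's default repeat the default configuration, so fewer are distinct.)
function singleFlagCases(defaults: FlagConfig): TestCase[] {
  const cases: TestCase[] = [];
  for (const flag of Object.keys(defaults)) {
    for (const state of [true, false]) {
      cases.push({
        label: `${flag}=${state ? "on" : "off"}`,
        config: { ...defaults, [flag]: state },
      });
    }
  }
  return cases;
}

// Flags known to interact get every combination within their group
// (2^k cases for k flags); flags outside the group stay at their defaults.
function dependentGroupCases(defaults: FlagConfig, group: string[]): TestCase[] {
  const cases: TestCase[] = [];
  const total = 1 << group.length;
  for (let mask = 0; mask < total; mask++) {
    const config: FlagConfig = { ...defaults };
    const parts: string[] = [];
    group.forEach((flag, bit) => {
      const state = (mask & (1 << bit)) !== 0;
      config[flag] = state;
      parts.push(`${flag}=${state ? "on" : "off"}`);
    });
    cases.push({ label: parts.join(","), config });
  }
  return cases;
}

// Hypothetical flag set: ten flags, of which flag-1 and flag-2 interact.
const defaults: FlagConfig = {};
for (let i = 1; i <= 10; i++) defaults[`flag-${i}`] = false;

const plan = [
  ...singleFlagCases(defaults),
  ...dependentGroupCases(defaults, ["flag-1", "flag-2"]),
];
console.log(`${plan.length} cases instead of ${2 ** 10}`); // 24 cases instead of 1024
```

Transition and fallback scenarios are not configurations but behaviors over time, so they are better written as separate test cases than generated from this list.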
Test Matrix Example
For a “new-checkout” feature flag:
| Scenario | Flag State | What to Test |
|---|---|---|
| Feature off (default) | OFF | Legacy checkout works correctly |
| Feature on | ON | New checkout works correctly |
| Transition: off → on | OFF → ON | Mid-session switch does not corrupt cart data |
| Transition: on → off | ON → OFF | Rollback does not lose user data |
| Flag service down | FALLBACK | Application gracefully degrades to default |
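The FALLBACK row can be exercised with a test double that simulates an unreachable flag service. The sketch below assumes a hypothetical `FlagClient` interface and a hard-coded `SAFE_DEFAULTS` table; real SDKs usually offer their own "evaluate with default" call, so treat the names as placeholders.

```typescript
// Hypothetical client interface; isEnabled may throw when the flag service
// cannot be reached.
interface FlagClient {
  isEnabled(flag: string): boolean;
}

// Safe defaults used when evaluation fails (assumed values for illustration).
const SAFE_DEFAULTS: Record<string, boolean> = {
  "new-checkout": false, // fall back to the legacy checkout
};

// Resolves a flag, degrading to the safe default if the client throws.
function resolveFlag(client: FlagClient, flag: string): boolean {
  try {
    return client.isEnabled(flag);
  } catch {
    // Flags without an explicit default are treated as off.
    return SAFE_DEFAULTS[flag] ?? false;
  }
}

function checkoutVariant(client: FlagClient): "new" | "legacy" {
  return resolveFlag(client, "new-checkout") ? "new" : "legacy";
}

// Test doubles: one healthy service, one that is down.
const healthyClient: FlagClient = { isEnabled: () => true };
const downClient: FlagClient = {
  isEnabled: () => {
    throw new Error("flag service unavailable");
  },
};

console.log(checkoutVariant(healthyClient)); // "new"
console.log(checkoutVariant(downClient)); // "legacy"
```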
Testing Progressive Rollouts
Progressive rollouts expose features to increasing percentages of users:
Day 1: 1% of users → Monitor error rates
Day 2: 5% of users → Check key metrics
Day 3: 25% of users → Broader validation
Day 7: 100% → Full release
QA responsibilities:
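- Verify that percentage targeting works: the share of users who receive the feature should be close to the configured percentage
- Monitor error rates and user metrics at each stage
- Have a rollback plan ready
- Test that users in the rollout see the feature consistently across sessions and page loads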
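Percentage targeting and consistency are easier to reason about with a concrete model. Many flag tools assign users to rollouts by hashing a stable user ID into a bucket; the sketch below illustrates that idea with an FNV-1a style hash. It is an assumption about how such targeting can work, not the algorithm of any particular vendor, and the function names are made up for this example.

```typescript
// FNV-1a style 32-bit hash over the string's UTF-16 code units
// (non-cryptographic, deterministic).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 modulo 2^32, written as shifts
    // (16777619 = 2^24 + 2^8 + 2^7 + 2^4 + 2 + 1).
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash;
}

// Maps a user to a stable bucket in [0, 100). Including the flag key keeps
// bucket assignments independent between flags.
function bucketFor(flagKey: string, userId: string): number {
  return fnv1a(`${flagKey}:${userId}`) % 100;
}

// A user is in the rollout when their bucket is below the rollout percentage.
function isInRollout(flagKey: string, userId: string, percentage: number): boolean {
  return bucketFor(flagKey, userId) < percentage;
}

// Share (in percent) of the sampled users that receive the feature.
function measuredExposure(flagKey: string, userIds: string[], percentage: number): number {
  const exposed = userIds.filter((id) => isInRollout(flagKey, id, percentage)).length;
  return (exposed / userIds.length) * 100;
}

// Synthetic sample of user IDs for checking the configured percentages.
const sample: string[] = [];
for (let i = 0; i < 10000; i++) sample.push(`user-${i}`);

for (const pct of [1, 5, 25, 100]) {
  const measured = measuredExposure("new-checkout", sample, pct).toFixed(1);
  console.log(`${pct}% configured, ${measured}% measured`);
}
```

Because a user's bucket never changes, the same user gets the same answer on every request, and anyone included at 5% stays included at 25%. Both properties are worth asserting in tests against the real targeting service.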
Feature Flag Tools
| Tool | Type | Key Feature |
|---|---|---|
| LaunchDarkly | SaaS | Enterprise-grade, real-time updates |
| Split.io | SaaS | Built-in experimentation |
| Unleash | Open source | Self-hosted, extensible |
| Flagsmith | Open source | API-first, remote config |
| ConfigCat | SaaS | Simple, affordable |
| Custom (env vars) | DIY | Simple on/off toggles |
Automation with Feature Flags
Testing Both States in CI
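// playwright.config.ts
import { defineConfig } from '@playwright/test';
// Runs the suite once per flag state. The X-Feature-Flag header is an
// application-specific override that the staging backend must be built to honor.
export default defineConfig({
  projects: [
    {
      name: 'feature-off',
      use: {
        baseURL: 'https://staging.example.com',
        extraHTTPHeaders: { 'X-Feature-Flag': 'new-checkout=false' },
      },
    },
    {
      name: 'feature-on',
      use: {
        baseURL: 'https://staging.example.com',
        extraHTTPHeaders: { 'X-Feature-Flag': 'new-checkout=true' },
      },
    },
  ],
});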
API-Based Flag Control
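// Assumes LaunchDarkly's REST API v2 with semantic patch; 'my-project' and
// 'staging' are placeholder project and environment keys.
const FLAG_URL = 'https://app.launchdarkly.com/api/v2/flags/my-project/new-checkout';
async function setFlag(on: boolean) {
  const res = await fetch(FLAG_URL, {
    method: 'PATCH',
    headers: {
      Authorization: LD_API_KEY, // both requests need the access token
      'Content-Type': 'application/json; domain-model=launchdarkly.semanticpatch',
    },
    body: JSON.stringify({
      environmentKey: 'staging',
      instructions: [{ kind: on ? 'turnFlagOn' : 'turnFlagOff' }],
    }),
  });
  if (!res.ok) throw new Error(`Flag update failed: ${res.status}`);
}
// Before tests: enable flag
await setFlag(true);
// Run tests...
// After tests: turn the flag back off
await setFlag(false);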
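Turning a flag off after the tests is not the same as restoring it: if the flag was already on, or a test throws midway, the environment ends up in the wrong state. One way to handle this is a wrapper that records the original value and restores it in a `finally` block. The `FlagAdmin` interface and the in-memory stand-in below are hypothetical; in practice `get` and `set` would call your flag tool's API.

```typescript
// Hypothetical admin interface; back it with your flag tool's API.
interface FlagAdmin {
  get(flag: string): Promise<boolean>;
  set(flag: string, on: boolean): Promise<void>;
}

// Sets the flag for the duration of run(), then restores the value it had
// before, even if run() throws.
async function withFlag<T>(
  admin: FlagAdmin,
  flag: string,
  on: boolean,
  run: () => Promise<T>,
): Promise<T> {
  const original = await admin.get(flag);
  await admin.set(flag, on);
  try {
    return await run();
  } finally {
    await admin.set(flag, original);
  }
}

// In-memory stand-in so the sketch runs without a flag service.
function inMemoryAdmin(initial: Record<string, boolean>): FlagAdmin {
  const state: Record<string, boolean> = { ...initial };
  return {
    get: async (flag) => state[flag] ?? false,
    set: async (flag, on) => {
      state[flag] = on;
    },
  };
}

async function demo(): Promise<void> {
  const admin = inMemoryAdmin({ "new-checkout": false });
  await withFlag(admin, "new-checkout", true, async () => {
    console.log("during:", await admin.get("new-checkout")); // during: true
  });
  console.log("after:", await admin.get("new-checkout")); // after: false
}
void demo();
```

Toggling a shared flag this way affects every test running against the same environment, so it suits serial suites or dedicated test environments better than parallel runs.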
Exercise: Design a Flag Testing Strategy
Your team is launching a new recommendation engine behind a feature flag. The flag has three variants: “off” (legacy), “basic” (simple recommendations), “advanced” (ML-powered recommendations). It is being rolled out progressively: 1% → 10% → 50% → 100%.
Design the testing strategy.
Solution
Phase 1: Pre-Rollout (Development/Staging)
Functional tests per variant:
- OFF: Legacy product page, no recommendations section
- BASIC: Recommendations section shows related products by category
- ADVANCED: ML-powered recommendations, personalized per user
Transition tests:
- User switches from OFF → BASIC: recommendations appear without page reload issues
- User switches from BASIC → ADVANCED: ML recommendations replace basic ones
- User switches from ADVANCED → OFF: recommendations section disappears cleanly
Performance tests:
- BASIC adds < 50ms to page load
- ADVANCED adds < 200ms to page load
- No impact to page load when OFF
Phase 2: Progressive Rollout
At 1%:
- Smoke tests in production confirming each variant works
- Monitor: error rate, page load time, bounce rate
- Criteria to proceed: error rate < 0.1%, no performance regression
At 10%:
- Compare metrics between variants (A/B/C test)
- Monitor: click-through rate, conversion rate
- Criteria: no negative impact on conversions
At 50%:
- Full regression suite against each variant
- Load testing at expected traffic levels
- Monitor all business metrics
At 100%:
- Final validation
- Plan for removing the flag and dead code
Rollback Plan
- Instant: toggle flag to OFF via LaunchDarkly dashboard
- Automated: alert triggers auto-rollback if error rate > 1%
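The stage criteria and the automated rollback rule can be combined into a single decision function, sketched below. The 0.1% and 1% thresholds come from the plan above; the metric names and the 5% load-time tolerance used to define "no performance regression" are assumptions made for the example.

```typescript
interface StageMetrics {
  errorRatePct: number; // errors as a percentage of requests
  p95LoadTimeMs: number; // current 95th-percentile page load time
  baselineP95LoadTimeMs: number; // the same metric before the rollout
}

type Decision = "proceed" | "hold" | "rollback";

const ROLLBACK_ERROR_RATE_PCT = 1; // auto-rollback threshold from the rollback plan
const PROCEED_ERROR_RATE_PCT = 0.1; // criteria to proceed from the 1% stage

// Decides what to do at a rollout stage. regressionTolerancePct is an assumed
// allowance for measurement noise in load time, not a value from the plan.
function decide(metrics: StageMetrics, regressionTolerancePct = 5): Decision {
  if (metrics.errorRatePct > ROLLBACK_ERROR_RATE_PCT) return "rollback";
  const loadTimeLimit = metrics.baselineP95LoadTimeMs * (1 + regressionTolerancePct / 100);
  const healthy =
    metrics.errorRatePct < PROCEED_ERROR_RATE_PCT && metrics.p95LoadTimeMs <= loadTimeLimit;
  return healthy ? "proceed" : "hold";
}

console.log(decide({ errorRatePct: 0.05, p95LoadTimeMs: 820, baselineP95LoadTimeMs: 800 })); // "proceed"
console.log(decide({ errorRatePct: 0.4, p95LoadTimeMs: 820, baselineP95LoadTimeMs: 800 })); // "hold"
console.log(decide({ errorRatePct: 2.5, p95LoadTimeMs: 820, baselineP95LoadTimeMs: 800 })); // "rollback"
```

The "hold" outcome covers the gap between the two thresholds: the rollout is not healthy enough to widen, but not bad enough to trigger an automatic rollback.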
Best Practices
Clean up old flags. Feature flags are technical debt. Once a feature is fully released, remove the flag and the old code path. Track each flag's owner, creation date, and planned removal date in a flag register.
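A flag register can be as simple as a typed list checked in CI. The sketch below flags release toggles that were fully rolled out more than a grace period ago; the record fields, the 14-day grace period, and the sample entries are all assumptions for illustration.

```typescript
interface FlagRecord {
  key: string;
  type: "release" | "experiment" | "ops" | "permission";
  owner: string;
  createdAt: string; // ISO date
  fullyReleasedAt?: string; // ISO date, set once the rollout reaches 100%
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the keys of release flags that have been at 100% for longer than
// graceDays and should therefore be removed from the code.
function staleFlags(register: FlagRecord[], today: Date, graceDays = 14): string[] {
  return register
    .filter((flag) => flag.type === "release" && flag.fullyReleasedAt !== undefined)
    .filter((flag) => {
      const releasedAt = new Date(flag.fullyReleasedAt as string).getTime();
      return today.getTime() - releasedAt > graceDays * DAY_MS;
    })
    .map((flag) => flag.key);
}

// Hypothetical register entries.
const register: FlagRecord[] = [
  { key: "new-checkout", type: "release", owner: "checkout-team", createdAt: "2023-11-20", fullyReleasedAt: "2024-01-10" },
  { key: "recs-engine", type: "release", owner: "discovery-team", createdAt: "2024-02-01" },
  { key: "payments-kill-switch", type: "ops", owner: "platform-team", createdAt: "2022-06-15" },
];

console.log(staleFlags(register, new Date("2024-03-01"))); // ["new-checkout"]
```

Long-lived ops and permission flags are deliberately excluded, since they are not expected to be removed after release.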
Test the fallback behavior. If the flag service is down, what happens? The application should have sensible defaults.
Never nest feature flags deeply. Keep nesting to two levels at most; each additional level multiplies the code paths to test and quickly becomes untestable.
Use flag overrides in test environments. Test tools should be able to force-enable or force-disable flags regardless of the targeting rules.
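One possible shape for such overrides is a `name=value` list, like the `X-Feature-Flag` header values used in the Playwright configuration earlier. The sketch below parses that format and lets an override win over targeting only when overrides are allowed. The function names and the `overridesAllowed` switch are assumptions; the point is that production should ignore overrides.

```typescript
type Overrides = Record<string, boolean>;

// Parses a comma-separated list such as "new-checkout=true,recs-engine=false".
// Entries without a literal true/false value are ignored.
function parseOverrides(raw: string | undefined): Overrides {
  const overrides: Overrides = {};
  if (!raw) return overrides;
  for (const pair of raw.split(",")) {
    const [key, value] = pair.split("=").map((part) => part.trim());
    if (key && (value === "true" || value === "false")) {
      overrides[key] = value === "true";
    }
  }
  return overrides;
}

// Evaluates a flag: an override wins over targeting, but only where
// overrides are allowed (for example, in test environments).
function evaluateFlag(
  flag: string,
  targeting: (flag: string) => boolean,
  overrides: Overrides,
  overridesAllowed: boolean,
): boolean {
  if (overridesAllowed && Object.prototype.hasOwnProperty.call(overrides, flag)) {
    return overrides[flag] === true;
  }
  return targeting(flag);
}

// Stand-in targeting rule that keeps every flag off.
const targetingSaysOff = (): boolean => false;
const overrides = parseOverrides("new-checkout=true");

console.log(evaluateFlag("new-checkout", targetingSaysOff, overrides, true)); // true
console.log(evaluateFlag("new-checkout", targetingSaysOff, overrides, false)); // false
```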
Monitor flag state changes. Log when flags are toggled and by whom. This helps correlate production issues with flag changes.
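Correlating an incident with flag changes comes down to a time-window query over the change log. The following sketch assumes a simple in-memory log and made-up entries; hosted flag tools generally provide their own audit log, which would replace `auditLog` here.

```typescript
interface FlagChange {
  flag: string;
  from: boolean;
  to: boolean;
  changedBy: string;
  at: string; // ISO 8601 timestamp
}

const auditLog: FlagChange[] = [];

// Appends a change record; call this wherever flags are toggled.
function recordChange(flag: string, from: boolean, to: boolean, changedBy: string, at: string): void {
  auditLog.push({ flag, from, to, changedBy, at });
}

// Returns the changes made in the windowMinutes leading up to an incident.
function changesBefore(log: FlagChange[], incidentAt: string, windowMinutes: number): FlagChange[] {
  const end = new Date(incidentAt).getTime();
  const start = end - windowMinutes * 60 * 1000;
  return log.filter((change) => {
    const changedAt = new Date(change.at).getTime();
    return changedAt >= start && changedAt <= end;
  });
}

// Hypothetical entries.
recordChange("new-checkout", false, true, "alice", "2024-03-01T09:00:00Z");
recordChange("recs-engine", false, true, "bob", "2024-03-01T13:50:00Z");

const suspects = changesBefore(auditLog, "2024-03-01T14:00:00Z", 30);
console.log(suspects.map((change) => `${change.flag} by ${change.changedBy}`)); // ["recs-engine by bob"]
```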