What Are Feature Flags?

Feature flags (also called feature toggles) are conditional statements in code that control whether a feature is active. They decouple code deployment from feature release — you can deploy code to production with new features hidden, then enable them gradually.

```javascript
if (featureFlags.isEnabled('new-checkout-flow')) {
  renderNewCheckout();
} else {
  renderLegacyCheckout();
}
```

For QA engineers, feature flags add testing complexity but also provide powerful capabilities: you can test features in production safely, control A/B experiments, and roll back problematic features instantly without redeploying.

Types of Feature Flags

| Type | Lifespan | Purpose | Testing Focus |
|------|----------|---------|---------------|
| Release flags | Short (days–weeks) | Hide incomplete features | Both on/off states |
| Experiment flags | Medium (weeks–months) | A/B testing, metrics | Variant behavior |
| Ops flags | Long-lived | Circuit breakers, kill switches | Failure modes |
| Permission flags | Long-lived | User-specific features (premium, beta) | Per-user-group behavior |
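The "failure modes" focus for ops flags can be sketched as a kill switch around a risky dependency. The names here (`Flags`, `recommendations-enabled`, `fetchRecs`) are illustrative, not a specific SDK:

```typescript
// Hypothetical flag client interface.
interface Flags {
  isEnabled(flag: string): boolean;
}

// An ops flag guarding a risky dependency: when the kill switch is off,
// the dependency is skipped and the page degrades gracefully.
function renderRecommendations(flags: Flags, fetchRecs: () => string[]): string[] {
  if (!flags.isEnabled('recommendations-enabled')) {
    return []; // kill switch engaged: render the page without recommendations
  }
  return fetchRecs();
}
```

Testing this flag means verifying both that the feature works when enabled and that the degraded path is safe when it is not.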

Testing Strategy for Feature Flags

The Combinatorial Challenge

With N independent flags, there are 2^N possible combinations. With just 10 flags, that is 1,024 combinations — impractical to test exhaustively.

Pragmatic approach:

  1. Test each flag independently in both on and off states, with all other flags at their defaults (2 × N tests)
  2. Identify dependent flags that interact and test those combinations
  3. Test transitions — toggling a flag from off to on (and back) during a user session
  4. Test the default state — what users see when the flag service is down
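Steps 1 and 2 can be sketched as a small plan generator: each flag is exercised on and off against defaults, and only flags known to interact are tested exhaustively. The flag names and dependency grouping passed in are hypothetical:

```typescript
type FlagState = Record<string, boolean>;

// Generate a pragmatic test plan instead of all 2^N combinations:
// 2 x N single-flag cases, plus exhaustive combos only for known-dependent groups.
function pragmaticTestPlan(flags: string[], dependentGroups: string[][]): FlagState[] {
  const plan: FlagState[] = [];
  const defaults: FlagState = Object.fromEntries(flags.map((f) => [f, false]));

  // 1. Each flag independently on and off, everything else at defaults.
  for (const flag of flags) {
    plan.push({ ...defaults, [flag]: true });
    plan.push({ ...defaults, [flag]: false });
  }

  // 2. Exhaustive combinations only for flags that are known to interact.
  for (const group of dependentGroups) {
    for (let mask = 0; mask < 1 << group.length; mask++) {
      const combo = { ...defaults };
      group.forEach((flag, i) => (combo[flag] = Boolean(mask & (1 << i))));
      plan.push(combo);
    }
  }
  return plan;
}
```

For 10 flags with one interacting pair, this yields 24 cases instead of 1,024.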

Test Matrix Example

For a “new-checkout” feature flag:

| Scenario | Flag State | What to Test |
|----------|------------|--------------|
| Feature off (default) | OFF | Legacy checkout works correctly |
| Feature on | ON | New checkout works correctly |
| Transition: off → on | OFF → ON | Mid-session switch does not corrupt cart data |
| Transition: on → off | ON → OFF | Rollback does not lose user data |
| Flag service down | FALLBACK | Application gracefully degrades to default |
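The transition rows are the ones teams most often skip. A minimal in-memory sketch of the "off → on" check, with an illustrative session model rather than a real checkout API:

```typescript
// A toy session: the active checkout depends on the flag, but cart state
// must survive a mid-session flag flip.
class CheckoutSession {
  constructor(private flags: Map<string, boolean>, public cart: string[] = []) {}

  addItem(sku: string): void {
    this.cart.push(sku);
  }

  activeCheckout(): 'legacy' | 'new' {
    return this.flags.get('new-checkout') ? 'new' : 'legacy';
  }
}

const flags = new Map<string, boolean>([['new-checkout', false]]);
const session = new CheckoutSession(flags);
session.addItem('sku-123');

// Mid-session toggle: the UI implementation switches...
flags.set('new-checkout', true);
// ...but the cart contents must be unchanged.
```

The real test would do the same through the UI: add items under the legacy flow, flip the flag, and assert the cart renders intact in the new flow.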

Testing Progressive Rollouts

Progressive rollouts expose features to increasing percentages of users:

Day 1: 1% of users  → Monitor error rates
Day 2: 5% of users  → Check key metrics
Day 3: 25% of users → Broader validation
Day 7: 100%         → Full release

QA responsibilities:

  • Verify the percentage targeting works correctly
  • Monitor error rates and user metrics at each stage
  • Have a rollback plan ready
  • Test that users in the rollout see the feature consistently
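One way to verify both "percentage targeting works" and "users see the feature consistently" is deterministic bucketing: hash the user ID to a stable 0–99 bucket and compare it to the rollout percentage. The FNV-1a-style hash below is a stand-in; real flag services use their own bucketing algorithms:

```typescript
// Map a user ID to a stable bucket in [0, 99] using an FNV-1a-style hash.
function bucket(userId: string): number {
  let h = 0x811c9dc5;
  for (const ch of userId) {
    h ^= ch.charCodeAt(0);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h % 100;
}

// A user is in the rollout when their bucket falls below the percentage.
// Because the bucket is stable, a user who is "in" at 5% stays in at 25%.
function inRollout(userId: string, percent: number): boolean {
  return bucket(userId) < percent;
}
```

A useful automated check: for a large sample of user IDs, assert that membership is deterministic and monotonic as the percentage increases.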

Feature Flag Tools

| Tool | Type | Key Feature |
|------|------|-------------|
| LaunchDarkly | SaaS | Enterprise-grade, real-time updates |
| Split.io | SaaS | Built-in experimentation |
| Unleash | Open source | Self-hosted, extensible |
| Flagsmith | Open source | API-first, remote config |
| ConfigCat | SaaS | Simple, affordable |
| Custom (env vars) | DIY | Simple on/off toggles |

Automation with Feature Flags

Testing Both States in CI

```typescript
// playwright.config.ts — run the same suite twice, once per flag state.
// This assumes the application honors a flag-override header on staging.
import { defineConfig } from '@playwright/test';

export default defineConfig({
  projects: [
    {
      name: 'feature-off',
      use: {
        baseURL: 'https://staging.example.com',
        extraHTTPHeaders: { 'X-Feature-Flag': 'new-checkout=false' },
      },
    },
    {
      name: 'feature-on',
      use: {
        baseURL: 'https://staging.example.com',
        extraHTTPHeaders: { 'X-Feature-Flag': 'new-checkout=true' },
      },
    },
  ],
});
```

API-Based Flag Control

```typescript
// Before tests: enable the flag via the flag service's REST API
// (illustrative endpoint and payload; check your vendor's API reference)
await fetch('https://api.launchdarkly.com/flags/new-checkout', {
  method: 'PATCH',
  headers: {
    Authorization: `Bearer ${LD_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ on: true }),
});

// Run tests...

// After tests: restore the original state (note that the auth header
// is required here too)
await fetch('https://api.launchdarkly.com/flags/new-checkout', {
  method: 'PATCH',
  headers: {
    Authorization: `Bearer ${LD_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ on: false }),
});
```
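A common pitfall with this pattern is that a failing test skips the restore call, leaving the flag in the wrong state for the next run. A try/finally wrapper guards against that; `setFlag` here is a hypothetical stand-in for the API calls above, and restoring to `false` is an assumed default:

```typescript
// Run a test body with a flag set, guaranteeing the flag is restored
// even when the body throws.
async function withFlag(
  name: string,
  value: boolean,
  setFlag: (name: string, value: boolean) => Promise<void>,
  run: () => Promise<void>,
): Promise<void> {
  await setFlag(name, value);
  try {
    await run();
  } finally {
    await setFlag(name, false); // restore even on test failure
  }
}
```

The failure still propagates to the test runner; only the cleanup is guaranteed.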

Exercise: Design a Flag Testing Strategy

Your team is launching a new recommendation engine behind a feature flag. The flag has three variants: “off” (legacy), “basic” (simple recommendations), “advanced” (ML-powered recommendations). It is being rolled out progressively: 1% → 10% → 50% → 100%.

Design the testing strategy.

Solution

Phase 1: Pre-Rollout (Development/Staging)

Functional tests per variant:

  • OFF: Legacy product page, no recommendations section
  • BASIC: Recommendations section shows related products by category
  • ADVANCED: ML-powered recommendations, personalized per user

Transition tests:

  • User switches from OFF → BASIC: recommendations appear without page reload issues
  • User switches from BASIC → ADVANCED: ML recommendations replace basic ones
  • User switches from ADVANCED → OFF: recommendations section disappears cleanly

Performance tests:

  • BASIC adds < 50ms to page load
  • ADVANCED adds < 200ms to page load
  • No impact on page load when OFF

Phase 2: Progressive Rollout

At 1%:

  • Smoke tests in production confirming each variant works
  • Monitor: error rate, page load time, bounce rate
  • Criteria to proceed: error rate < 0.1%, no performance regression

At 10%:

  • Compare metrics between variants (A/B/C test)
  • Monitor: click-through rate, conversion rate
  • Criteria: no negative impact on conversions

At 50%:

  • Full regression suite against each variant
  • Load testing at expected traffic levels
  • Monitor all business metrics

At 100%:

  • Final validation
  • Plan for removing the flag and dead code

Rollback Plan

  • Instant: toggle flag to OFF via LaunchDarkly dashboard
  • Automated: alert triggers auto-rollback if error rate > 1%
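The automated trigger reduces to a simple predicate over monitoring data. A sketch, assuming some monitoring hook supplies the error and request counts (not shown here):

```typescript
// Decide whether the auto-rollback threshold has been crossed.
// Rolls back only when the error rate strictly exceeds thresholdPct.
function shouldRollback(errors: number, requests: number, thresholdPct = 1): boolean {
  if (requests === 0) return false; // no traffic yet: nothing to judge
  return (errors / requests) * 100 > thresholdPct;
}
```

In practice this check would run on a window of recent traffic (e.g. the last 5 minutes) rather than cumulative totals, so a brief spike triggers promptly.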

Best Practices

  1. Clean up old flags. Feature flags are technical debt. Once a feature is fully released, remove the flag and the old code path. Track flag lifecycle with a register.

  2. Test the fallback behavior. If the flag service is down, what happens? The application should have sensible defaults.

  3. Never nest feature flags deeply. Two levels of nesting maximum. More leads to untestable complexity.

  4. Use flag overrides in test environments. Test tools should be able to force-enable or force-disable flags regardless of the targeting rules.

  5. Monitor flag state changes. Log when flags are toggled and by whom. This helps correlate production issues with flag changes.
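Practice 2 can be sketched as a thin wrapper around the flag client: if the service is unreachable or throws, fall back to a default shipped with the application. The `FlagService` interface is illustrative, not a specific vendor SDK:

```typescript
// Hypothetical flag client interface.
interface FlagService {
  isEnabled(flag: string): boolean;
}

// Evaluate a flag, degrading to a shipped default when the flag service
// is down or misbehaving, rather than crashing the page.
function safeIsEnabled(
  service: FlagService | null,
  flag: string,
  defaults: Record<string, boolean>,
): boolean {
  try {
    if (!service) throw new Error('flag service unavailable');
    return service.isEnabled(flag);
  } catch {
    return defaults[flag] ?? false; // unknown flags default to off
  }
}
```

The QA task is then explicit: run the suite with the flag service disabled and assert that every flag evaluates to its shipped default.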