Why Test in Production?

Pre-production environments, no matter how carefully configured, never perfectly replicate production. Production has real user data patterns, real traffic volumes, real third-party integrations, and real infrastructure complexity. Some bugs only surface under these conditions.

Testing in production does not mean abandoning pre-production testing. It means adding a layer of validation that catches what pre-production testing cannot.

Safe Strategies for Testing in Production

Synthetic Monitoring

Automated scripts that continuously execute critical user journeys against production:

// Synthetic check: login flow
async function checkLogin() {
  const response = await fetch('https://api.example.com/auth/login', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      email: 'synthetic-monitor@example.com',  // Dedicated test account
      password: process.env.SYNTHETIC_PASSWORD,
    }),
  });

  if (response.status !== 200) {
    sendAlert('Login flow broken!');  // sendAlert: your paging/alerting hook
    return;
  }

  const data = await response.json();
  if (!data.token) {
    sendAlert('Login returns no token!');
  }
}
// Run every 5 minutes, 24/7

Key rules for synthetic monitoring:

  • Use dedicated test accounts, never real user accounts
  • Tests must be non-destructive (read-only operations or use test payment methods)
  • Run from multiple geographic locations to detect regional issues
  • Monitor both success rate and response time
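
The rules above can be sketched as a small result evaluator. The function name and the latency threshold below are illustrative, not part of any particular monitoring product:

```javascript
// Illustrative helper: grade one synthetic check result against
// both rules from the list above — success status AND response time.
function evaluateCheck(result, { maxLatencyMs = 1000 } = {}) {
  const failures = [];
  if (result.status !== 200) failures.push(`status ${result.status}`);
  if (result.latencyMs > maxLatencyMs) failures.push(`slow: ${result.latencyMs}ms`);
  return { ok: failures.length === 0, failures };
}

// A runner would invoke this from each region on a schedule, e.g.:
// setInterval(() => runCheck('eu-west-1').then(evaluateCheck), 5 * 60 * 1000);
```

Grading latency alongside status matters: a check that "passes" after eight seconds is still a degraded user experience.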

Dark Launching

Deploy new code to production but do not expose it to users. The new code processes real requests in the background, but its results are discarded:

async function getProductRecommendations(userId) {
  // Current (live) path
  const currentResults = await currentEngine.recommend(userId);

  // New engine (dark launch): deliberately not awaited, so it adds no
  // latency to the user request and its results are never shown
  newEngine.recommend(userId)
    .then((newResults) => {
      // Log comparison for analysis
      metrics.compare('recommendations', currentResults, newResults);
    })
    .catch((error) => {
      // New engine errors do not affect users
      logger.warn('Dark launch error', error);
    });

  return currentResults;  // Always return current results
}

Traffic Mirroring

Copy production traffic to a shadow environment:

# Istio traffic mirroring configuration
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: product-service
spec:
  hosts:
    - product-service
  http:
    - route:
        - destination:
            host: product-service
            subset: v1
      mirror:
        host: product-service
        subset: v2-shadow
      mirrorPercentage:
        value: 10.0  # Mirror 10% of traffic

Canary Testing

Route a small percentage of real traffic to the new version (covered in detail in Lesson 9.11).

Observability-Driven Testing

Use production monitoring to continuously verify quality:

  • Error budgets: Track how much of your error budget has been consumed
  • Anomaly detection: Alert when metrics deviate from learned baselines
  • Real user monitoring (RUM): Track actual user experience metrics
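
The error-budget idea reduces to simple arithmetic. For a 99.9% availability SLO (an illustrative target, not one prescribed here), 0.1% of requests in the window are allowed to fail; consumption is failures divided by that allowance:

```javascript
// Sketch of error-budget accounting; the SLO target and window are assumptions.
function errorBudget({ totalRequests, failedRequests, sloTarget = 0.999 }) {
  const allowedFailures = totalRequests * (1 - sloTarget);
  const consumed = allowedFailures === 0 ? 0 : failedRequests / allowedFailures;
  return {
    allowedFailures,
    consumedFraction: consumed,                 // 1.0 = budget exhausted
    remainingFraction: Math.max(0, 1 - consumed),
  };
}
```

For example, 500 failures out of 1,000,000 requests against a 99.9% target consumes half the budget; many teams pause risky production experiments once the budget is spent.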

When NOT to Test in Production

Scenario                        | Risk                    | Alternative
--------------------------------|-------------------------|-----------------------------------------------
Tests that create real orders   | Financial impact        | Use test accounts with sandbox payments
Tests that send real emails/SMS | User confusion          | Use test notification channels
Load tests at full scale        | Performance degradation | Run during low-traffic hours or use a shadow environment
Destructive database operations | Data loss               | Never in production
Tests involving PII             | Privacy violation       | Use synthetic data

Exercise: Design a Production Testing Strategy

Your team launches a new search engine for an e-commerce site. Design a production testing strategy that validates the new search without affecting users.

Solution

Phase 1: Dark Launch (Week 1)

  • Deploy new search engine behind feature flag (off for all users)
  • Mirror 5% of search queries to new engine
  • Compare results: relevance, response time, error rate
  • Log comparisons for analysis
  • Criteria to proceed: new engine P95 < 200ms, zero errors, relevance score >= current

Phase 2: Synthetic Monitoring (Week 2)

  • 50 predefined search queries running every 10 minutes
  • Verify result count, response time, and result relevance
  • Alert if any synthetic check fails twice consecutively
  • Run from 3 geographic regions
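
The "fails twice consecutively" rule from Phase 2 can be sketched as a tiny stateful helper (illustrative, not any specific tool's API). Requiring a streak suppresses one-off network flakes while still catching real breakage quickly:

```javascript
// Returns a recorder: feed it each check result (true = pass);
// it answers whether an alert should fire now.
function makeConsecutiveFailureAlerter(threshold = 2) {
  let streak = 0;
  return function record(ok) {
    streak = ok ? 0 : streak + 1;
    return streak >= threshold;  // true -> page someone
  };
}
```

A single pass resets the streak, so an isolated failure between two passes never pages anyone.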

Phase 3: Canary (Week 3)

  • Enable new search for 1% of users via feature flag
  • Compare metrics: click-through rate, conversion from search, bounce rate
  • Monitor: error rate, response time, user complaints
  • Gradually increase to 5%, 25%, 50%, 100%

Phase 4: Ongoing Production Testing

  • Synthetic monitoring: 24/7, 50 queries every 10 minutes
  • A/B experiments for search relevance improvements
  • Real user monitoring for search performance
  • Weekly review of search quality metrics

Key Takeaways

  1. Production testing supplements, not replaces, pre-production testing
  2. Synthetic monitoring catches outages before users report them
  3. Dark launching validates new code with real traffic, zero user impact
  4. Traffic mirroring tests at production scale without risk
  5. Always have safeguards — test accounts, feature flags, non-destructive operations