The Scaling Problem

As test suites grow, execution time becomes a bottleneck. A 2-hour test suite means 2 hours of waiting before knowing if a change is safe. Developers stop running tests, bypass pipelines, and quality degrades.

Test orchestration solves this by intelligently distributing tests across multiple machines, prioritizing the most valuable tests, and optimizing execution strategy.

Test Sharding

Divide the test suite into roughly equal parts and run each on a separate machine:

# Playwright sharding: 4 machines
npx playwright test --shard=1/4  # Machine 1
npx playwright test --shard=2/4  # Machine 2
npx playwright test --shard=3/4  # Machine 3
npx playwright test --shard=4/4  # Machine 4

Sharding Strategies

Strategy       | How It Works                                | Best For
Round-robin    | Tests distributed evenly by count           | Uniform test durations
Duration-based | Tests distributed by estimated runtime      | Varying test durations
File-based     | Each shard gets complete test files         | Test files with shared setup
Tag-based      | Shard by category (smoke, regression, API)  | Mixed test types
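
Duration-based sharding is essentially a bin-packing problem. A minimal sketch of the common greedy approach (longest test first, always assigned to the currently lightest shard), assuming per-test durations are available from historical timing data; the types and names here are illustrative, not a real framework API:

```typescript
interface TestFile {
  name: string;
  durationSec: number; // estimated from historical timing data
}

// Greedy "longest processing time first" sharding: sort descending by
// duration, then always place the next test on the lightest shard.
function shardByDuration(tests: TestFile[], shardCount: number): TestFile[][] {
  const shards: TestFile[][] = Array.from({ length: shardCount }, () => []);
  const totals: number[] = new Array(shardCount).fill(0);
  const sorted = [...tests].sort((a, b) => b.durationSec - a.durationSec);
  for (const test of sorted) {
    const lightest = totals.indexOf(Math.min(...totals));
    shards[lightest].push(test);
    totals[lightest] += test.durationSec;
  }
  return shards;
}
```

This is why historical timing data matters: with only round-robin by count, one shard can end up with all the slow tests and dominate the wall-clock time.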

CI Implementation

# GitHub Actions sharding
# Note: merging reports requires Playwright's 'blob' reporter, and the
# shard jobs must upload their reports before the merge job can download them.
jobs:
  e2e-tests:
    strategy:
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - run: npx playwright test --shard=${{ matrix.shard }}/4
      - uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.shard }}
          path: blob-report

  merge-reports:
    needs: e2e-tests
    steps:
      - uses: actions/download-artifact@v4
        with:
          path: all-reports
          pattern: blob-report-*
          merge-multiple: true
      - run: npx playwright merge-reports ./all-reports

Smart Retry Mechanisms

Retry Strategies

// playwright.config.ts — Playwright retry configuration
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0,  // Retry in CI, not locally
  reporter: [
    ['html'],
    ['json', { outputFile: 'results.json' }],
  ],
});
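
Raw retry counts hide why a test was retried. A small sketch of classifying failure reasons from error messages, as a first step toward the rules below; the regex patterns are assumptions, and real Playwright assertion timeouts can match both patterns, so the match order here is a judgment call:

```typescript
type RetryReason = "network-timeout" | "assertion-failure" | "unknown";

// Classify a failure message so retried tests can be bucketed by cause.
// Patterns are illustrative; tune them against your suite's actual errors.
function classifyFailure(errorMessage: string): RetryReason {
  if (/AssertionError|expect\(/.test(errorMessage)) return "assertion-failure";
  if (/timeout|ETIMEDOUT|ECONNRESET/i.test(errorMessage)) return "network-timeout";
  return "unknown";
}
```

Logging this reason alongside each retry makes the flaky-rate dashboard actionable: network timeouts point at infrastructure, assertion failures point at the test or the app.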

Retry Rules

  1. Limit retries to 1-2 — more than that indicates a real problem
  2. Log all retried tests — track flaky test rate over time
  3. Differentiate retry reasons — network timeout vs. assertion failure
  4. Quarantine chronically flaky tests — tests that fail >5% of runs need fixing, not retrying
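
Rules 2 and 4 can be automated from retry logs. A minimal sketch, assuming each CI run emits a record per test with a "passed only on retry" flag (the record shape and threshold handling are assumptions):

```typescript
interface RunRecord {
  testName: string;
  passedOnRetry: boolean; // failed first, then passed: the flaky signature
}

// Compute each test's flaky rate and flag those above the quarantine
// threshold (default 5%, matching rule 4 above).
function quarantineCandidates(runs: RunRecord[], threshold = 0.05): string[] {
  const total = new Map<string, number>();
  const flaky = new Map<string, number>();
  for (const run of runs) {
    total.set(run.testName, (total.get(run.testName) ?? 0) + 1);
    if (run.passedOnRetry) {
      flaky.set(run.testName, (flaky.get(run.testName) ?? 0) + 1);
    }
  }
  return [...total.entries()]
    .filter(([name, n]) => (flaky.get(name) ?? 0) / n > threshold)
    .map(([name]) => name);
}
```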

Test Prioritization

Risk-Based Prioritization

Run the most valuable tests first:

  1. Tests covering recently changed code (highest priority)
  2. Tests that failed recently (regression catch)
  3. Tests covering critical business paths (checkout, login)
  4. New tests (covering new features)
  5. Stable tests with no recent failures (lowest priority)
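
One way to turn this ordering into code is a weighted score per test, with weights chosen so a higher tier always outranks any combination of lower ones. A sketch under that assumption; the metadata fields are hypothetical, not a real framework API:

```typescript
interface TestMeta {
  name: string;
  coversChangedCode: boolean;  // tier 1
  failedRecently: boolean;     // tier 2
  coversCriticalPath: boolean; // tier 3
  isNew: boolean;              // tier 4
}

// Powers of two guarantee a strict tier ordering: tier 1 alone (16)
// beats tiers 2+3+4 combined (8+4+2 = 14).
function priorityScore(t: TestMeta): number {
  let score = 0;
  if (t.coversChangedCode) score += 16;
  if (t.failedRecently) score += 8;
  if (t.coversCriticalPath) score += 4;
  if (t.isNew) score += 2;
  return score; // stable tests with no signals score 0 and run last (tier 5)
}

function orderTests(tests: TestMeta[]): TestMeta[] {
  return [...tests].sort((a, b) => priorityScore(b) - priorityScore(a));
}
```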

Test Impact Analysis

Map code changes to affected tests:

Developer changes: src/payment/checkout.ts

Affected tests:
  - tests/checkout.spec.ts      (direct import)
  - tests/cart-to-order.spec.ts  (uses checkout module)
  - tests/payment-flow.spec.ts   (integration test)

Unaffected tests (skip):
  - tests/search.spec.ts
  - tests/profile.spec.ts
  - tests/admin.spec.ts

This can reduce a 60-minute suite to roughly 10 minutes for most changes.
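
The selection step above can be sketched as a lookup over a dependency map. The graph here is hand-written for illustration; in practice it would be produced by a module-resolution or coverage pass over the repo:

```typescript
// Map from test file to the source modules it (transitively) depends on.
type DepGraph = Record<string, string[]>;

// A test is affected if any of its dependencies appears in the change set.
function affectedTests(graph: DepGraph, changedFiles: string[]): string[] {
  const changed = new Set(changedFiles);
  return Object.entries(graph)
    .filter(([, deps]) => deps.some((dep) => changed.has(dep)))
    .map(([testFile]) => testFile);
}
```

The hard part in a real system is keeping the dependency map accurate as code moves; a stale map silently skips tests that should have run, which is why full-suite runs on merge and nightly remain the safety net.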

Test Orchestration Platforms

Platform                 | Key Feature                             | Type
Playwright (built-in)    | Sharding, retries, projects             | Framework
Currents.dev             | Playwright orchestration and analytics  | SaaS
Sorry Cypress            | Cypress parallelization (open source)   | Self-hosted
Sauce Labs               | Cloud browser grid                      | SaaS
BrowserStack             | Cloud device/browser farm               | SaaS
Buildkite Test Analytics | Test suite insights and splitting       | SaaS

Exercise: Optimize a Slow Test Suite

Your E2E suite has 200 tests taking 90 minutes on a single machine. Tests are a mix of fast (30s) and slow (3min) tests. Target: under 15 minutes. Design the optimization strategy.

Solution

Step 1: Analyze Current State

  • Categorize tests by duration: fast (<1min), medium (1-2min), slow (>2min)
  • Identify flaky tests (>2% failure rate)
  • Map tests to features for impact analysis

Step 2: Shard by Duration

  • 8 shards with duration-based balancing
  • Each shard targets ~11 minutes of tests
  • Use historical timing data for balancing

Step 3: Smart Retries

  • 1 retry for all tests in CI
  • Quarantine tests with >5% flaky rate
  • Track flaky rate dashboard

Step 4: Test Selection for PRs

  • Run only tests affected by changed files (impact analysis)
  • Always run smoke tests (top 20 critical paths)
  • Full suite runs nightly and on merge to main

Step 5: Infrastructure

  • 8 parallel CI runners
  • Docker images pre-built with browsers
  • Aggressive caching of dependencies

Expected Results

  • PR pipeline: ~8 minutes (impact analysis + smoke tests)
  • Full suite: ~12 minutes (8 shards)
  • Nightly: 90 minutes (single machine, full coverage including slow tests)

Key Takeaways

  1. Sharding is the fastest way to reduce suite time — near-linear speedup with more machines, minus setup and coordination overhead
  2. Smart retries mitigate flakiness but mask real problems if overused
  3. Test impact analysis gives fastest PR feedback — run only affected tests
  4. Track metrics — suite duration, flaky rate, and shard balance over time
  5. Invest in infrastructure — faster machines and parallel runners pay for themselves in developer productivity