The Scaling Problem
As test suites grow, execution time becomes a bottleneck. A 2-hour test suite means 2 hours of waiting before knowing if a change is safe. Developers stop running tests, bypass pipelines, and quality degrades.
Test orchestration solves this by intelligently distributing tests across multiple machines, prioritizing the most valuable tests, and optimizing execution strategy.
Test Sharding
Divide the test suite into equal parts and run each on a separate machine:
# Playwright sharding: 4 machines
npx playwright test --shard=1/4 # Machine 1
npx playwright test --shard=2/4 # Machine 2
npx playwright test --shard=3/4 # Machine 3
npx playwright test --shard=4/4 # Machine 4
Sharding Strategies
| Strategy | How It Works | Best For |
|---|---|---|
| Round-robin | Tests distributed evenly by count | Uniform test durations |
| Duration-based | Tests distributed by estimated runtime | Varying test durations |
| File-based | Each shard gets complete test files | Test files with shared setup |
| Tag-based | Shard by category (smoke, regression, API) | Mixed test types |
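The duration-based row above can be sketched as a greedy "longest test first" balancer. This is a minimal illustration, not how Playwright or any particular platform assigns shards; the `TimedTest` and `Shard` shapes and the timings are hypothetical, and it assumes historical runtimes are available.

```typescript
// Hypothetical shape: a test file with its historical runtime in seconds.
interface TimedTest {
  file: string;
  seconds: number;
}

interface Shard {
  tests: TimedTest[];
  totalSeconds: number;
}

// Greedy balancing: sort tests by runtime (longest first), then always
// place the next test on the shard with the smallest total so far.
function shardByDuration(tests: TimedTest[], shardCount: number): Shard[] {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new Error("shardCount must be a positive integer");
  }
  const shards: Shard[] = Array.from(
    { length: shardCount },
    (): Shard => ({ tests: [], totalSeconds: 0 }),
  );
  const longestFirst = [...tests].sort((a, b) => b.seconds - a.seconds);
  for (const test of longestFirst) {
    const lightest = shards.reduce((min, s) =>
      s.totalSeconds < min.totalSeconds ? s : min,
    );
    lightest.tests.push(test);
    lightest.totalSeconds += test.seconds;
  }
  return shards;
}

// Example with made-up timings: one slow test, one medium, two fast.
const exampleShards = shardByDuration(
  [
    { file: "checkout.spec.ts", seconds: 180 },
    { file: "search.spec.ts", seconds: 30 },
    { file: "profile.spec.ts", seconds: 30 },
    { file: "admin.spec.ts", seconds: 120 },
  ],
  2,
);
console.log(exampleShards.map((s) => s.totalSeconds)); // both shards total 180 seconds
```

Splitting the same four tests by count alone could put both long tests on one shard (300 s against 60 s), which is why duration-based balancing suits suites with varying test durations.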
CI Implementation
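# GitHub Actions sharding
jobs:
  e2e-tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3, 4]
    steps:
      - uses: actions/checkout@v4
      - run: npm ci && npx playwright install --with-deps
      # The blob reporter writes ./blob-report, the input format for merge-reports
      - run: npx playwright test --shard=${{ matrix.shard }}/4 --reporter=blob
      - uses: actions/upload-artifact@v4
        if: ${{ !cancelled() }}
        with: { name: "blob-report-${{ matrix.shard }}", path: blob-report }
  merge-reports:
    needs: e2e-tests
    if: ${{ !cancelled() }}
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - uses: actions/download-artifact@v4
        with: { path: all-reports, pattern: blob-report-*, merge-multiple: true }
      - run: npx playwright merge-reports --reporter html ./all-reports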
Smart Retry Mechanisms
Retry Strategies
// Playwright retry configuration
import { defineConfig } from '@playwright/test';

export default defineConfig({
  retries: process.env.CI ? 2 : 0, // Retry in CI, not locally
  reporter: [
    ['html'],
    ['json', { outputFile: 'results.json' }], // Machine-readable results for tracking retries
  ],
});
Retry Rules
- Limit retries to 1-2: a test that needs more than that usually points to a real problem
- Log all retried tests: track the flaky test rate over time
- Differentiate retry reasons: a network timeout is a different signal from an assertion failure
- Quarantine chronically flaky tests: tests that fail >5% of runs need fixing, not retrying
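The rules above can be sketched as two small helpers: one that buckets a failure by its error message, and one that flags tests above the 5% threshold. This is only an illustration. The `TestHistory` shape, the function names, and the message patterns are assumptions, and a real pipeline would read this data from its own CI results.

```typescript
// Hypothetical record of one test's outcomes across recent CI runs.
interface TestHistory {
  name: string;
  runs: number; // total CI runs observed
  failedRuns: number; // runs in which the test failed at least once
}

type RetryReason = "network" | "assertion" | "other";

// Rough classification by error message. The patterns are illustrative
// only and would need tuning against real failure output.
function classifyFailure(message: string): RetryReason {
  if (/ECONNRESET|ETIMEDOUT|ERR_CONNECTION/i.test(message)) return "network";
  if (/expect\(/.test(message)) return "assertion";
  return "other";
}

function failureRate(history: TestHistory): number {
  return history.runs === 0 ? 0 : history.failedRuns / history.runs;
}

// Tests above the threshold are candidates for quarantine and a fix,
// rather than for more retries.
function findQuarantineCandidates(
  histories: TestHistory[],
  threshold = 0.05,
): string[] {
  return histories
    .filter((h) => failureRate(h) > threshold)
    .map((h) => h.name);
}

// Example with made-up numbers.
const quarantineCandidates = findQuarantineCandidates([
  { name: "login.spec.ts", runs: 200, failedRuns: 3 }, // 1.5%
  { name: "checkout.spec.ts", runs: 200, failedRuns: 14 }, // 7%
]);
console.log(quarantineCandidates); // only checkout.spec.ts exceeds 5%
console.log(classifyFailure("connect ETIMEDOUT 10.0.0.1:443")); // network
```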
Test Prioritization
Risk-Based Prioritization
Run the most valuable tests first:
- Tests covering recently changed code (highest priority)
- Tests that failed recently (regression catch)
- Tests covering critical business paths (checkout, login)
- New tests (covering new features)
- Stable tests with no recent failures (lowest priority)
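One way to express the ordering above is a rank function plus a stable sort. The `TestSignals` fields are hypothetical. In practice they would come from version control, CI history, and test tags.

```typescript
// Hypothetical per-test signals used to decide execution order.
interface TestSignals {
  name: string;
  coversChangedCode: boolean;
  failedRecently: boolean;
  criticalPath: boolean;
  isNew: boolean;
}

// Lower rank runs earlier, mirroring the order of the list above.
function priorityRank(test: TestSignals): number {
  if (test.coversChangedCode) return 0;
  if (test.failedRecently) return 1;
  if (test.criticalPath) return 2;
  if (test.isNew) return 3;
  return 4; // stable, no recent failures
}

function prioritize(tests: TestSignals[]): TestSignals[] {
  // Array.prototype.sort is stable, so ties keep their original order.
  return [...tests].sort((a, b) => priorityRank(a) - priorityRank(b));
}

// Example with made-up signals.
const none = { coversChangedCode: false, failedRecently: false, criticalPath: false, isNew: false };
const runOrder = prioritize([
  { ...none, name: "profile.spec.ts" },
  { ...none, name: "login.spec.ts", criticalPath: true },
  { ...none, name: "checkout.spec.ts", coversChangedCode: true, criticalPath: true },
  { ...none, name: "search.spec.ts", failedRecently: true },
]).map((t) => t.name);
console.log(runOrder.join(" > "));
// checkout.spec.ts > search.spec.ts > login.spec.ts > profile.spec.ts
```

A test that matches several categories takes the highest-priority one, so `checkout.spec.ts` runs first because it covers changed code, even though it is also a critical path.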
Test Impact Analysis
Map code changes to affected tests:
Developer changes: src/payment/checkout.ts
Affected tests:
- tests/checkout.spec.ts (direct import)
- tests/cart-to-order.spec.ts (uses checkout module)
- tests/payment-flow.spec.ts (integration test)
Unaffected tests (skip):
- tests/search.spec.ts
- tests/profile.spec.ts
- tests/admin.spec.ts
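Because only the tests that depend on the changed files run, this can cut feedback time sharply for most changes, for example from a 60-minute full suite to roughly 10 minutes.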
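The mapping from changed files to affected tests can be sketched as a reverse walk over an import graph. This is a simplified illustration: the `ImportGraph` shape is an assumption, the intermediate module `src/cart/order.ts` is hypothetical, and real tools typically build the graph from the bundler, compiler, or coverage data.

```typescript
// Hypothetical dependency map: each file lists the files it imports.
type ImportGraph = Record<string, string[]>;

// Returns the test files that depend, directly or transitively,
// on any of the changed files.
function affectedTests(
  graph: ImportGraph,
  changedFiles: string[],
  isTest: (file: string) => boolean,
): string[] {
  // Invert the graph: file -> files that import it.
  const importedBy = new Map<string, string[]>();
  for (const [file, deps] of Object.entries(graph)) {
    for (const dep of deps) {
      const dependents = importedBy.get(dep) ?? [];
      dependents.push(file);
      importedBy.set(dep, dependents);
    }
  }
  // Breadth-first walk from the changed files up to everything that uses them.
  const seen = new Set<string>(changedFiles);
  const queue = [...changedFiles];
  let current: string | undefined;
  while ((current = queue.shift()) !== undefined) {
    for (const dependent of importedBy.get(current) ?? []) {
      if (!seen.has(dependent)) {
        seen.add(dependent);
        queue.push(dependent);
      }
    }
  }
  return Array.from(seen).filter(isTest).sort();
}

// Example mirroring the scenario above.
const exampleGraph: ImportGraph = {
  "tests/checkout.spec.ts": ["src/payment/checkout.ts"],
  "tests/payment-flow.spec.ts": ["src/payment/checkout.ts"],
  "tests/cart-to-order.spec.ts": ["src/cart/order.ts"],
  "src/cart/order.ts": ["src/payment/checkout.ts"],
  "tests/search.spec.ts": ["src/search/index.ts"],
};
const toRun = affectedTests(
  exampleGraph,
  ["src/payment/checkout.ts"],
  (file) => file.endsWith(".spec.ts"),
);
console.log(toRun); // the three checkout-related specs; search.spec.ts is skipped
```

Static import graphs miss dependencies that are not expressed as imports (shared databases, feature flags, configuration), which is one reason to keep a full-suite run as a safety net.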
Test Orchestration Platforms
| Platform | Key Feature | Type |
|---|---|---|
| Playwright (built-in) | Sharding, retries, projects | Framework |
| Currents.dev | Playwright orchestration and analytics | SaaS |
| Sorry Cypress | Cypress parallelization (open source) | Self-hosted |
| Sauce Labs | Cloud browser grid | SaaS |
| BrowserStack | Cloud device/browser farm | SaaS |
| Buildkite Test Analytics | Test suite insights and splitting | SaaS |
Exercise: Optimize a Slow Test Suite
Your E2E suite has 200 tests that take 90 minutes on a single machine. Tests range from fast (under 30s) to slow (up to 3min). Target: under 15 minutes. Design the optimization strategy.
Solution
Step 1: Analyze Current State
- Categorize tests by duration: fast (<1min), medium (1-2min), slow (>2min)
- Identify flaky tests (>2% failure rate)
- Map tests to features for impact analysis
Step 2: Shard by Duration
- 8 shards with duration-based balancing
- Each shard targets ~11 minutes of tests
- Use historical timing data for balancing
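The shard count can be checked with a little arithmetic. The sketch below assumes perfectly balanced shards and a fixed per-shard startup cost. The 1-minute overhead is an assumed figure for illustration, not a measurement.

```typescript
// Smallest shard count that meets the target, assuming perfectly balanced
// shards and a fixed per-shard startup cost (an assumed value).
function minShards(
  totalMinutes: number,
  targetMinutes: number,
  overheadMinutes: number,
): number {
  const budget = targetMinutes - overheadMinutes;
  if (budget <= 0) {
    throw new Error("target must exceed the per-shard overhead");
  }
  return Math.ceil(totalMinutes / budget);
}

// Expected wall-clock time for a given shard count under the same assumptions.
function estimatedWallClock(
  totalMinutes: number,
  shards: number,
  overheadMinutes: number,
): number {
  return totalMinutes / shards + overheadMinutes;
}

console.log(minShards(90, 15, 1)); // 7
console.log(estimatedWallClock(90, 8, 1)); // 12.25
```

Under these assumptions 7 shards would just meet the 15-minute target. Using 8 leaves headroom for imperfect balancing, since real shards rarely split evenly and no shard can finish faster than its longest single test.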
Step 3: Smart Retries
- 1 retry for all tests in CI
- Quarantine tests with >5% flaky rate
- Track the flaky rate on a dashboard
Step 4: Test Selection for PRs
- Run only tests affected by changed files (impact analysis)
- Always run smoke tests (top 20 critical paths)
- Full suite runs nightly and on merge to main
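The selection rule in Step 4 could look roughly like the function below. The `Trigger` names and the test lists are hypothetical. The affected list is assumed to come from whatever impact analysis the pipeline uses.

```typescript
// Hypothetical pipeline triggers.
type Trigger = "pull_request" | "merge_to_main" | "nightly";

// PRs run smoke tests plus affected tests; everything else runs the full suite.
function selectTests(
  trigger: Trigger,
  allTests: string[],
  affected: string[],
  smoke: string[],
): string[] {
  if (trigger !== "pull_request") return [...allTests];
  const selected = new Set([...smoke, ...affected]);
  // Filtering allTests keeps suite order and removes duplicates.
  return allTests.filter((test) => selected.has(test));
}

// Example with made-up file names.
const suite = [
  "tests/login.spec.ts",
  "tests/checkout.spec.ts",
  "tests/search.spec.ts",
  "tests/admin.spec.ts",
];
const prRun = selectTests(
  "pull_request",
  suite,
  ["tests/checkout.spec.ts"], // from impact analysis
  ["tests/login.spec.ts", "tests/checkout.spec.ts"], // smoke set
);
console.log(prRun); // login and checkout only
console.log(selectTests("nightly", suite, [], []).length); // 4
```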
Step 5: Infrastructure
- 8 parallel CI runners
- Docker images pre-built with browsers
- Aggressive caching of dependencies
Expected Results
- PR pipeline: ~8 minutes (impact analysis + smoke tests)
- Full suite: ~12 minutes (8 shards)
- Nightly: 90 minutes if the full suite runs on a single machine, which is acceptable when nobody is waiting on the result
Key Takeaways
- Sharding is the fastest way to reduce suite time: speedup is close to linear with more machines when shards are balanced, limited by per-shard setup time and the longest single test
- Smart retries mitigate flakiness but mask real problems if overused
- Test impact analysis gives the fastest PR feedback: run only affected tests
- Track metrics: suite duration, flaky rate, and shard balance over time
- Invest in infrastructure: faster machines and parallel runners pay for themselves in developer productivity