What Are Flaky Tests?
A flaky test is a test that passes and fails intermittently without any code changes. You run the test suite — it passes. You run it again on the same code — a test fails. You run it a third time — it passes again. This non-deterministic behavior destroys trust in the test suite and wastes enormous amounts of developer time investigating false failures.
Industry surveys consistently rank flaky tests among the top complaints development teams have about test automation. Google has reported that roughly 1.5% of its test runs produced a flaky result, and that retrying those tests consumed 2-16% of its testing compute resources. At scale, even a small flaky test percentage has massive impact.
Root Causes of Flakiness
1. Timing and Synchronization Issues (Most Common)
The test interacts with the UI before an element is ready, or checks a condition before an asynchronous operation completes.
Bad pattern:
await page.click('#submit');
// Element might not exist yet!
const message = await page.textContent('.success-message');
expect(message).toBe('Order placed');
Fixed pattern:
await page.click('#submit');
await expect(page.locator('.success-message')).toHaveText('Order placed');
// Playwright automatically waits for the element and retries
2. Test Order Dependencies
Tests that depend on other tests running first (shared state, data created by previous tests).
3. Shared Mutable State
Tests that modify global state (database, files, environment variables) without proper isolation.
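As a sketch using Python's unittest (the `TAX_REGION` environment variable is a made-up example): snapshot any global state the test mutates in setUp and restore it in tearDown, so later tests see a clean environment.

```python
import os
import unittest

class TaxTest(unittest.TestCase):
    def setUp(self):
        # Snapshot the environment variable this test is about to mutate.
        self._saved = os.environ.get("TAX_REGION")
        os.environ["TAX_REGION"] = "EU"

    def tearDown(self):
        # Restore the original value so no other test observes our change.
        if self._saved is None:
            os.environ.pop("TAX_REGION", None)
        else:
            os.environ["TAX_REGION"] = self._saved

    def test_uses_region(self):
        self.assertEqual(os.environ["TAX_REGION"], "EU")
```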
4. External Service Dependencies
Tests that call real external services which may be slow, rate-limited, or occasionally unavailable.
5. Resource Contention
Tests competing for limited resources (ports, file handles, database connections) during parallel execution.
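One common fix for port contention, sketched in Python: instead of hardcoding a port that parallel workers might fight over, bind to port 0 and let the operating system pick a free one.

```python
import socket

def get_free_port() -> int:
    # Binding to port 0 asks the OS for any currently unused port,
    # so parallel test workers never collide on a hardcoded value.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Each worker then starts its test server on its own port instead of a shared fixed one.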
6. Time-Dependent Logic
Tests that depend on the current time, day of week, or timezone.
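The usual fix, sketched in Python (the `is_weekend_discount` function is invented for illustration): inject the clock instead of calling the system time inside the logic, so tests can pin it to a known value.

```python
from datetime import datetime

def is_weekend_discount(now=None):
    # Accept an injected clock; fall back to the real one in production.
    now = now or datetime.now()
    return now.weekday() >= 5  # Saturday=5, Sunday=6

# In a test, pin the time so the result never depends on when CI runs:
assert is_weekend_discount(datetime(2024, 1, 13)) is True   # a Saturday
assert is_weekend_discount(datetime(2024, 1, 15)) is False  # a Monday
```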
Fixing Flaky Tests
Replace Sleep with Explicit Waits
// BAD — hardcoded sleep
await page.click('#submit');
await page.waitForTimeout(3000);
expect(await page.textContent('.result')).toBe('Success');
// GOOD — wait for condition
await page.click('#submit');
await expect(page.locator('.result')).toHaveText('Success', { timeout: 10000 });
Ensure Test Isolation
@BeforeEach
void isolateTest() {
    database.beginTransaction();
    // Each test gets a clean state
}

@AfterEach
void cleanupTest() {
    database.rollbackTransaction();
    // All changes are undone
}
Mock External Services
await page.route('**/api/external-service/**', route => {
    route.fulfill({
        status: 200,
        body: JSON.stringify({ result: 'mocked response' })
    });
});
Flaky Test Detection Systems
Repeat Mode in PR Pipeline
Run new or modified tests multiple times before merging:
# GitHub Actions example
- name: Run new tests 20 times
  run: |
    for i in {1..20}; do
      npx playwright test --grep @new
    done
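The same idea as a small Python sketch (the command shown is a placeholder; substitute your real test invocation): rerun the test command N times and count failures, so a new test that fails even once in the loop is caught before merge. Playwright also ships a built-in `--repeat-each=N` flag that achieves this without a shell loop.

```python
import subprocess
import sys

def repeat_run(cmd, times):
    """Run cmd `times` times; return how many runs failed."""
    failures = 0
    for _ in range(times):
        # A nonzero exit code counts as one failed run.
        if subprocess.run(cmd).returncode != 0:
            failures += 1
    return failures

# Placeholder command; in CI this would be something like
# ["npx", "playwright", "test", "--grep", "@new"].
failed = repeat_run([sys.executable, "-c", "pass"], times=5)
```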
Flakiness Tracking Dashboard
Track each test’s pass/fail history over time:
Test: testCheckoutFlow
Last 100 runs: 96 pass, 4 fail (96% reliability)
Status: FLAKY (below 99% threshold)
Last failure: 2024-01-15 — TimeoutError on .payment-confirmation
Assigned to: @developer-alice
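The dashboard entry above reduces to a simple computation; a sketch in Python (function names and the 99% threshold mirror the example, but are otherwise assumptions):

```python
def reliability(passes, failures):
    # Fraction of runs that passed; an unrun test counts as reliable.
    total = passes + failures
    return passes / total if total else 1.0

def status(passes, failures, threshold=0.99):
    # A test below the reliability threshold is flagged as flaky.
    return "FLAKY" if reliability(passes, failures) < threshold else "STABLE"

# testCheckoutFlow from the example: 96 passes, 4 failures in 100 runs.
assert reliability(96, 4) == 0.96
assert status(96, 4) == "FLAKY"
```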
Automatic Flaky Detection
# Analyze test results across CI runs
def detect_flaky_tests(results_last_30_days):
    for test in results_last_30_days:
        pass_rate = test.passes / (test.passes + test.failures)
        # Between "mostly failing" and "always passing": flaky,
        # as opposed to consistently broken or consistently healthy.
        if 0.5 < pass_rate < 0.99:
            mark_as_flaky(test)
            notify_team(test)
Quarantine System
When a test is identified as flaky:
- Mark it: Add a @Flaky tag or move to a quarantine suite
- Isolate it: Remove from the blocking CI pipeline
- Monitor it: Continue running in a separate non-blocking job
- Fix it: Assign an owner and set a deadline
- Restore it: Once fixed and stable for N runs, move back to the main suite
@Tag("quarantine")
@Flaky(reason = "Intermittent timeout on slow CI runners", ticket = "BUG-123")
@Test
void testCheckoutWithCoupon() {
    // This test is quarantined: it runs but does not block deployment
}
Prevention Best Practices
- Never use hardcoded waits — always use explicit conditions
- Run tests in random order — catches order dependencies early
- Repeat new tests — run 20-50 times before merging
- Mock external services — eliminate network variability
- Use unique test data — avoid conflicts between parallel tests
- Set realistic timeouts — long enough for slow CI, short enough to fail fast
- Review flaky metrics weekly — make flakiness visible to the team
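For the unique-test-data point, a minimal Python sketch: suffix every record a test creates with a random UUID fragment so parallel runs never collide on shared resources.

```python
import uuid

def unique_name(prefix):
    # A random suffix keeps parallel test runs from colliding on
    # shared resources such as database rows or object-store keys.
    return f"{prefix}-{uuid.uuid4().hex[:8]}"

user_a = unique_name("test-user")
user_b = unique_name("test-user")
```

Two parallel workers calling `unique_name("test-user")` get distinct identifiers, so neither can clobber the other's data.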
Exercises
Exercise 1: Diagnose and Fix
Take 3 intentionally flaky tests (timing-dependent, order-dependent, and shared-state) and fix each one. Document the root cause and the fix.
Exercise 2: Build a Quarantine System
- Create a @Quarantine tag/label mechanism in your test framework
- Configure CI to run quarantined tests separately
- Build a script that tracks flaky test history across runs
- Set up alerts when a test’s reliability drops below 99%
Exercise 3: Prevention Pipeline
- Add repeat-mode testing for new tests in your PR pipeline
- Configure random test ordering
- Set up a flakiness dashboard tracking reliability per test
- Create a team policy document for handling flaky tests