What Are Flaky Tests?
A flaky test is a test that passes and fails intermittently without any code changes. You run the test suite — it passes. You run it again on the same code — a test fails. You run it a third time — it passes again. This non-deterministic behavior destroys trust in the test suite and wastes enormous amounts of developer time investigating false failures.
Industry surveys consistently rank flaky tests among the top complaints development teams have about test automation. Google has reported that roughly 1.5% of its test runs produced a flaky result, and that retrying those tests consumed 2-16% of its testing compute resources. At scale, even a small flaky test percentage has massive impact.
Root Causes of Flakiness
1. Timing and Synchronization Issues (Most Common)
The test interacts with the UI before an element is ready, or checks a condition before an asynchronous operation completes.
Bad pattern:
await page.click('#submit');
// Element might not exist yet!
const message = await page.textContent('.success-message');
expect(message).toBe('Order placed');
Fixed pattern:
await page.click('#submit');
await expect(page.locator('.success-message')).toHaveText('Order placed');
// Playwright automatically waits for the element and retries
2. Test Order Dependencies
Tests that depend on other tests running first (shared state, data created by previous tests).
3. Shared Mutable State
Tests that modify global state (database, files, environment variables) without proper isolation.
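As a sketch using Python's unittest (the `TAX_REGION` environment variable is a made-up example): snapshot any global state the test mutates in setUp and restore it in tearDown, so later tests see a clean environment.

```python
import os
import unittest

class TaxTest(unittest.TestCase):
    def setUp(self):
        # Snapshot the environment variable this test is about to mutate.
        self._saved = os.environ.get("TAX_REGION")
        os.environ["TAX_REGION"] = "EU"

    def tearDown(self):
        # Restore the original value so no other test observes our change.
        if self._saved is None:
            os.environ.pop("TAX_REGION", None)
        else:
            os.environ["TAX_REGION"] = self._saved

    def test_uses_region(self):
        self.assertEqual(os.environ["TAX_REGION"], "EU")
```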
4. External Service Dependencies
Tests that call real external services which may be slow, rate-limited, or occasionally unavailable.
5. Resource Contention
Tests competing for limited resources (ports, file handles, database connections) during parallel execution.
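One common fix for port contention, sketched in Python: instead of hardcoding a port that parallel workers might fight over, bind to port 0 and let the operating system pick a free one.

```python
import socket

def get_free_port() -> int:
    # Binding to port 0 asks the OS for any currently unused port,
    # so parallel test workers never collide on a hardcoded value.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

Each worker then starts its test server on its own port instead of a shared fixed one.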
6. Time-Dependent Logic
Tests that depend on the current time, day of week, or timezone.
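The usual fix, sketched in Python (the `is_weekend_discount` function is invented for illustration): inject the clock instead of calling the system time inside the logic, so tests can pin it to a known value.

```python
from datetime import datetime

def is_weekend_discount(now=None):
    # Accept an injected clock; fall back to the real one in production.
    now = now or datetime.now()
    return now.weekday() >= 5  # Saturday=5, Sunday=6

# In a test, pin the time so the result never depends on when CI runs:
assert is_weekend_discount(datetime(2024, 1, 13)) is True   # a Saturday
assert is_weekend_discount(datetime(2024, 1, 15)) is False  # a Monday
```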
Fixing Flaky Tests
Replace Sleep with Explicit Waits
// BAD — hardcoded sleep
await page.click('#submit');
await page.waitForTimeout(3000);
expect(await page.textContent('.result')).toBe('Success');
// GOOD — wait for condition
await page.click('#submit');
await expect(page.locator('.result')).toHaveText('Success', { timeout: 10000 });
Ensure Test Isolation
@BeforeEach
void isolateTest() {
    database.beginTransaction();
    // Each test gets a clean state
}

@AfterEach
void cleanupTest() {
    database.rollbackTransaction();
    // All changes are undone
}
Mock External Services
await page.route('**/api/external-service/**', route => {
    route.fulfill({
        status: 200,
        body: JSON.stringify({ result: 'mocked response' })
    });
});
Flaky Test Detection Systems
Repeat Mode in PR Pipeline
Run new or modified tests multiple times before merging:
# GitHub Actions example
- name: Run new tests 20 times
  run: |
    for i in {1..20}; do
      npx playwright test --grep @new
    done
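The same idea as a small Python sketch (the command shown is a placeholder; substitute your real test invocation): rerun the test command N times and count failures, so a new test that fails even once in the loop is caught before merge. Playwright also ships a built-in `--repeat-each=N` flag that achieves this without a shell loop.

```python
import subprocess
import sys

def repeat_run(cmd, times):
    """Run cmd `times` times; return how many runs failed."""
    failures = 0
    for _ in range(times):
        # A nonzero exit code counts as one failed run.
        if subprocess.run(cmd).returncode != 0:
            failures += 1
    return failures

# Placeholder command; in CI this would be something like
# ["npx", "playwright", "test", "--grep", "@new"].
failed = repeat_run([sys.executable, "-c", "pass"], times=5)
```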
Flakiness Tracking Dashboard
Track each test’s pass/fail history over time:
Test: testCheckoutFlow
Last 100 runs: 96 pass, 4 fail (96% reliability)
Status: FLAKY (below 99% threshold)
Last failure: 2024-01-15 — TimeoutError on .payment-confirmation
Assigned to: @developer-alice
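The dashboard entry above reduces to a simple computation; a sketch in Python (function names and the 99% threshold mirror the example, but are otherwise assumptions):

```python
def reliability(passes, failures):
    # Fraction of runs that passed; an unrun test counts as reliable.
    total = passes + failures
    return passes / total if total else 1.0

def status(passes, failures, threshold=0.99):
    # A test below the reliability threshold is flagged as flaky.
    return "FLAKY" if reliability(passes, failures) < threshold else "STABLE"

# testCheckoutFlow from the example: 96 passes, 4 failures in 100 runs.
assert reliability(96, 4) == 0.96
assert status(96, 4) == "FLAKY"
```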
Automatic Flaky Detection
# Analyze test results across CI runs
def detect_flaky_tests(results_last_30_days):
    for test in results_last_30_days:
        pass_rate = test.passes / (test.passes + test.failures)
        # Between "mostly failing" and "always passing": flaky,
        # as opposed to consistently broken or consistently healthy.
        if 0.5 < pass_rate < 0.99:
            mark_as_flaky(test)
            notify_team(test)
Quarantine System
When a test is identified as flaky:
- Mark it: Add a @Flaky tag or move to a quarantine suite
- Isolate it: Remove from the blocking CI pipeline
- Monitor it: Continue running in a separate non-blocking job
- Fix it: Assign an owner and set a deadline
- Restore it: Once fixed and stable for N runs, move back to the main suite
@Tag("quarantine")
@Flaky(reason = "Intermittent timeout on slow CI runners", ticket = "BUG-123")
@Test
void testCheckoutWithCoupon() {
    // This test is quarantined: it runs but does not block deployment
}
Prevention Best Practices
- Never use hardcoded waits — always use explicit conditions
- Run tests in random order — catches order dependencies early
- Repeat new tests — run 20-50 times before merging
- Mock external services — eliminate network variability
- Use unique test data — avoid conflicts between parallel tests
- Set realistic timeouts — long enough for slow CI, short enough to fail fast
- Review flaky metrics weekly — make flakiness visible to the team
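For the unique-test-data point, a minimal Python sketch: suffix every record a test creates with a random UUID fragment so parallel runs never collide on shared resources.

```python
import uuid

def unique_name(prefix):
    # A random suffix keeps parallel test runs from colliding on
    # shared resources such as database rows or object-store keys.
    return f"{prefix}-{uuid.uuid4().hex[:8]}"

user_a = unique_name("test-user")
user_b = unique_name("test-user")
```

Two parallel workers calling `unique_name("test-user")` get distinct identifiers, so neither can clobber the other's data.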
Exercises
Exercise 1: Diagnose and Fix
Take 3 intentionally flaky tests (timing-dependent, order-dependent, and shared-state) and fix each one. Document the root cause and the fix.
Exercise 2: Build a Quarantine System
- Create a @Quarantine tag/label mechanism in your test framework
- Configure CI to run quarantined tests separately
- Build a script that tracks flaky test history across runs
- Set up alerts when a test’s reliability drops below 99%
Exercise 3: Prevention Pipeline
- Add repeat-mode testing for new tests in your PR pipeline
- Configure random test ordering
- Set up a flakiness dashboard tracking reliability per test
- Create a team policy document for handling flaky tests