Beyond Standard Load Testing

In the previous lessons, you learned how to use tools like JMeter, k6, Gatling, and Locust to create load tests. But choosing the right type of performance test is just as important as choosing the right tool. Different performance test types answer different questions about your system.

This lesson covers four specialized performance testing types that go beyond standard load testing. Each one has a distinct purpose, load profile, and set of defects it uncovers.

Stress Testing: Finding the Breaking Point

Definition: Stress testing pushes the system beyond its designed capacity to identify the breaking point and observe how it fails.

Key question: “At what point does the system break, and how does it fail?”

Load Profile

Stress testing gradually increases the load until the system degrades or crashes:

Figure (stress test load profile): start at normal load → increase beyond capacity → system degradation → breaking point → recovery phase.

A typical stress test ramps up well beyond the expected maximum load:

| Phase | Duration | Load Level |
| --- | --- | --- |
| Warm-up | 5 min | 50% of expected max |
| Normal load | 10 min | 100% of expected max |
| Stress phase 1 | 10 min | 150% of expected max |
| Stress phase 2 | 10 min | 200% of expected max |
| Extreme stress | 10 min | 300% of expected max |
| Recovery | 10 min | Back to 0 or normal |
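
The phase table above can also be kept as data, so the same profile scales to any expected maximum. The following is a minimal sketch in plain Python, not tied to any load tool. The names `STRESS_PHASES` and `stress_stages` and the figure of 800 users are assumptions made for illustration, and the result would still need to be translated into the stage syntax of whichever tool you use.

```python
# Stress-test phases from the table: (name, duration in minutes, % of expected max).
# Recovery is modeled here as a return to 100%; dropping to 0 is equally valid.
STRESS_PHASES = [
    ("Warm-up", 5, 50),
    ("Normal load", 10, 100),
    ("Stress phase 1", 10, 150),
    ("Stress phase 2", 10, 200),
    ("Extreme stress", 10, 300),
    ("Recovery", 10, 100),
]


def stress_stages(expected_max_users):
    """Turn the percentage-based phases into concrete user targets."""
    return [
        {"phase": name, "minutes": minutes, "users": expected_max_users * pct // 100}
        for name, minutes, pct in STRESS_PHASES
    ]


if __name__ == "__main__":
    # 800 is a hypothetical expected maximum, used only for this example.
    for stage in stress_stages(800):
        print(f"{stage['phase']:<15} {stage['minutes']:>3} min  {stage['users']:>5} users")
```

Keeping the percentages separate from the user counts makes it easy to rerun the same profile after capacity changes.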

What to Look For

  • At what user count or request rate do errors start appearing?
  • Does the system degrade gracefully (slower responses) or fail catastrophically (complete crash, data corruption)?
  • Does the system recover when load returns to normal, or does it require a restart?
  • Are error messages meaningful? Users should see friendly error pages, not stack traces.
  • Are there cascading failures? Does one component failing bring down others?
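
One way to answer the first question is to record the error rate for each load stage and report the first stage that crosses a threshold. The sketch below assumes a 1% error threshold and a hypothetical `StageResult` record; the sample numbers are invented for illustration and are not measurements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StageResult:
    users: int      # concurrent users during the stage
    requests: int   # total requests sent in the stage
    errors: int     # failed requests (timeouts, 5xx responses, and so on)

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def breaking_point(results, max_error_rate=0.01) -> Optional[int]:
    """Return the lowest user count whose error rate exceeds the threshold, or None."""
    for result in sorted(results, key=lambda r: r.users):
        if result.error_rate > max_error_rate:
            return result.users
    return None


# Illustrative numbers only.
sample = [
    StageResult(users=400, requests=20_000, errors=0),
    StageResult(users=800, requests=40_000, errors=12),
    StageResult(users=1200, requests=55_000, errors=900),
    StageResult(users=1600, requests=60_000, errors=9_000),
]
print(breaking_point(sample))
```

The threshold is a judgment call: a breaking point defined by error rate may differ from one defined by response time, so it helps to state which definition a report uses.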

Common Defects Found

  • Unresponsive endpoints under extreme load
  • Out of memory errors (OOM)
  • Thread pool exhaustion
  • Database connection limits exceeded
  • Load balancer misconfiguration
  • Missing circuit breakers or rate limiting
  • Data corruption under concurrent writes

Endurance (Soak) Testing: The Long Run

Definition: Endurance testing (also called soak testing) applies a moderate, steady load over an extended period — typically hours or even days.

Key question: “Does the system remain stable and performant over long periods of normal use?”

Load Profile

Figure (endurance test load profile): ramp up → steady load at 70-80% capacity → ramp down.

| Phase | Duration | Load Level |
| --- | --- | --- |
| Ramp-up | 15-30 min | 0% to 70-80% capacity |
| Steady state | 4-72 hours | 70-80% of expected max |
| Ramp-down | 15-30 min | Back to 0 |

The load level should represent typical production usage, not extreme conditions. The point is to run long enough for time-dependent defects to surface.

What to Look For

  • Memory usage over time: Does it trend upward? Even a slow leak of 1 MB per hour adds 24 MB per day, which can exhaust a memory-constrained process after days or weeks of uptime.
  • Response time trends: Do response times gradually increase even though the load stays constant?
  • Database connection pool: Are connections being properly returned, or is the pool slowly depleting?
  • Disk space: Are logs, temp files, or cache growing unbounded?
  • Resource handles: Are file descriptors, network sockets, or external API quotas being consumed without being released?
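
Trends like these are easier to judge with a fitted slope than by eye. The sketch below fits a least-squares line to periodic memory samples and projects when a limit would be reached if the trend continued. The function names, the 1024 MB limit, and the sample data are assumptions for illustration; real memory usage is noisier, so treat the projection as a rough signal rather than a prediction.

```python
def slope_per_hour(samples):
    """Least-squares slope of (hours_elapsed, value) samples."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    denominator = sum((x - mean_x) ** 2 for x, _ in samples)
    if denominator == 0:
        raise ValueError("samples must span more than one point in time")
    return numerator / denominator


def hours_until_exhaustion(samples, limit_mb):
    """Hours until memory reaches limit_mb if the trend continues; None if not growing."""
    growth = slope_per_hour(samples)
    if growth <= 0:
        return None
    _, last_value = samples[-1]
    return (limit_mb - last_value) / growth


# Hypothetical hourly readings: 512 MB at the start, growing by 1 MB per hour.
readings = [(hour, 512.0 + hour) for hour in range(25)]
print(slope_per_hour(readings), hours_until_exhaustion(readings, limit_mb=1024))
```

The same slope calculation applies to p95 response times or open connection counts sampled at a constant load.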

Common Defects Found

  • Memory leaks (objects not garbage collected)
  • Database connection pool exhaustion
  • Log file disk space consumption
  • Gradual performance degradation
  • Session management issues (stale sessions accumulating)
  • Cache that grows without eviction
  • File descriptor leaks

Spike Testing: Sudden Burst

Definition: Spike testing simulates a sudden, massive increase in load followed by either sustained high load or an equally sudden decrease.

Key question: “Can the system handle a sudden burst of users and recover quickly?”

Load Profile

Figure (spike test load profile): normal load → instant spike (10x users) → sustained spike → instant drop to normal → recovery.

| Phase | Duration | Load Level |
| --- | --- | --- |
| Baseline | 5 min | Normal load (100 users) |
| Spike up | Instant (seconds) | 10x normal (1000 users) |
| Sustained spike | 5-10 min | 10x normal |
| Spike down | Instant (seconds) | Back to normal (100 users) |
| Recovery monitoring | 5-10 min | Normal load |
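
The spike profile can be written as a function of elapsed time, which is the form many load tools accept for custom load shapes. This is a minimal sketch in plain Python: the function name `spike_users` and its parameters are assumptions, the durations use the upper bounds from the table, and wiring it into a specific tool is left out.

```python
def spike_users(t_seconds, normal=100, multiplier=10,
                baseline_s=300, spike_s=600, recovery_s=600):
    """Target user count at time t for a baseline -> spike -> recovery profile.

    Returns None once the test is over.
    """
    if t_seconds < 0:
        raise ValueError("t_seconds must be non-negative")
    if t_seconds < baseline_s:
        return normal                      # baseline phase
    if t_seconds < baseline_s + spike_s:
        return normal * multiplier         # instant jump, then sustained spike
    if t_seconds < baseline_s + spike_s + recovery_s:
        return normal                      # instant drop, then recovery monitoring
    return None


if __name__ == "__main__":
    for t in (0, 299, 300, 899, 900, 1499, 1500):
        print(t, spike_users(t))
```

There is deliberately no ramp between phases: the step change is what distinguishes a spike test from a stress test.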

Real-World Scenarios for Spike Testing

  • Flash sales: An e-commerce site announces a 90% off sale at noon
  • Breaking news: A news site gets linked from social media
  • Game launches: Server opens for a new online game
  • Marketing campaigns: Email blast sends millions of users to a landing page
  • TV mentions: A product gets featured on a popular TV show

What to Look For

  • How fast does the system scale? Auto-scaling cloud infrastructure may need 2-5 minutes to provision new instances — what happens during that gap?
  • Does the system shed load gracefully? Request queuing, rate limiting, or waiting rooms should activate instead of letting requests fail unpredictably.
  • Recovery time: How long after the spike ends does the system return to normal response times?
  • Data consistency: Were there lost transactions, duplicate orders, or corrupted data during the spike?
  • Error rates during spike: What percentage of users got errors vs. slow responses vs. successful responses?
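
Recovery time is easier to compare across runs when it is defined precisely. The sketch below treats the system as recovered once p95 stays within a tolerance of the baseline for several consecutive samples. The 20% tolerance, the three-sample window, and the sample data are all assumptions chosen for illustration.

```python
def recovery_seconds(samples, spike_end_s, baseline_p95_ms,
                     tolerance=1.2, stable_points=3):
    """Seconds after the spike ends until p95 stays within tolerance * baseline.

    samples: list of (timestamp_s, p95_ms) tuples sorted by time.
    Returns None if p95 never stabilizes within the recorded samples.
    """
    limit = baseline_p95_ms * tolerance
    after_spike = [(t, p95) for t, p95 in samples if t >= spike_end_s]
    for i in range(len(after_spike) - stable_points + 1):
        window = after_spike[i:i + stable_points]
        if all(p95 <= limit for _, p95 in window):
            return window[0][0] - spike_end_s
    return None


# Hypothetical p95 readings every 10 seconds after a spike that ended at t=900.
timeline = [
    (900, 1800), (910, 950), (920, 400), (930, 230),
    (940, 260), (950, 220), (960, 210), (970, 205),
]
print(recovery_seconds(timeline, spike_end_s=900, baseline_p95_ms=200))
```

Requiring several consecutive good samples avoids declaring recovery on a single lucky reading, such as the dip at t=930 in the sample data.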

Common Defects Found

  • Auto-scaling too slow to handle spikes
  • Queue overflow causing lost messages
  • Connection pool starvation
  • Thread pool saturation
  • Cache stampede (thundering herd problem)
  • Missing rate limiting or circuit breakers
  • Inconsistent state after recovery

Volume Testing: Large Data Sets

Definition: Volume testing evaluates system behavior when the database or storage contains very large amounts of data, regardless of concurrent users.

Key question: “Does the system perform well when the data volume is at production scale or beyond?”

What Makes It Different

Volume testing is not about concurrent users — it is about data size. You might run volume tests with just a few users but with millions or billions of records in the database.

| Aspect | Load Testing | Volume Testing |
| --- | --- | --- |
| Focus | Concurrent users | Data size |
| Users | Many (hundreds/thousands) | Few (1-10) |
| Data | Normal dataset | Very large dataset |
| Duration | Minutes to hours | Minutes to hours |
| Measures | Response time under load | Response time with large data |

What to Test

  • Database queries: Do queries that work with 1,000 records still perform with 10 million records?
  • Search functionality: Does full-text search degrade with a large index?
  • Pagination: Does page 10,000 load as fast as page 1?
  • Reports and aggregations: Can the system generate reports from millions of records?
  • Data import/export: How long does it take to import or export large datasets?
  • Backup and restore: Can backups complete within the maintenance window?
  • Storage limits: What happens when disk space is nearly full?
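
For the database query item, the query plan often shows a scaling problem before timings do. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `bookings` table to compare the plan for the same query before and after adding an index. SQLite is used only because it ships with Python; your production database has its own `EXPLAIN` syntax and output format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (id INTEGER PRIMARY KEY, user_id INTEGER, event_date TEXT)"
)
# Small synthetic dataset; a real volume test would load far more rows.
conn.executemany(
    "INSERT INTO bookings (user_id, event_date) VALUES (?, ?)",
    ((i % 5000, f"2024-{i % 12 + 1:02d}-01") for i in range(50_000)),
)


def query_plan(sql, params=()):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


QUERY = "SELECT id, event_date FROM bookings WHERE user_id = ?"

plan_before = query_plan(QUERY, (42,))   # no index on user_id yet: expect a table scan
conn.execute("CREATE INDEX idx_bookings_user ON bookings (user_id)")
plan_after = query_plan(QUERY, (42,))    # expect a search using the new index

print("before:", plan_before)
print("after: ", plan_after)
```

A plan that scans the whole table may be fast enough at 1,000 records and unusable at 10 million, which is why checking plans against production-scale data matters.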

Common Defects Found

  • Missing database indexes causing full table scans
  • N+1 query problems becoming critical at scale
  • Pagination using OFFSET instead of cursor-based pagination
  • Reports timing out with large datasets
  • Backup processes exceeding maintenance windows
  • Search index performance degradation
  • File storage systems hitting inode or size limits
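
The OFFSET pagination defect above is worth seeing side by side with the cursor-based alternative. This is a minimal sketch using Python's built-in `sqlite3` module; the `bookings` table, the page size, and the function names are assumptions for illustration. Both functions return the same rows, but the OFFSET version makes the database step past every skipped row, while the keyset version seeks directly using the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO bookings (id, customer) VALUES (?, ?)",
    ((i, f"customer-{i}") for i in range(1, 10_001)),
)

PAGE_SIZE = 20


def page_by_offset(page_number):
    """OFFSET pagination: cost grows with the number of rows skipped."""
    offset = (page_number - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT id, customer FROM bookings ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


def page_after(last_seen_id):
    """Keyset (cursor) pagination: seek past the last row the client saw."""
    return conn.execute(
        "SELECT id, customer FROM bookings WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()


# Page 400 via OFFSET, and the same page via the cursor left by page 399.
offset_rows = page_by_offset(400)
keyset_rows = page_after(last_seen_id=399 * PAGE_SIZE)
print(offset_rows == keyset_rows)
```

The trade-off is that keyset pagination needs a stable, indexed sort key and does not support jumping to an arbitrary page number.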

Exercise: Design Load Profiles for Each Type

You are the QA lead for an online ticket booking platform. The platform normally handles 500 concurrent users and stores 2 million booking records. Design load profiles for each of the four testing types.

Context

  • Normal peak: 500 concurrent users
  • Expected maximum: 800 concurrent users
  • Database: 2 million bookings, 500K user accounts
  • Special events: Concert ticket releases cause 10x spikes
  • System runs 24/7

Requirements

For each test type (stress, endurance, spike, volume), specify:

  1. The specific question you want to answer
  2. The load profile (stages, users, duration)
  3. Three key metrics to monitor
  4. Two potential defects you expect to find

Hint: Think About Real Scenarios

  • Stress: What happens when a popular concert goes on sale and users keep increasing beyond 800?
  • Endurance: The system runs 24/7 — what happens after a week of continuous operation?
  • Spike: A famous artist announces a surprise concert — 5000 users arrive in 30 seconds.
  • Volume: After 3 years of operation, the database has 20 million bookings. Do searches still work?

Solution: Complete Load Profile Designs

1. Stress Test

Question: “At what point does the booking system stop accepting new reservations?”

Load profile:

  • Warm-up: 5 min at 250 users
  • Normal: 10 min at 500 users
  • Stress 1: 10 min at 800 users (expected max)
  • Stress 2: 10 min at 1200 users (150% of max)
  • Stress 3: 10 min at 2000 users (250% of max)
  • Extreme: 10 min at 3000 users (375% of max)
  • Recovery: 15 min back to 500 users

Key metrics: Error rate at each phase, response time degradation curve, system recovery time

Expected defects: Database connection pool exhausted at ~1500 users, booking confirmation emails queuing up and failing at high load

2. Endurance Test

Question: “Does the system maintain performance after 48 hours of continuous operation?”

Load profile:

  • Ramp-up: 30 min to 400 users (80% of normal peak)
  • Steady state: 48 hours at 400 users
  • Ramp-down: 30 min to 0

Key metrics: Memory consumption trend (hourly), response time p95 trend (hourly), database connection pool utilization over time

Expected defects: Session objects not cleaned up for abandoned bookings (memory leak), log rotation misconfigured causing disk fill after 24 hours

3. Spike Test

Question: “Can the system handle a sudden rush when a popular concert goes on sale?”

Load profile:

  • Baseline: 5 min at 500 users
  • Spike: Instantly jump to 5000 users (10x)
  • Sustained: 10 min at 5000 users
  • Drop: Instantly back to 500 users
  • Recovery: 10 min at 500 users

Key metrics: Error rate during the first 60 seconds of spike, time to first successful booking during spike, recovery time to normal p95 response time

Expected defects: Seat reservation locks causing deadlocks under sudden concurrency, payment gateway timeout errors during spike (third-party cannot handle burst)

4. Volume Test

Question: “Will search and reporting work when the database reaches 20 million bookings?”

Data setup:

  • Populate database with 20 million booking records (10x current)
  • 5 million user accounts
  • 10 years of historical data
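
A data setup like this is usually scripted. The sketch below generates booking rows with only the Python standard library. The function name, the column layout, the start date, and the skew parameter are assumptions for illustration, and the skew is not derived from real production data. It is written as a generator so that millions of rows can be streamed into a bulk loader without being held in memory.

```python
import random
from datetime import date, timedelta


def generate_bookings(n_bookings, n_users, start=date(2015, 1, 1),
                      days=3650, skew=3, seed=42):
    """Yield (booking_id, user_id, booking_date) tuples with skewed user activity."""
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    for booking_id in range(1, n_bookings + 1):
        # Raising a uniform draw to a power pushes values toward 0, so low
        # user ids receive many bookings and high ids receive few.
        user_id = int(n_users * rng.random() ** skew) + 1
        booking_date = start + timedelta(days=rng.randrange(days))
        yield booking_id, user_id, booking_date.isoformat()


if __name__ == "__main__":
    # Small sample for inspection; scale the arguments up for a real volume test.
    for row in generate_bookings(5, n_users=1000):
        print(row)
```

Skewed activity matters here because uniformly distributed data would never produce the heavy users (500+ bookings) that the export test below depends on.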

Test with 5-10 concurrent users:

  • Search bookings by date range spanning 1 year
  • Generate monthly revenue reports
  • Export user booking history (users with 500+ bookings)
  • Load booking detail page for old records

Key metrics: Search query execution time, report generation time, booking detail page load time for old vs. new records

Expected defects: Booking search without date filter causes full table scan (missing composite index), monthly report aggregation exceeds 30-second timeout with 20M records

Pro Tips

  • Combine Types in Test Strategy: A complete performance test strategy includes all four types. Run load tests first to establish a baseline, then stress and spike tests, then endurance, and finally volume. Each type builds on insights from the previous one.
  • Monitor Infrastructure, Not Just Application: During all performance tests, monitor CPU, memory, disk I/O, network, database connections, and queue depths — not just application response times.
  • Test Recovery Explicitly: After stress and spike tests, always include a recovery phase. A system that crashes under stress but auto-recovers in 30 seconds is very different from one that requires manual intervention.
  • Volume Testing Data Generation: Use tools like Faker (Python), DataFactory (Java), or custom scripts to generate realistic test data. Synthetic data should approximate the statistical distribution of production data, including skew such as a few very active users and many occasional ones.
  • Set Clear Acceptance Criteria: Before running any performance test, define what “pass” and “fail” mean. For example: “p95 response time must stay under 2 seconds throughout the 48-hour endurance test.”
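
Acceptance criteria like the p95 example can be checked automatically at the end of a run. The sketch below uses the nearest-rank definition of a percentile, which is one of several common definitions, so results may differ slightly from what your load tool reports. The function names and the 2-second limit are taken from the example above or assumed for illustration.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_criteria(response_times_s, p95_limit_s=2.0):
    """Pass when the p95 response time stays under the limit."""
    return percentile(response_times_s, 95) < p95_limit_s


if __name__ == "__main__":
    # Hypothetical samples: 95 fast responses and 5 slow ones.
    samples = [1.0] * 95 + [5.0] * 5
    print(percentile(samples, 95), meets_criteria(samples))
```

For an endurance test, applying the check to each hourly window rather than to the whole run catches gradual degradation that an overall p95 would average away.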