Stress, Endurance, Spike, and Volume Testing

Learn the differences between stress, endurance (soak), spike, and volume testing. Understand load profiles, expected results, and when to use each type.

Beyond Standard Load Testing

In the previous lessons, you learned how to use tools like JMeter, k6, Gatling, and Locust to create load tests. But choosing the right type of performance test is just as important as choosing the right tool. Different performance test types answer different questions about your system.

This lesson covers four specialized performance testing types that go beyond standard load testing. Each one has a distinct purpose, load profile, and set of defects it uncovers.

Stress Testing: Finding the Breaking Point

Definition: Stress testing pushes the system beyond its designed capacity to identify the breaking point and observe how it fails.

Key question: “At what point does the system break, and how does it fail?”

Load Profile

Stress testing gradually increases the load until the system degrades or crashes:

Stress test load profile: start at normal load, increase beyond capacity, observe system degradation, reach the breaking point, then run a recovery phase.

A typical stress test ramps up well beyond the expected maximum load:

Phase	Duration	Load Level
Warm-up	5 min	50% of expected max
Normal load	10 min	100% of expected max
Stress phase 1	10 min	150% of expected max
Stress phase 2	10 min	200% of expected max
Extreme stress	10 min	300% of expected max
Recovery	10 min	Back to 0 or normal
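The phases in the table can all be derived from a single expected-maximum figure. The following is a minimal sketch in plain Python, not tied to any load tool; the function name `stress_stages` and the tuple layout are illustrative assumptions, and the recovery phase assumes a return to normal load rather than to zero.

```python
def stress_stages(expected_max_users):
    """Return (phase, minutes, target_users) tuples mirroring the table above."""
    plan = [
        ("Warm-up", 5, 0.5),
        ("Normal load", 10, 1.0),
        ("Stress phase 1", 10, 1.5),
        ("Stress phase 2", 10, 2.0),
        ("Extreme stress", 10, 3.0),
        ("Recovery", 10, 1.0),  # assumption: recover to normal load, not to 0
    ]
    return [
        (name, minutes, round(expected_max_users * factor))
        for name, minutes, factor in plan
    ]


if __name__ == "__main__":
    for name, minutes, users in stress_stages(800):
        print(f"{name:<15} {minutes:>3} min  {users:>5} users")
```

For an expected maximum of 800 users, the targets come out as 400, 800, 1200, 1600, 2400, and 800 users. The resulting list can be translated into the stage or ramp configuration of whichever load tool you use.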

What to Look For

At what user count or request rate do errors start appearing?
Does the system degrade gracefully (slower responses) or fail catastrophically (complete crash, data corruption)?
Does the system recover when load returns to normal, or does it require a restart?
Are error messages meaningful? Users should see friendly error pages, not stack traces.
Are there cascading failures? Does one component failing bring down others?
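The first question in the list above can be answered mechanically from per-stage results. This is a sketch under stated assumptions: the helper name `first_breaking_stage`, the 1% error threshold, and the sample numbers are all made up for illustration and do not come from a real test run.

```python
def first_breaking_stage(results, max_error_rate=0.01):
    """Return the user count of the first stage whose error rate exceeds the threshold.

    results: list of (users, total_requests, failed_requests) in ramp order.
    Returns None if no stage crosses the threshold.
    """
    for users, total, failed in results:
        if total and failed / total > max_error_rate:
            return users
    return None


if __name__ == "__main__":
    # Hypothetical per-stage totals, for illustration only.
    sample = [
        (800, 10_000, 5),
        (1200, 10_000, 80),
        (1600, 10_000, 900),
        (2400, 10_000, 6_000),
    ]
    print(first_breaking_stage(sample))
```

The same approach works with a response-time threshold in place of the error rate; what matters is deciding the threshold before the test runs.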

Common Defects Found

Unresponsive endpoints under extreme load
Out of memory errors (OOM)
Thread pool exhaustion
Database connection limits exceeded
Load balancer misconfiguration
Missing circuit breakers or rate limiting
Data corruption under concurrent writes

Endurance (Soak) Testing: The Long Run

Definition: Endurance testing (also called soak testing) applies a moderate, steady load over an extended period — typically hours or even days.

Key question: “Does the system remain stable and performant over long periods of normal use?”

Load Profile

Endurance test load profile: ramp up (15 min), steady load at 70-80% capacity (4-24 hours), ramp down (15 min).

Phase	Duration	Load Level
Ramp-up	15-30 min	0% to 70-80% capacity
Steady state	4-72 hours	70-80% of expected max
Ramp-down	15-30 min	Back to 0

The load level should represent typical production usage, not extreme conditions. The point is to run long enough for time-dependent defects to surface.

What to Look For

Memory usage over time: Does it trend upward? A leak of just 1 MB per hour looks harmless in a one-hour test but consumes about 720 MB over a month of uptime.
Response time trends: Do response times gradually increase even though the load stays constant?
Database connection pool: Are connections being properly returned, or is the pool slowly depleting?
Disk space: Are logs, temp files, or cache growing unbounded?
Resource handles and quotas: Are file descriptors and network sockets being opened without release, and is API quota being consumed faster than it resets?
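Upward trends in memory or response time are easier to judge with a fitted slope than by eye. The sketch below fits a least-squares line to hourly samples and estimates how long until a limit is reached. The function names and the sample figures are illustrative assumptions, and a straight-line fit is only a first approximation of real leak behavior.

```python
def slope_per_hour(samples):
    """Least-squares slope of (hours_elapsed, value) samples."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    if den == 0:
        raise ValueError("need samples taken at two or more distinct times")
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return num / den


def hours_until_limit(samples, limit):
    """Estimate hours after the last sample until `limit` is reached.

    Returns None when the fitted trend is flat or decreasing.
    """
    slope = slope_per_hour(samples)
    if slope <= 0:
        return None
    _, last_value = samples[-1]
    return (limit - last_value) / slope


if __name__ == "__main__":
    # Hypothetical heap usage in MB, sampled hourly for 24 hours.
    heap_mb = [(hour, 512 + 4 * hour) for hour in range(25)]
    print(slope_per_hour(heap_mb), hours_until_limit(heap_mb, limit=2048))
```

The same two functions apply to p95 response time, open connections, or disk usage, as long as the samples are taken at constant load.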

Common Defects Found

Memory leaks (objects that stay referenced and are therefore never garbage collected)
Database connection pool exhaustion
Log file disk space consumption
Gradual performance degradation
Session management issues (stale sessions accumulating)
Cache that grows without eviction
File descriptor leaks

Spike Testing: Sudden Burst

Definition: Spike testing simulates a sudden, massive increase in load followed by either sustained high load or an equally sudden decrease.

Key question: “Can the system handle a sudden burst of users and recover quickly?”

Load Profile

Spike test load profile: normal load (5 min), instant spike to 10x users, sustained spike (5-10 min), instant drop to normal, recovery (5 min).

Phase	Duration	Load Level
Baseline	5 min	Normal load (100 users)
Spike up	Instant (seconds)	10x normal (1000 users)
Sustained spike	5-10 min	10x normal
Spike down	Instant (seconds)	Back to normal (100 users)
Recovery monitoring	5-10 min	Normal load
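Unlike a ramp, a spike profile is a step function: the target user count jumps with no interpolation between phases. A minimal sketch, assuming the 100-user baseline and 10x multiplier from the table and a 10-minute sustained spike; the function name and parameters are hypothetical.

```python
def spike_users_at(t_seconds, baseline=100, multiplier=10,
                   baseline_s=300, spike_s=600):
    """Target user count at time t for a step-shaped spike profile."""
    if t_seconds < baseline_s:
        return baseline                  # baseline phase
    if t_seconds < baseline_s + spike_s:
        return baseline * multiplier     # instant jump, then sustained spike
    return baseline                      # instant drop, recovery monitoring


if __name__ == "__main__":
    for t in (0, 299, 300, 899, 900):
        print(t, spike_users_at(t))
```

Real load generators still need a few seconds to start the extra virtual users, so the measured step is slightly less sharp than this ideal shape.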

Real-World Scenarios for Spike Testing

Flash sales: An e-commerce site announces a 90% off sale at noon
Breaking news: A news site gets linked from social media
Game launches: Server opens for a new online game
Marketing campaigns: Email blast sends millions of users to a landing page
TV mentions: A product gets featured on a popular TV show

What to Look For

How fast does the system scale? Auto-scaling cloud infrastructure may need 2-5 minutes to provision new instances — what happens during that gap?
Does the system shed load gracefully? Request queueing, rate limiting, or virtual waiting rooms should activate instead of letting queues overflow.
Recovery time: How long after the spike ends does the system return to normal response times?
Data consistency: Were there lost transactions, duplicate orders, or corrupted data during the spike?
Error rates during spike: What percentage of users got errors vs. slow responses vs. successful responses?
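Recovery time, as described in the list above, is only measurable with a precise definition. The sketch below uses one possible definition: the time from the end of the spike until p95 response time drops to a threshold and stays there. The function name, threshold, and sample values are assumptions for illustration.

```python
def recovery_seconds(samples, spike_end, threshold_ms):
    """Seconds after spike_end until p95 falls to the threshold and stays there.

    samples: list of (timestamp_seconds, p95_ms), sorted by time.
    Returns None if the system never settles within the recorded samples.
    """
    after = [(t, p95) for t, p95 in samples if t >= spike_end]
    for i, (t, _) in enumerate(after):
        if all(p95 <= threshold_ms for _, p95 in after[i:]):
            return t - spike_end
    return None


if __name__ == "__main__":
    # Hypothetical p95 samples taken every 30 seconds around the end of a spike.
    p95 = [(880, 4200), (900, 3900), (930, 2100), (960, 900),
           (990, 1300), (1020, 700), (1050, 650), (1080, 640)]
    print(recovery_seconds(p95, spike_end=900, threshold_ms=800))
```

Requiring the value to stay below the threshold avoids counting a brief dip as recovery while the system is still oscillating.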

Common Defects Found

Auto-scaling too slow to handle spikes
Queue overflow causing lost messages
Connection pool starvation
Thread pool saturation
Cache stampede (thundering herd problem)
Missing rate limiting or circuit breakers
Inconsistent state after recovery

Volume Testing: Large Data Sets

Definition: Volume testing evaluates system behavior when the database or storage contains very large amounts of data, regardless of concurrent users.

Key question: “Does the system perform well when the data volume is at production scale or beyond?”

What Makes It Different

Volume testing is not about concurrent users — it is about data size. You might run volume tests with just a few users but with millions or billions of records in the database.

Aspect	Load Testing	Volume Testing
Focus	Concurrent users	Data size
Users	Many (hundreds/thousands)	Few (1-10)
Data	Normal dataset	Very large dataset
Duration	Minutes to hours	Minutes to hours
Measures	Response time under load	Response time with large data

What to Test

Database queries: Do queries that work with 1,000 records still perform with 10 million records?
Search functionality: Does full-text search degrade with a large index?
Pagination: Does page 10,000 load as fast as page 1?
Reports and aggregations: Can the system generate reports from millions of records?
Data import/export: How long does it take to import or export large datasets?
Backup and restore: Can backups complete within the maintenance window?
Storage limits: What happens when disk space is nearly full?
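For the database query check in the list above, the query plan shows whether a query will scale before you load millions of rows. The sketch below uses Python's built-in `sqlite3` module as a stand-in for the production database; the `bookings` schema and index name are hypothetical, and the exact plan wording differs between SQLite versions and between database engines.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (id INTEGER PRIMARY KEY, user_id INTEGER, event_date TEXT)"
)
conn.executemany(
    "INSERT INTO bookings (user_id, event_date) VALUES (?, ?)",
    ((i % 500, f"2024-{i % 12 + 1:02d}-01") for i in range(10_000)),
)

sql = "SELECT id FROM bookings WHERE user_id = ?"
plan_before = query_plan(conn, sql, (42,))   # no index: expect a full scan
conn.execute("CREATE INDEX idx_bookings_user ON bookings (user_id)")
plan_after = query_plan(conn, sql, (42,))    # with index: expect an index search

print("before:", plan_before)
print("after: ", plan_after)
```

A plan that reports a scan of the whole table costs time proportional to the table size, which is why such queries pass with 1,000 records and fail with 10 million.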

Common Defects Found

Missing database indexes causing full table scans
N+1 query problems becoming critical at scale
Pagination using OFFSET instead of cursor-based pagination
Reports timing out with large datasets
Backup processes exceeding maintenance windows
Search index performance degradation
File storage systems hitting inode or size limits
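The OFFSET pagination defect in the list above is easiest to see side by side with the cursor-based alternative. This is a small sketch using Python's built-in `sqlite3` module; the table, page size, and function names are illustrative assumptions. Both functions return the same rows, but the OFFSET version makes the database step over every skipped row first, while the keyset version seeks straight to the next row by primary key.

```python
import sqlite3

PAGE_SIZE = 50

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (id INTEGER PRIMARY KEY, seat TEXT)")
conn.executemany(
    "INSERT INTO bookings (id, seat) VALUES (?, ?)",
    ((i, f"S{i}") for i in range(1, 1001)),
)


def page_by_offset(page_number):
    """OFFSET pagination: cost grows with the number of rows skipped."""
    offset = (page_number - 1) * PAGE_SIZE
    return conn.execute(
        "SELECT id, seat FROM bookings ORDER BY id LIMIT ? OFFSET ?",
        (PAGE_SIZE, offset),
    ).fetchall()


def page_after(last_seen_id):
    """Keyset (cursor) pagination: seek past the last row already shown."""
    return conn.execute(
        "SELECT id, seat FROM bookings WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, PAGE_SIZE),
    ).fetchall()


if __name__ == "__main__":
    print(page_by_offset(11)[0], page_after(500)[0])
```

With 1,000 rows the difference is invisible, which is exactly why this defect only surfaces in a volume test.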

Exercise: Design Load Profiles for Each Type

You are the QA lead for an online ticket booking platform. The platform normally handles 500 concurrent users and stores 2 million booking records. Design load profiles for each of the four testing types.

Context

Normal peak: 500 concurrent users
Expected maximum: 800 concurrent users
Database: 2 million bookings, 500K user accounts
Special events: Concert ticket releases cause 10x spikes
System runs 24/7

Requirements

For each test type (stress, endurance, spike, volume), specify:

The specific question you want to answer
The load profile (stages, users, duration)
Three key metrics to monitor
Two potential defects you expect to find

Hint: Think About Real Scenarios

Stress: What happens when a popular concert goes on sale and users keep increasing beyond 800?
Endurance: The system runs 24/7 — what happens after a week of continuous operation?
Spike: A famous artist announces a surprise concert — 5000 users arrive in 30 seconds.
Volume: After 3 years of operation, the database has 20 million bookings. Do searches still work?

Solution: Complete Load Profile Designs

1. Stress Test

Question: “At what point does the booking system stop accepting new reservations?”

Load profile:

Warm-up: 5 min at 250 users
Normal: 10 min at 500 users
Stress 1: 10 min at 800 users (expected max)
Stress 2: 10 min at 1200 users (150% of max)
Stress 3: 10 min at 2000 users (250% of max)
Extreme: 10 min at 3000 users (375% of max)
Recovery: 15 min back to 500 users

Key metrics: Error rate at each phase, response time degradation curve, system recovery time

Expected defects: Database connection pool exhausted at ~1500 users, booking confirmation emails queuing up and failing at high load

2. Endurance Test

Question: “Does the system maintain performance after 48 hours of continuous operation?”

Load profile:

Ramp-up: 30 min to 400 users (80% of normal peak)
Steady state: 48 hours at 400 users
Ramp-down: 30 min to 0

Key metrics: Memory consumption trend (hourly), response time p95 trend (hourly), database connection pool utilization over time

Expected defects: Session objects not cleaned up for abandoned bookings (memory leak), log rotation misconfigured causing disk fill after 24 hours

3. Spike Test

Question: “Can the system handle a sudden rush when a popular concert goes on sale?”

Load profile:

Baseline: 5 min at 500 users
Spike: Instantly jump to 5000 users (10x)
Sustained: 10 min at 5000 users
Drop: Instantly back to 500 users
Recovery: 10 min at 500 users

Key metrics: Error rate during the first 60 seconds of spike, time to first successful booking during spike, recovery time to normal p95 response time

Expected defects: Seat reservation locks causing deadlocks under sudden concurrency, payment gateway timeout errors during spike (third-party cannot handle burst)

4. Volume Test

Question: “Will search and reporting work when the database reaches 20 million bookings?”

Data setup:

Populate database with 20 million booking records (10x current)
5 million user accounts
10 years of historical data
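A data setup like this can be scripted with the standard library alone. The sketch below is a minimal generator, not a production seeding tool: the column layout, the price range, the start date, and the Pareto-shaped skew in bookings per user are all assumptions, and the real distribution should be taken from production statistics.

```python
import random
from datetime import date, timedelta


def generate_bookings(count, user_count, start=date(2015, 1, 1),
                      days=3650, seed=42):
    """Yield (user_id, event_date, price) rows with a skewed user distribution."""
    rng = random.Random(seed)  # fixed seed keeps test data reproducible
    for _ in range(count):
        # Pareto draw: a minority of users ends up owning most bookings.
        user_id = min(int(rng.paretovariate(1.2)), user_count)
        event_date = start + timedelta(days=rng.randrange(days))
        price = round(rng.uniform(15, 250), 2)
        yield user_id, event_date.isoformat(), price


if __name__ == "__main__":
    for row in generate_bookings(5, user_count=5_000_000):
        print(row)
```

Because the function is a generator, rows can be streamed into batched inserts without holding 20 million records in memory.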

Test with 5-10 concurrent users:

Search bookings by date range spanning 1 year
Generate monthly revenue reports
Export user booking history (users with 500+ bookings)
Load booking detail page for old records

Key metrics: Search query execution time, report generation time, booking detail page load time for old vs. new records

Expected defects: Booking search without date filter causes full table scan (missing composite index), monthly report aggregation exceeds 30-second timeout with 20M records

Pro Tips

Combine Types in Test Strategy: A complete performance test strategy covers all four types on top of a baseline load test. Run load tests first (baseline), then stress and spike, then endurance, and finally volume. Each type builds on insights from the previous one.
Monitor Infrastructure, Not Just Application: During all performance tests, monitor CPU, memory, disk I/O, network, database connections, and queue depths — not just application response times.
Test Recovery Explicitly: After stress and spike tests, always include a recovery phase. A system that crashes under stress but auto-recovers in 30 seconds is very different from one that requires manual intervention.
Volume Testing Data Generation: Use tools like Faker (Python), DataFactory (Java), or custom scripts to generate realistic test data. Synthetic data should have the same statistical distribution as production data.
Set Clear Acceptance Criteria: Before running any performance test, define what “pass” and “fail” mean. For example: “p95 response time must stay under 2 seconds throughout the 48-hour endurance test.”
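An acceptance criterion such as the p95 example above can be turned into an automated pass/fail check. The sketch below assumes response times are collected per hour in milliseconds and uses the nearest-rank percentile definition; the function names and the data layout are illustrative, and reporting tools may compute percentiles with a slightly different method.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def passes_p95(hourly_samples_ms, limit_ms=2000):
    """True only if the p95 of every hour stays under the limit."""
    return all(percentile(hour, 95) < limit_ms for hour in hourly_samples_ms)


if __name__ == "__main__":
    # Hypothetical response times (ms) for two hours of a soak test.
    good_hour = [100 * i for i in range(1, 21)]
    bad_hour = [100] * 18 + [2500, 2500]
    print(passes_p95([good_hour]), passes_p95([good_hour, bad_hour]))
```

Checking each hour separately catches gradual degradation that a single percentile over the whole 48 hours would average away.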
