Why API Performance Testing Matters
Every modern application depends on APIs. When an API slows down, the entire user experience degrades: pages take longer to load, mobile apps freeze, and integrations time out. API performance testing verifies that your endpoints can handle both expected and peak traffic before those symptoms reach users.
Unlike UI performance testing, API performance testing isolates the backend. You remove browser rendering, network variability, and frontend code from the equation. This gives you precise measurements of how your server processes requests.
Key Performance Metrics
Before writing a single test, you need to understand what you are measuring.
Throughput (Requests Per Second)
Throughput measures how many requests your API can handle per second. A REST endpoint returning user profiles might sustain 5,000 RPS on a well-tuned server. If your expected peak traffic is 2,000 RPS, you have comfortable headroom.
Latency Percentiles
Average latency is misleading. If 95% of requests take 50ms but 5% take 3 seconds, the average might be 200ms — which hides the terrible experience for that 5%.
Use percentiles instead:
| Percentile | Meaning |
|---|---|
| p50 (median) | Half of requests are faster than this |
| p90 | 90% of requests complete within this time |
| p95 | The threshold most SLAs target |
| p99 | Captures near-worst-case experience |
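To make the distinction concrete, here is a minimal sketch (nearest-rank method) showing how a single slow outlier skews the average but leaves the median untouched. The sample values are invented for illustration:

```javascript
// Nearest-rank percentile: the value at position ceil(p/100 * n), 1-indexed
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.ceil((p / 100) * sorted.length) - 1];
}

// Nine fast requests and one 3-second outlier (illustrative numbers)
const latencies = [47, 48, 49, 50, 50, 50, 51, 52, 53, 3000];
const avg = latencies.reduce((a, b) => a + b, 0) / latencies.length;

console.log(avg);                       // 345, skewed by the outlier
console.log(percentile(latencies, 50)); // 50, the typical experience
console.log(percentile(latencies, 99)); // 3000, the worst-case tail
```

One bad request in ten triples the "average" while nine users saw ~50ms, which is exactly why SLAs are written against percentiles.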
Error Rate
The percentage of requests returning 5xx errors or timing out under load. A healthy API maintains less than 0.1% error rate at expected load. As load increases beyond capacity, the error rate spikes — this inflection point reveals your true capacity.
Concurrency
The number of simultaneous connections your API handles. This differs from throughput: an API might handle 1,000 RPS with 50 concurrent connections (fast responses) or 1,000 RPS with 500 concurrent connections (slow responses).
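The two metrics are linked by Little's Law: average concurrency ≈ throughput × average latency. A quick sanity check of the numbers above:

```javascript
// Little's Law: concurrency ≈ throughput (req/s) * average latency (s)
function concurrency(rps, latencyMs) {
  return rps * (latencyMs / 1000);
}

console.log(concurrency(1000, 50));  // 1,000 RPS at 50ms -> 50 connections
console.log(concurrency(1000, 500)); // 1,000 RPS at 500ms -> 500 connections
```

This is why rising concurrency at constant throughput is an early warning sign: it means latency is climbing.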
Types of API Performance Tests
Load Test
Simulates expected production traffic. Validates that performance meets SLAs under normal conditions.
```javascript
// k6 load test example
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 }, // ramp up
    { duration: '5m', target: 100 }, // steady state
    { duration: '2m', target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // 95% of requests under 500ms
    http_req_failed: ['rate<0.01'],   // less than 1% failed requests
  },
};

export default function () {
  http.get('https://api.example.com/users'); // endpoint under test
  sleep(1);
}
```
Stress Test
Pushes beyond expected load to find breaking points. Keeps increasing virtual users until the API degrades or fails.
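A stress profile can be sketched as a staircase of increasing VU targets; the step sizes and targets below are placeholders to tune for your own system:

```javascript
// k6 stress test sketch: step load upward until latency or errors break.
// Run under k6; the targets here are illustrative placeholders.
export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '2m', target: 200 },
    { duration: '2m', target: 300 },
    { duration: '2m', target: 400 }, // keep adding steps until failure
    { duration: '2m', target: 0 },   // ramp down and watch recovery
  ],
};
```

The step at which your thresholds first fail is the breaking point to record.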
Spike Test
Simulates sudden traffic surges — like a flash sale or viral social media post. Tests how the API handles instantaneous jumps in traffic and recovers afterward.
Soak Test
Runs moderate load for hours (4-12+). Reveals memory leaks, connection pool exhaustion, log file growth, and other issues that only appear over time.
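A soak profile is the simplest shape of all: ramp to a moderate level and hold. The duration and target below are placeholder values:

```javascript
// k6 soak test sketch: moderate, steady load held for hours.
// Run under k6; pair with server-side monitoring of memory and connections.
export const options = {
  stages: [
    { duration: '5m', target: 60 }, // ramp to moderate load
    { duration: '4h', target: 60 }, // hold and watch for gradual degradation
    { duration: '5m', target: 0 },  // ramp down
  ],
};
```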
Setting Up k6 for API Testing
k6 is ideal for API performance testing because it is lightweight, scriptable in JavaScript, and provides detailed metrics out of the box.
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 50,
  duration: '5m',
  thresholds: {
    http_req_duration: ['p(95)<300', 'p(99)<1000'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/users');
  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'body contains data': (r) => r.json().data !== undefined,
  });
  sleep(1);
}
```
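Assuming the script above is saved as api-test.js, running it is a single command:

```shell
# Run the script as written (50 VUs for 5 minutes)
k6 run api-test.js

# Override the load shape from the CLI without editing the script
k6 run --vus 100 --duration 10m api-test.js
```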
Multi-Endpoint Scenarios
Real-world traffic hits multiple endpoints. Model this by weighting requests proportionally to production traffic patterns.
```javascript
import http from 'k6/http';
import { group, sleep } from 'k6';

export default function () {
  group('Browse Products', () => {
    http.get('https://api.example.com/products');
    sleep(0.5);
  });
  group('View Product Detail', () => {
    const id = Math.floor(Math.random() * 100) + 1;
    http.get(`https://api.example.com/products/${id}`);
    sleep(0.3);
  });
  group('Search', () => {
    http.get('https://api.example.com/search?q=widget');
    sleep(0.5);
  });
}
```
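The script above hits every group once per iteration, which gives each endpoint equal weight. To actually match a production mix, one option is to pick a single action per iteration from a weighted split; the 60/30/10 ratio here is hypothetical:

```javascript
// Weighted endpoint selection: map a uniform random number onto an
// assumed production traffic mix (60% browse, 30% detail, 10% search).
function pickAction(rand) {
  if (rand < 0.6) return 'browse';
  if (rand < 0.9) return 'detail';
  return 'search';
}

// Sanity check of the split over a uniform grid of 1,000 draws
const counts = { browse: 0, detail: 0, search: 0 };
for (let i = 0; i < 1000; i++) {
  counts[pickAction(i / 1000)]++;
}
console.log(counts); // { browse: 600, detail: 300, search: 100 }
```

Inside a k6 default function you would call pickAction(Math.random()) and run only the matching group.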
Interpreting Results
After a k6 run, you get a summary like this:
```
http_req_duration..........: avg=120ms min=15ms med=95ms max=4200ms p(90)=250ms p(95)=380ms
http_req_failed............: 0.23% ✓ 46 ✗ 19954
http_reqs..................: 20000 666.5/s
```
What to look for:
- p95 vs threshold — Is 380ms within your 500ms SLA? Yes, you pass.
- p95 vs average gap — 380ms vs 120ms average. The tail latency is 3x the average, suggesting some requests hit slow paths.
- Error rate — 0.23% might seem low, but if your SLA requires < 0.1%, you fail.
- Max latency — 4,200ms means at least one request took over 4 seconds. Investigate why.
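To keep these numbers for later comparison (baseline run vs. load run, for example), k6's handleSummary hook can persist the full metrics; a minimal sketch:

```javascript
// k6 end-of-test hook: write the summary to a file instead of (or
// alongside) the default console report. Run under k6, not Node.
export function handleSummary(data) {
  const p95 = data.metrics.http_req_duration.values['p(95)'];
  return {
    'summary.json': JSON.stringify(data, null, 2), // full metrics as JSON
    stdout: `p95: ${p95.toFixed(1)}ms\n`,          // short console line
  };
}
```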
Exercise: Complete API Performance Test Suite
Build a comprehensive performance test for a REST API with these requirements:
Setup
Use the public JSONPlaceholder API (https://jsonplaceholder.typicode.com) or set up a local API.
Part 1: Baseline Test
Create a k6 script that establishes baseline performance:
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend, Rate, Counter } from 'k6/metrics';

// Custom metrics
const listLatency = new Trend('list_latency');
const detailLatency = new Trend('detail_latency');
const createLatency = new Trend('create_latency');
const errorRate = new Rate('errors');
const requestCount = new Counter('total_requests');

export const options = {
  scenarios: {
    baseline: {
      executor: 'constant-vus',
      vus: 10,
      duration: '2m',
    },
  },
  thresholds: {
    list_latency: ['p(95)<500'],
    detail_latency: ['p(95)<300'],
    create_latency: ['p(95)<800'],
    errors: ['rate<0.01'],
  },
};

export default function () {
  // GET /posts (list)
  let res = http.get('https://jsonplaceholder.typicode.com/posts');
  listLatency.add(res.timings.duration);
  // Record a sample on every request so the Rate is a true failure ratio
  errorRate.add(!check(res, { 'list 200': (r) => r.status === 200 }));
  requestCount.add(1);
  sleep(0.5);

  // GET /posts/:id (detail)
  const id = Math.floor(Math.random() * 100) + 1;
  res = http.get(`https://jsonplaceholder.typicode.com/posts/${id}`);
  detailLatency.add(res.timings.duration);
  errorRate.add(!check(res, { 'detail 200': (r) => r.status === 200 }));
  requestCount.add(1);
  sleep(0.3);

  // POST /posts (create)
  res = http.post(
    'https://jsonplaceholder.typicode.com/posts',
    JSON.stringify({ title: 'Test', body: 'Content', userId: 1 }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  createLatency.add(res.timings.duration);
  errorRate.add(!check(res, { 'create 201': (r) => r.status === 201 }));
  requestCount.add(1);
  sleep(0.5);
}
```
Part 2: Load Test with Ramping
Extend the baseline to simulate production-like traffic:
```javascript
export const options = {
  scenarios: {
    load_test: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 },
        { duration: '5m', target: 50 },
        { duration: '1m', target: 100 },
        { duration: '3m', target: 100 },
        { duration: '2m', target: 0 },
      ],
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<1000'],
    http_req_failed: ['rate<0.05'],
  },
};
```
Part 3: Spike Test
Simulate a sudden surge in traffic:
```javascript
export const options = {
  scenarios: {
    spike: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '1m', target: 20 },   // warm up
        { duration: '10s', target: 200 }, // spike
        { duration: '2m', target: 200 },  // hold spike
        { duration: '10s', target: 20 },  // drop
        { duration: '2m', target: 20 },   // recovery
        { duration: '30s', target: 0 },   // ramp down
      ],
    },
  },
};
```
Analysis Tasks
After running each test:
- Record the metrics table — Copy p50, p90, p95, p99, and max latency for each endpoint.
- Compare baseline to load — How much did p95 increase when you added more virtual users?
- Analyze the spike — Did the API maintain acceptable latency during the spike? How long did it take to return to baseline after the spike dropped?
- Identify the bottleneck — Which endpoint degraded fastest under load? Why might that be? (Hint: list endpoints typically query more data than detail endpoints.)
Expected Observations
- List endpoints (GET /posts) will likely show higher latency than detail endpoints (GET /posts/:id) because they return larger payloads.
- POST requests typically have higher latency because they involve write operations.
- During the spike test, you should observe latency increase, then stabilize or degrade further if the API cannot handle the load.
- After the spike drops, latency should return to near-baseline levels — if it does not, the API has a recovery problem.
Performance Testing Checklist
Use this checklist before declaring an API ready for production:
- Baseline latency recorded for all critical endpoints
- Load test passes SLA thresholds at expected peak traffic
- Stress test identifies the breaking point (what RPS causes degradation)
- Spike test confirms recovery within acceptable time
- Soak test (4+ hours) shows no memory leaks or gradual degradation
- Error rate stays below SLA threshold at expected load
- Database connection pools do not exhaust under load
- Third-party API calls have timeouts and circuit breakers