Why API Performance Testing Matters

Every modern application depends on APIs. When an API slows down, the entire user experience degrades: pages take longer to load, mobile apps freeze, and integrations time out. API performance testing verifies that your endpoints can handle both expected and peak traffic before those slowdowns reach users.

Unlike UI performance testing, API performance testing isolates the backend. You remove browser rendering, network variability, and frontend code from the equation. This gives you precise measurements of how your server processes requests.

Key Performance Metrics

Before writing a single test, you need to understand what you are measuring.

Throughput (Requests Per Second)

Throughput measures how many requests your API can handle per second. A REST endpoint returning user profiles might sustain 5,000 RPS on a well-tuned server. If your expected peak traffic is 2,000 RPS, you have comfortable headroom.
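To measure throughput directly, it helps to drive a fixed request rate rather than a fixed number of virtual users. A sketch using k6's constant-arrival-rate executor, with the 2,000 RPS target from the example above (the endpoint URL is a placeholder):

```javascript
import http from 'k6/http';

export const options = {
  scenarios: {
    throughput: {
      executor: 'constant-arrival-rate',
      rate: 2000,            // target requests per second
      timeUnit: '1s',
      duration: '3m',
      preAllocatedVUs: 200,  // VU pool available to sustain the rate
      maxVUs: 500,           // allow growth if responses slow down
    },
  },
};

export default function () {
  http.get('https://api.example.com/users'); // placeholder endpoint
}
```

If k6 cannot sustain the configured rate even at maxVUs, it reports dropped iterations, a sign the target rate exceeds the API's capacity.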

Latency Percentiles

Average latency is misleading. If 95% of requests take 50ms but 5% take 3 seconds, the average might be 200ms — which hides the terrible experience for that 5%.

Use percentiles instead:

  Percentile      Meaning
  p50 (median)    Half of requests are faster than this
  p90             90% of requests complete within this time
  p95             The threshold most SLAs target
  p99             Captures near-worst-case experience
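The percentile math itself is simple. A minimal JavaScript helper using the nearest-rank method (real tools may interpolate differently), applied to the distribution from the example above:

```javascript
// Nearest-rank percentile: smallest sample such that at least p% of samples are <= it.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank - 1, 0)];
}

// 95 fast requests (50ms) plus 5 slow ones (3000ms)
const latencies = [...Array(95).fill(50), ...Array(5).fill(3000)];

const avg = latencies.reduce((sum, v) => sum + v, 0) / latencies.length;
console.log(avg);                       // 197.5 -- the misleading average
console.log(percentile(latencies, 50)); // 50
console.log(percentile(latencies, 99)); // 3000
```

Note how p50 stays at 50ms while the average balloons to nearly 200ms: the tail is visible only in the high percentiles.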

Error Rate

The percentage of requests returning 5xx errors or timing out under load. A healthy API maintains less than 0.1% error rate at expected load. As load increases beyond capacity, the error rate spikes — this inflection point reveals your true capacity.

Concurrency

The number of simultaneous connections your API handles. This differs from throughput: an API might handle 1,000 RPS with 50 concurrent connections (fast responses) or 1,000 RPS with 500 concurrent connections (slow responses).
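These quantities are tied together by Little's Law: concurrency is roughly throughput times average latency. The two scenarios above fall out directly:

```javascript
// Little's Law: requests in flight = arrival rate * time each request spends in the system
const concurrency = (rps, latencySeconds) => rps * latencySeconds;

console.log(concurrency(1000, 0.05)); // ~50 connections: 1,000 RPS at 50ms
console.log(concurrency(1000, 0.5));  // ~500 connections: 1,000 RPS at 500ms
```

This is why rising concurrency at constant throughput is an early warning: it means latency is climbing.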

Types of API Performance Tests

Load Test

Simulates expected production traffic. Validates that performance meets SLAs under normal conditions.

// k6 load test example
import http from 'k6/http';

export const options = {
  stages: [
    { duration: '2m', target: 100 },  // ramp up
    { duration: '5m', target: 100 },  // steady state
    { duration: '2m', target: 0 },    // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  http.get('https://api.example.com/users'); // endpoint under test
}

Stress Test

Pushes beyond expected load to find breaking points. Keeps increasing virtual users until the API degrades or fails.
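A stress test can reuse the ramping pattern but keep climbing past the expected peak. A sketch, with illustrative stage targets:

```javascript
export const options = {
  stages: [
    { duration: '2m', target: 100 },  // expected peak
    { duration: '2m', target: 200 },  // beyond peak
    { duration: '2m', target: 300 },
    { duration: '2m', target: 400 },  // keep climbing until errors or latency spike
    { duration: '3m', target: 0 },    // ramp down and observe recovery
  ],
};
```

Watch for the stage at which http_req_failed starts climbing; that is your breaking point.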

Spike Test

Simulates sudden traffic surges — like a flash sale or viral social media post. Tests how the API handles instantaneous jumps in traffic and recovers afterward.

Soak Test

Runs moderate load for hours (4-12+). Reveals memory leaks, connection pool exhaustion, log file growth, and other issues that only appear over time.
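A soak test is the same constant-load shape held for much longer. A sketch, with illustrative duration and VU count:

```javascript
export const options = {
  stages: [
    { duration: '5m', target: 60 },   // gentle ramp up
    { duration: '8h', target: 60 },   // hold moderate load for hours
    { duration: '5m', target: 0 },
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'], // must hold for the entire run, not just the start
  },
};
```

A latency curve that drifts upward over the hours, even slowly, is the classic signature of a leak.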

Setting Up k6 for API Testing

k6 is ideal for API performance testing because it is lightweight, scriptable in JavaScript, and provides detailed metrics out of the box.

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  vus: 50,
  duration: '5m',
  thresholds: {
    http_req_duration: ['p(95)<300', 'p(99)<1000'],
    http_req_failed: ['rate<0.01'],
  },
};

export default function () {
  const res = http.get('https://api.example.com/users');

  check(res, {
    'status is 200': (r) => r.status === 200,
    'response time < 500ms': (r) => r.timings.duration < 500,
    'body contains data': (r) => r.json().data !== undefined,
  });

  sleep(1);
}

Multi-Endpoint Scenarios

Real-world traffic hits multiple endpoints. Model this by weighting requests proportionally to production traffic patterns.

import http from 'k6/http';
import { group, sleep } from 'k6';

export default function () {
  group('Browse Products', () => {
    http.get('https://api.example.com/products');
    sleep(0.5);
  });

  group('View Product Detail', () => {
    const id = Math.floor(Math.random() * 100) + 1;
    http.get(`https://api.example.com/products/${id}`);
    sleep(0.3);
  });

  group('Search', () => {
    http.get('https://api.example.com/search?q=widget');
    sleep(0.5);
  });
}
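The script above hits every group once per iteration, which gives each endpoint equal weight. To match production proportions (say 60% browse, 30% detail, 10% search; these ratios are assumptions for illustration), pick one action per iteration with weighted randomness:

```javascript
// Weighted pick: returns the name whose cumulative-weight bucket contains r.
function pickWeighted(weights, r = Math.random()) {
  let cumulative = 0;
  for (const [name, weight] of Object.entries(weights)) {
    cumulative += weight;
    if (r < cumulative) return name;
  }
  return Object.keys(weights).pop(); // guard against floating-point edge cases
}

const trafficMix = { browse: 0.6, detail: 0.3, search: 0.1 };

// Inside the k6 default function you would branch on the result:
//   const action = pickWeighted(trafficMix);
//   if (action === 'browse') http.get('https://api.example.com/products');
//   else if (action === 'detail') ...
```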

Interpreting Results

After a k6 run, you get a summary like this:

http_req_duration..........: avg=120ms  min=15ms  med=95ms  max=4200ms  p(90)=250ms  p(95)=380ms
http_req_failed............: 0.23%  ✓ 46  ✗ 19954
http_reqs..................: 20000  666.5/s

What to look for:

  1. p95 vs threshold — Is 380ms within your 500ms SLA? Yes, you pass.
  2. p95 vs average gap — 380ms vs 120ms average. The tail latency is 3x the average, suggesting some requests hit slow paths.
  3. Error rate — 0.23% might seem low, but if your SLA requires < 0.1%, you fail.
  4. Max latency — 4,200ms means at least one request took over 4 seconds. Investigate why.
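For deeper analysis than the terminal summary, k6 can export results to files (the filenames here are placeholders):

```shell
# End-of-test summary as JSON (aggregate stats per metric)
k6 run --summary-export=summary.json script.js

# Full per-request data points for offline analysis (can be large)
k6 run --out json=results.json script.js
```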

Exercise: Complete API Performance Test Suite

Build a comprehensive performance test for a REST API with these requirements:

Setup

Use the public JSONPlaceholder API (https://jsonplaceholder.typicode.com) or set up a local API.

Part 1: Baseline Test

Create a k6 script that establishes baseline performance:

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Trend, Rate, Counter } from 'k6/metrics';

// Custom metrics
const listLatency = new Trend('list_latency');
const detailLatency = new Trend('detail_latency');
const createLatency = new Trend('create_latency');
const errorRate = new Rate('errors');
const requestCount = new Counter('total_requests');

export const options = {
  scenarios: {
    baseline: {
      executor: 'constant-vus',
      vus: 10,
      duration: '2m',
    },
  },
  thresholds: {
    list_latency: ['p(95)<500'],
    detail_latency: ['p(95)<300'],
    create_latency: ['p(95)<800'],
    errors: ['rate<0.01'],
  },
};

export default function () {
  // GET /posts (list)
  let res = http.get('https://jsonplaceholder.typicode.com/posts');
  listLatency.add(res.timings.duration);
  errorRate.add(!check(res, { 'list 200': (r) => r.status === 200 }));
  requestCount.add(1);
  sleep(0.5);

  // GET /posts/:id (detail)
  const id = Math.floor(Math.random() * 100) + 1;
  res = http.get(`https://jsonplaceholder.typicode.com/posts/${id}`);
  detailLatency.add(res.timings.duration);
  errorRate.add(!check(res, { 'detail 200': (r) => r.status === 200 }));
  requestCount.add(1);
  sleep(0.3);

  // POST /posts (create)
  res = http.post('https://jsonplaceholder.typicode.com/posts',
    JSON.stringify({ title: 'Test', body: 'Content', userId: 1 }),
    { headers: { 'Content-Type': 'application/json' } }
  );
  createLatency.add(res.timings.duration);
  errorRate.add(!check(res, { 'create 201': (r) => r.status === 201 }));
  requestCount.add(1);
  sleep(0.5);
}

Part 2: Load Test with Ramping

Extend the baseline to simulate production-like traffic:

export const options = {
  scenarios: {
    load_test: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 50 },
        { duration: '5m', target: 50 },
        { duration: '1m', target: 100 },
        { duration: '3m', target: 100 },
        { duration: '2m', target: 0 },
      ],
    },
  },
  thresholds: {
    http_req_duration: ['p(95)<1000'],
    http_req_failed: ['rate<0.05'],
  },
};

Part 3: Spike Test

Simulate a sudden surge in traffic:

export const options = {
  scenarios: {
    spike: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '1m', target: 20 },   // warm up
        { duration: '10s', target: 200 },  // spike
        { duration: '2m', target: 200 },   // hold spike
        { duration: '10s', target: 20 },   // drop
        { duration: '2m', target: 20 },    // recovery
        { duration: '30s', target: 0 },    // ramp down
      ],
    },
  },
};

Analysis Tasks

After running each test:

  1. Record the metrics table — Copy p50, p90, p95, p99, and max latency for each endpoint.
  2. Compare baseline to load — How much did p95 increase when you added more virtual users?
  3. Analyze the spike — Did the API maintain acceptable latency during the spike? How long did it take to return to baseline after the spike dropped?
  4. Identify the bottleneck — Which endpoint degraded fastest under load? Why might that be? (Hint: list endpoints typically query more data than detail endpoints.)

Expected Observations

  • List endpoints (GET /posts) will likely show higher latency than detail endpoints (GET /posts/:id) because they return larger payloads.
  • POST requests typically have higher latency because they involve write operations.
  • During the spike test, you should observe latency increase, then stabilize or degrade further if the API cannot handle the load.
  • After the spike drops, latency should return to near-baseline levels — if it does not, the API has a recovery problem.

Performance Testing Checklist

Use this checklist before declaring an API ready for production:

  • Baseline latency recorded for all critical endpoints
  • Load test passes SLA thresholds at expected peak traffic
  • Stress test identifies the breaking point (what RPS causes degradation)
  • Spike test confirms recovery within acceptable time
  • Soak test (4+ hours) shows no memory leaks or gradual degradation
  • Error rate stays below SLA threshold at expected load
  • Database connection pools do not exhaust under load
  • Third-party API calls have timeouts and circuit breakers