What Is Rate Limiting?

Rate limiting controls how many requests a client can make to an API within a given time window. It protects servers from abuse, ensures fair usage, and prevents denial-of-service attacks. As a tester, you need to verify that rate limits are correctly implemented and that the API communicates limits clearly.

Why Rate Limiting Matters

Without rate limiting, a single client could overwhelm the server. Real-world scenarios include:

  • A bug in a mobile app sending requests in an infinite loop
  • A malicious user trying to scrape all data
  • A misconfigured integration making thousands of calls per second
  • Brute-force attacks on authentication endpoints

Rate Limiting Algorithms

Fixed Window

Counts requests within fixed time intervals (e.g., per minute starting at :00). Simple to implement but allows bursts at window boundaries — a client could send 100 requests at 12:00:59 and 100 more at 12:01:00.
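This boundary-burst behavior is easy to see in a minimal sketch (illustrative code, not any particular library's implementation):

```python
import time

class FixedWindowLimiter:
    """Illustrative fixed-window counter: `limit` requests per `window` seconds."""
    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window
        self.current_window = 0
        self.count = 0

    def allow(self, now=None):
        now = time.time() if now is None else now
        window_id = int(now // self.window)  # e.g. which minute we are in
        if window_id != self.current_window:
            self.current_window = window_id
            self.count = 0  # counter resets at every window boundary
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

With a 100/min limit, 100 requests at 12:00:59 and 100 more at 12:01:00 all pass, because the counter resets at the boundary.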

Sliding Window

Tracks requests over a rolling time window. More accurate than fixed window — if you sent 80 requests in the last 60 seconds, you have 20 remaining regardless of clock boundaries.
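A sliding-window log can be sketched by keeping the timestamps of recent requests (again illustrative; production limiters usually approximate this to save memory):

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Illustrative sliding-window log: counts requests in the last `window` seconds."""
    def __init__(self, limit=100, window=60):
        self.limit = limit
        self.window = window
        self.timestamps = deque()

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Drop timestamps that have aged out of the rolling window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```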

Token Bucket

Tokens are added to a bucket at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity, allowing bursts up to that size.
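A minimal token-bucket sketch (the rate and capacity values are arbitrary examples):

```python
import time

class TokenBucket:
    """Illustrative token bucket: refills at `rate` tokens/sec, up to `capacity`."""
    def __init__(self, rate=1.0, capacity=10):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so an initial burst up to capacity succeeds
        self.last = 0.0

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Add tokens for the time elapsed since the last check, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```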

Leaky Bucket

Requests enter a queue (bucket) and are processed at a constant rate. If the bucket overflows, new requests are rejected. This smooths traffic into a steady stream.
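A leaky bucket can be sketched as a counter that drains at a constant rate (illustrative values; real implementations typically queue requests rather than reject them outright):

```python
import time

class LeakyBucket:
    """Illustrative leaky bucket: queued requests drain at `rate` per second."""
    def __init__(self, rate=1.0, capacity=5):
        self.rate = rate
        self.capacity = capacity
        self.level = 0.0  # current queue depth
        self.last = 0.0

    def allow(self, now=None):
        now = time.time() if now is None else now
        # Leak out whatever drained since the last check
        self.level = max(0.0, self.level - (now - self.last) * self.rate)
        self.last = now
        if self.level < self.capacity:
            self.level += 1  # enqueue this request
            return True
        return False  # bucket overflow
```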

Algorithm      | Burst Handling           | Accuracy | Complexity
Fixed Window   | Allows boundary bursts   | Low      | Low
Sliding Window | Prevents bursts          | High     | Medium
Token Bucket   | Allows controlled bursts | High     | Medium
Leaky Bucket   | No bursts                | High     | Medium

Rate Limit Headers

Most APIs communicate rate limits through response headers:

HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1625097600
Retry-After: 30

Header                | Meaning
X-RateLimit-Limit     | Maximum requests allowed in the window
X-RateLimit-Remaining | Requests remaining in the current window
X-RateLimit-Reset     | Unix timestamp when the window resets
Retry-After           | Seconds to wait before retrying (on 429)

Testing Rate Limit Headers

For every successful response, verify:

  1. X-RateLimit-Limit is present and matches documented limits
  2. X-RateLimit-Remaining decrements by 1 with each request
  3. X-RateLimit-Reset is a valid future timestamp
  4. Values are consistent across sequential requests
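These checks can be wrapped in a small helper. The function below only assumes a response object with a `headers` dict, so it works with `requests` responses; the documented limit of 100 is a placeholder:

```python
import time

def check_rate_limit_headers(response, documented_limit=100):
    """Verify the rate-limit headers listed above on a single response."""
    limit = int(response.headers["X-RateLimit-Limit"])
    remaining = int(response.headers["X-RateLimit-Remaining"])
    reset = int(response.headers["X-RateLimit-Reset"])

    assert limit == documented_limit, f"limit header {limit} != documented {documented_limit}"
    assert 0 <= remaining <= limit, "remaining must be within [0, limit]"
    assert reset > time.time(), "reset must be a future Unix timestamp"
    return remaining

# Consistency across sequential requests: remaining should decrement by 1.
# url/headers are placeholders for your API under test:
#
# prev = None
# for _ in range(5):
#     rem = check_rate_limit_headers(requests.get(url, headers=headers))
#     if prev is not None:
#         assert rem == prev - 1
#     prev = rem
```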

Testing Rate Limit Enforcement

Basic Enforcement Test

Send requests in rapid succession and verify:

import requests

url = "https://api.example.com/data"
headers = {"Authorization": "Bearer token"}
results = []

for i in range(110):  # exceed the documented 100-requests-per-minute limit
    response = requests.get(url, headers=headers)
    results.append({
        "request": i + 1,
        "status": response.status_code,
        "remaining": response.headers.get("X-RateLimit-Remaining"),
    })

# Verify: the first 100 requests return 200, everything after returns 429
statuses = [r["status"] for r in results]
assert all(s == 200 for s in statuses[:100]), "rate limit triggered too early"
assert all(s == 429 for s in statuses[100:]), "rate limit not enforced"

Test Scenarios

Scenario                   | Expected Behavior
Normal usage within limits | 200 with correct remaining count
Exactly at the limit       | 200 for the last allowed request
One over the limit         | 429 with a Retry-After header
After waiting for reset    | 200 with the full limit restored
Different endpoints        | May have separate limits
Different auth tokens      | Each user has their own limits
No authentication          | Typically stricter IP-based limits

Rate Limit Recovery Test

After hitting the limit:

  1. Verify 429 response includes Retry-After
  2. Wait the specified duration
  3. Send another request — should succeed with 200
  4. Verify X-RateLimit-Remaining is reset
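The recovery cycle above can be scripted. The sketch below takes any `get` callable (e.g. `lambda: requests.get(url, headers=headers)`) rather than hard-coding an endpoint, so the steps stay testable without a live API:

```python
import time

def verify_rate_limit_recovery(get, sleep=time.sleep):
    """Drive the API into 429, wait out Retry-After, then confirm recovery.

    `get` is a zero-argument callable returning a response-like object.
    """
    # 1. Burst until we receive a 429
    response = get()
    while response.status_code != 429:
        response = get()

    # 2. The 429 must advertise when to retry
    assert "Retry-After" in response.headers, "429 missing Retry-After"

    # 3. Wait the advertised duration, then retry
    sleep(int(response.headers["Retry-After"]))
    response = get()
    assert response.status_code == 200, "request after Retry-After still limited"

    # 4. The window should be restored (minus the request we just made)
    limit = int(response.headers["X-RateLimit-Limit"])
    remaining = int(response.headers["X-RateLimit-Remaining"])
    assert remaining == limit - 1, "remaining not reset after recovery"
    return True
```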

Per-Endpoint vs. Global Limits

Some APIs have different limits per endpoint:

  • Authentication: 5 requests/minute (stricter to prevent brute force)
  • Read operations: 1000 requests/minute
  • Write operations: 100 requests/minute
  • Search: 30 requests/minute

Test that limits are applied per endpoint and do not bleed across different routes.
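One way to check that isolation: exhaust the strictest endpoint, then confirm the others are unaffected. The sketch below assumes a `get(url)` callable (e.g. wrapping `requests.get`) and hypothetical routes:

```python
def check_limits_are_isolated(get, strict_url, strict_limit, other_urls):
    """Exhaust one endpoint's limit, then confirm other routes still answer 200."""
    # Exhaust the strict endpoint: `strict_limit` requests succeed, the next is 429
    for _ in range(strict_limit):
        assert get(strict_url).status_code == 200, "limited before documented limit"
    assert get(strict_url).status_code == 429, "limit not enforced on strict endpoint"

    # The other routes must be unaffected by the exhausted counter
    for url in other_urls:
        assert get(url).status_code == 200, f"limit bled into {url}"
    return True
```

For example, exhaust a 5/min authentication endpoint, then verify a read endpoint still returns 200.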

Distributed Rate Limiting

In microservices architectures, verify:

  • Limits are shared across multiple API gateway instances
  • Switching between servers does not reset the counter
  • Load balancer routing does not affect rate limit accuracy

Common Rate Limiting Bugs

Bug                         | How to Detect
Limits not enforced         | Send more than the limit — all return 200
Wrong remaining count       | Track X-RateLimit-Remaining across requests
Reset time wrong            | Check whether the reset timestamp matches actual behavior
No Retry-After on 429       | Inspect 429 responses for the header
Limits reset on error       | Cause a 400 error, check if the limit counter resets
Different limits per method | GET and POST on the same endpoint may have different limits

Hands-On Exercise

  1. Test GitHub API limits: GitHub allows 60 requests/hour unauthenticated. Send requests to https://api.github.com/users and track the rate limit headers.
  2. Measure the window: Determine whether the rate limiter uses fixed or sliding windows by sending bursts at window boundaries.
  3. Recovery test: Hit the rate limit, wait for Retry-After duration, and verify recovery.
  4. Document limits: Create a table of all rate limits for a test API, including per-endpoint and per-user limits.

Key Takeaways

  • Rate limiting protects APIs from abuse — testing it is critical for production readiness
  • Common algorithms include fixed window, sliding window, token bucket, and leaky bucket — each has different burst behavior
  • Always verify rate limit headers (Limit, Remaining, Reset) are accurate and consistent
  • Test the complete cycle: normal usage, hitting the limit, receiving 429 with Retry-After, and recovery
  • Per-endpoint, per-user, and per-IP limits may differ — test each independently