What Is Rate Limiting?
Rate limiting controls how many requests a client can make to an API within a given time window. It protects servers from abuse, ensures fair usage, and prevents denial-of-service attacks. As a tester, you need to verify that rate limits are correctly implemented and that the API communicates limits clearly.
Why Rate Limiting Matters
Without rate limiting, a single client could overwhelm the server. Real-world scenarios include:
- A bug in a mobile app sending requests in an infinite loop
- A malicious user trying to scrape all data
- A misconfigured integration making thousands of calls per second
- Brute-force attacks on authentication endpoints
Rate Limiting Algorithms
Fixed Window
Counts requests within fixed time intervals (e.g., per minute starting at :00). Simple to implement but allows bursts at window boundaries — a client could send 100 requests at 12:00:59 and 100 more at 12:01:00.
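The boundary-burst weakness is easy to demonstrate with a minimal fixed-window counter. This is an illustrative sketch, not a real library: the class and its injected timestamps are hypothetical, chosen so the behavior is deterministic.

```python
class FixedWindowLimiter:
    """Allows `limit` requests per `window` seconds, counted from fixed clock boundaries."""

    def __init__(self, limit, window=60):
        self.limit = limit
        self.window = window
        self.counts = {}  # window start time -> request count

    def allow(self, now):
        window_start = int(now // self.window) * self.window
        count = self.counts.get(window_start, 0)
        if count >= self.limit:
            return False
        self.counts[window_start] = count + 1
        return True


limiter = FixedWindowLimiter(limit=100)
# 100 requests at t=59s and 100 more at t=60s fall into different windows,
# so all 200 are allowed within roughly one second of wall-clock time.
burst = [limiter.allow(59.0) for _ in range(100)] + [limiter.allow(60.0) for _ in range(100)]
```

A sliding-window or token-bucket limiter would reject the second burst, which is exactly the difference a boundary-burst test should reveal.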
Sliding Window
Tracks requests over a rolling time window. More accurate than fixed window — if you sent 80 requests in the last 60 seconds, you have 20 remaining regardless of clock boundaries.
Token Bucket
Tokens are added to a bucket at a fixed rate. Each request consumes one token. If the bucket is empty, the request is rejected. The bucket has a maximum capacity, allowing bursts up to that size.
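A token bucket can be sketched in a few lines. The sketch below is illustrative only; the timestamp is passed in explicitly so refill behavior is reproducible in tests.

```python
class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; each request costs one token."""

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # start full, allowing an initial burst
        self.last = 0.0

    def allow(self, now):
        # Refill based on elapsed time, capped at the bucket's capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=5, rate=1.0)  # burst of up to 5, then 1 request/second
```

A full bucket lets five requests through at once; after that, one new request is admitted per second as tokens trickle back in.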
Leaky Bucket
Requests enter a queue (bucket) and are processed at a constant rate. If the bucket overflows, new requests are rejected. This smooths traffic into a steady stream.
| Algorithm | Burst Handling | Accuracy | Complexity |
|---|---|---|---|
| Fixed Window | Allows boundary bursts | Low | Low |
| Sliding Window | Prevents bursts | High | Medium |
| Token Bucket | Allows controlled bursts | High | Medium |
| Leaky Bucket | No bursts | High | Medium |
Rate Limit Headers
Most APIs communicate rate limits through response headers:
```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1625097600
Retry-After: 30
```
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Maximum requests allowed in the window |
| X-RateLimit-Remaining | Requests remaining in the current window |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds to wait before retrying (on 429) |
Testing Rate Limit Headers
For every successful response, verify:
- X-RateLimit-Limit is present and matches documented limits
- X-RateLimit-Remaining decrements by 1 with each request
- X-RateLimit-Reset is a valid future timestamp
- Values are consistent across sequential requests
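These checks can be wrapped in a small helper. This is a sketch that assumes the X-RateLimit-* header names shown above; it works on any dict-like collection of response headers, so it can be pointed at `response.headers` from requests.

```python
import time


def check_rate_limit_headers(headers, expected_limit, now=None):
    """Return a list of problems found in the rate-limit headers (empty list = all good)."""
    now = time.time() if now is None else now
    problems = []
    for name in ("X-RateLimit-Limit", "X-RateLimit-Remaining", "X-RateLimit-Reset"):
        if name not in headers:
            problems.append(f"missing header: {name}")
    if problems:
        return problems  # can't validate values without the headers
    if int(headers["X-RateLimit-Limit"]) != expected_limit:
        problems.append("limit does not match documented value")
    if not 0 <= int(headers["X-RateLimit-Remaining"]) <= expected_limit:
        problems.append("remaining count out of range")
    if int(headers["X-RateLimit-Reset"]) <= now:
        problems.append("reset timestamp is not in the future")
    return problems
```

Call it after each request, e.g. `check_rate_limit_headers(response.headers, expected_limit=100)`, and fail the test if the returned list is non-empty.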
Testing Rate Limit Enforcement
Basic Enforcement Test
Send requests in rapid succession and verify:
```python
import requests

url = "https://api.example.com/data"
headers = {"Authorization": "Bearer token"}

results = []
for i in range(110):  # Exceed the 100/min limit
    response = requests.get(url, headers=headers)
    results.append({
        "request": i + 1,
        "status": response.status_code,
        "remaining": response.headers.get("X-RateLimit-Remaining"),
    })

# Verify: first 100 should be 200, rest should be 429
assert all(r["status"] == 200 for r in results[:100])
assert all(r["status"] == 429 for r in results[100:])
```
Test Scenarios
| Scenario | Expected Behavior |
|---|---|
| Normal usage within limits | 200 with correct remaining count |
| Exactly at the limit | 200 for last allowed request |
| One over the limit | 429 with Retry-After header |
| After waiting for reset | 200 with full limit restored |
| Different endpoints | May have separate limits |
| Different auth tokens | Each user has own limits |
| No authentication | Typically stricter IP-based limits |
Rate Limit Recovery Test
After hitting the limit:
- Verify the 429 response includes Retry-After
- Wait the specified duration
- Send another request — it should succeed with 200
- Verify X-RateLimit-Remaining is reset
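The recovery cycle can be scripted. The sketch below injects a `send` callable and a `sleep` function so the logic can run against a stub in unit tests or a real client in integration tests; the function name and signature are hypothetical.

```python
def run_recovery_test(send, sleep):
    """Drive a client to the rate limit, honor Retry-After, and confirm recovery.

    `send()` returns (status_code, headers); `sleep(seconds)` waits.
    Returns True if the full limit -> 429 -> recovery cycle behaves correctly.
    """
    # Keep sending until the limit is hit
    status, headers = send()
    while status == 200:
        status, headers = send()
    if status != 429 or "Retry-After" not in headers:
        return False  # a 429 must carry Retry-After
    sleep(int(headers["Retry-After"]))
    status, headers = send()
    return status == 200  # the limit should be restored after waiting
```

In a real run, `send` would wrap `requests.get` and return `(response.status_code, response.headers)`, with `sleep=time.sleep`.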
Per-Endpoint vs. Global Limits
Some APIs have different limits per endpoint:
- Authentication: 5 requests/minute (stricter to prevent brute force)
- Read operations: 1000 requests/minute
- Write operations: 100 requests/minute
- Search: 30 requests/minute
Test that limits are applied per endpoint and do not bleed across different routes.
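Endpoint isolation lends itself to a table-driven check. The sketch below models independent counters per route; the route names and limits are hypothetical, mirroring the figures above.

```python
from collections import defaultdict

# Hypothetical per-endpoint limits, matching the tiers described above
LIMITS = {"/auth/login": 5, "/items": 1000, "/items/create": 100, "/search": 30}


class PerEndpointLimiter:
    """Independent counters per route: exhausting one endpoint must not touch another."""

    def __init__(self, limits):
        self.limits = limits
        self.counts = defaultdict(int)

    def allow(self, endpoint):
        if self.counts[endpoint] >= self.limits[endpoint]:
            return False
        self.counts[endpoint] += 1
        return True


limiter = PerEndpointLimiter(LIMITS)
# Exhaust /search, then confirm /auth/login still has its full quota
for _ in range(30):
    limiter.allow("/search")
```

Against a live API, the same table drives the test: loop over each endpoint, exhaust its limit, and assert the other endpoints still return 200.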
Distributed Rate Limiting
In microservices architectures, verify:
- Limits are shared across multiple API gateway instances
- Switching between servers does not reset the counter
- Load balancer routing does not affect rate limit accuracy
Common Rate Limiting Bugs
| Bug | How to Detect |
|---|---|
| Limits not enforced | Send more than the limit — all return 200 |
| Wrong remaining count | Track X-RateLimit-Remaining across requests |
| Reset time wrong | Check if reset timestamp matches actual behavior |
| No Retry-After on 429 | Inspect 429 responses for the header |
| Limits reset on error | Cause a 400 error, check if limit counter resets |
| Different limits per method | GET and POST on same endpoint may have different limits |
Hands-On Exercise
- Test GitHub API limits: GitHub allows 60 requests/hour unauthenticated. Send requests to https://api.github.com/users and track the rate limit headers.
- Measure the window: Determine whether the rate limiter uses fixed or sliding windows by sending bursts at window boundaries.
- Recovery test: Hit the rate limit, wait for Retry-After duration, and verify recovery.
- Document limits: Create a table of all rate limits for a test API, including per-endpoint and per-user limits.
Key Takeaways
- Rate limiting protects APIs from abuse — testing it is critical for production readiness
- Common algorithms include fixed window, sliding window, token bucket, and leaky bucket — each has different burst behavior
- Always verify rate limit headers (Limit, Remaining, Reset) are accurate and consistent
- Test the complete cycle: normal usage, hitting the limit, receiving 429 with Retry-After, and recovery
- Per-endpoint, per-user, and per-IP limits may differ — test each independently