Rate Limiting Algorithms
January 30, 2026
Overview
Rate limiting is a mechanism for controlling how many requests (or operations) a client can perform within a period of time. It's used to:
- Protect systems from overload and cascading failures
- Prevent abuse (scraping, brute-force login attempts)
- Enforce fairness across tenants/users
- Smooth traffic spikes and reduce tail latency
In practice, rate limiting is as much about product policy (who gets limited, how, and why) as it is about algorithms.
Table of Contents
- What Rate Limiting Actually Solves
- Core Concepts & Nuances
- Fixed Window Counter
- Sliding Window Log
- Sliding Window Counter
- Leaky Bucket
- Token Bucket
- Concurrency & Race Conditions
- Distributed Rate Limiting (Multi-Node)
- Choosing the Right Algorithm
- Common Pitfalls (and Fixes)
- Putting It Together: Recommended Defaults
- Conclusion
- Key Takeaways
What Rate Limiting Actually Solves
Rate limiting helps with:
- Overload protection: reject early instead of letting the system collapse
- Fairness: one noisy client shouldn’t starve others
- Cost control: keep expensive workloads within budget
- Security: slow down brute force and endpoint probing
- Quality of service: enforce tiers (free vs paid)
But rate limiting does not replace:
- WAF / bot detection
- Backpressure inside internal services
- Capacity planning and autoscaling
Core Concepts & Nuances
What’s the “key” you limit on?
Common keys:
- per IP
- per user
- per API key
- per tenant/org
- per route + user (e.g., `/search` stricter than `/profile`)
- per cost unit (e.g., GraphQL query cost)
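For illustration, several of these dimensions are often combined into one composite key. The helper below is a hypothetical sketch (the function name and key layout are not a standard API):

```python
# Hypothetical sketch: compose a rate-limit key from route, identity, and plan tier.
def build_limit_key(route: str, user_id: str, tier: str) -> str:
    # e.g. build_limit_key("POST /search", "42", "free") -> "rl:POST /search:user:42:tier:free"
    return f"rl:{route}:user:{user_id}:tier:{tier}"
```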
Nuances that matter
- Bursts: is it okay to allow short spikes?
- Fairness: do you need per-tenant fairness, or global?
- Clock skew: distributed counters can drift
- Idempotency: should retries count?
- Tiering: limits differ for paid plans
- Latency impact: enforcement must be O(1) and fast
Fixed Window Counter
Idea: count requests in discrete windows (e.g., per minute). Allow if count ≤ limit.
Pros
- Very simple
- Cheap in memory and compute
Cons (big gotcha)
- Boundary burst: a client can send `limit` requests at `12:00:59` and another `limit` at `12:01:00`, effectively 2× the limit in about one second.
```python
# pseudo-code
# window_start_ts must be floored to the window boundary, e.g. ts - (ts % 60)
def allow_fixed_window(key, window_start_ts, limit, store):
    counter_key = f"{key}:{window_start_ts}"
    n = store.incr(counter_key)
    # Important: only set TTL when the key is first created.
    # Otherwise, frequent traffic can keep extending the window.
    if n == 1:
        store.expire(counter_key, 60)
    return n <= limit
```

Sliding Window Log
Idea: store timestamps of recent requests; allow if #timestamps in the last T seconds ≤ limit.
Pros
- Accurate (no boundary burst)
Cons
- Can be memory heavy (stores per-request events)
- Concurrency gotcha: the check + insert must be atomic (e.g., Redis Lua), otherwise concurrent requests can overshoot the limit.
```python
# pseudo-code for Redis sorted set (non-atomic sketch)
# In production: implement as a single atomic operation (Lua script / transaction).
def allow_sliding_log(key, now_ms, window_ms, limit, redis):
    # One sorted set per rate-limit key (user / IP / API key / route, etc.)
    zkey = f"rl:{key}"

    # 1) Remove requests that are outside the sliding window
    #    Keeps only events in (now - window_ms, now]
    redis.zremrangebyscore(zkey, 0, now_ms - window_ms)

    # 2) Count how many requests remain in the window
    count = redis.zcard(zkey)

    # 3) Enforce the limit
    #    If we've already seen 'limit' events in the window, reject
    if count >= limit:
        return False

    # 4) Generate a unique member ID for this request
    #    (ZSET members must be unique even if timestamps collide)
    seq_key = f"{zkey}:seq"
    seq = redis.incr(seq_key)
    # Keep the sequence counter bounded to the window lifetime
    redis.pexpire(seq_key, window_ms)

    # 5) Add this request to the window
    #    Score = timestamp (used for range queries)
    #    Member = unique identifier
    member = f"{now_ms}:{seq}"
    redis.zadd(zkey, {member: now_ms})

    # 6) Expire the window key when traffic stops
    #    (prevents stale keys from living forever)
    redis.pexpire(zkey, window_ms)

    return True
```

Sliding Window Counter
Idea: approximate sliding window using two adjacent fixed windows and weighted interpolation.
Pros
- Much cheaper than log
- Smooths boundary bursts
Cons
- Approximation error (usually acceptable)
Conceptually:
estimated_count = current_window_count + previous_window_count * overlap_ratio
where `overlap_ratio` is the fraction of the previous window that still falls inside the sliding window.
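A minimal sketch of this approach, assuming a Redis-like store with the same incr/expire/get interface as the fixed-window example; key names and TTLs are illustrative, and the check-then-increment is non-atomic (same caveat as above):

```python
import time

def allow_sliding_counter(key, limit, window_s, store, now=None):
    """Approximate sliding window using two adjacent fixed-window counters."""
    now = time.time() if now is None else now
    curr_window = int(now // window_s) * window_s
    prev_window = curr_window - window_s

    curr_key = f"rl:{key}:{curr_window}"
    prev_key = f"rl:{key}:{prev_window}"

    # Fraction of the previous window that still overlaps the sliding window
    elapsed_in_curr = now - curr_window
    overlap_ratio = (window_s - elapsed_in_curr) / window_s

    prev_count = int(store.get(prev_key) or 0)
    curr_count = int(store.get(curr_key) or 0)

    estimated = curr_count + prev_count * overlap_ratio
    if estimated >= limit:
        return False

    # Count this request in the current window (non-atomic; use a Lua script for strictness)
    n = store.incr(curr_key)
    if n == 1:
        # Keep the counter long enough to serve as "previous" for the next window
        store.expire(curr_key, 2 * window_s)
    return True
```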
Leaky Bucket
Idea: requests enter a bucket (queue). The bucket leaks at a constant rate.
Pros
- Produces stable output rate (smooths traffic)
- Great for downstream systems that hate bursts
Cons
- Adds queueing latency
- If bucket is full, requests are dropped
Use cases:
- Email sending
- Background job dispatch
- Payment provider calls
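A minimal sketch of the queue-based variant, assuming a single process and a caller-supplied `process` callable (class and parameter names are illustrative):

```python
import time
from collections import deque
from threading import Lock, Thread

class LeakyBucketQueue:
    """Bounded queue drained at a constant rate by a background worker."""

    def __init__(self, capacity, leak_rate_per_sec, process):
        self.capacity = capacity
        self.interval = 1.0 / leak_rate_per_sec
        self.process = process
        self.queue = deque()
        self.lock = Lock()
        Thread(target=self._drain, daemon=True).start()

    def submit(self, item):
        with self.lock:
            if len(self.queue) >= self.capacity:
                return False          # bucket full: drop the request
            self.queue.append(item)
            return True

    def _drain(self):
        while True:
            with self.lock:
                item = self.queue.popleft() if self.queue else None
            if item is not None:
                self.process(item)    # downstream sees at most leak_rate_per_sec calls/sec
            time.sleep(self.interval)
```

The drain loop handles at most one item per interval, so the downstream callable never sees more than `leak_rate_per_sec` calls per second, at the cost of queueing delay for bursts.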
Token Bucket
Idea: tokens accumulate at a steady rate up to a max capacity. Each request consumes tokens.
This is the most common choice for APIs because it allows controlled bursts.
```python
import time

class TokenBucket:
    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = burst
        self.last = time.monotonic()  # monotonic avoids issues from wall-clock jumps

    def allow(self, cost=1):
        now = time.monotonic()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Concurrency & Race Conditions
In real systems, multiple servers may check the same key at once.
You need one of:
- atomic operations (Redis Lua script)
- strongly consistent store
- per-node local buckets + global cap (hybrid)
Big gotcha:
If your algorithm does read → compute → write, it’s likely race-prone unless atomic.
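One way to get atomicity with Redis is to run the whole check server-side in a single Lua script. The sketch below wraps the fixed-window logic from earlier; the script text and function name are illustrative, but `EVAL` and redis-py's `eval(script, numkeys, *keys_and_args)` are real APIs:

```python
# The INCR, conditional EXPIRE, and limit check run as one atomic unit on the server,
# so no other client can interleave between the steps.
FIXED_WINDOW_LUA = """
local n = redis.call('INCR', KEYS[1])
if n == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
if n > tonumber(ARGV[2]) then
  return 0
end
return 1
"""

def allow_atomic_fixed_window(redis, key, window_start_ts, window_s, limit):
    counter_key = f"{key}:{window_start_ts}"
    return redis.eval(FIXED_WINDOW_LUA, 1, counter_key, window_s, limit) == 1
```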
Distributed Rate Limiting (Multi-Node)
Approach 1: Central store (Redis)
- Pros: strong global enforcement
- Cons: adds latency; central dependency; can become a bottleneck
Approach 2: Per-node local + periodic sync
- Pros: very low latency
- Cons: approximate; can overshoot global limit during spikes
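A crude starting point for this approach is to give each node an even share of the global limit, reusing the `TokenBucket` class above. This is a sketch under the assumption of roughly balanced load; skewed traffic makes enforcement approximate, which is exactly the trade-off listed above.

```python
# Each node enforces global_rate / node_count locally (one bucket per rate-limit key).
def local_bucket_for_node(global_rate_per_sec, global_burst, node_count):
    share = 1.0 / node_count
    return TokenBucket(rate_per_sec=global_rate_per_sec * share,
                       burst=max(1, int(global_burst * share)))
```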
Approach 3: Edge-based limiting (CDN/WAF)
- Pros: blocks abuse early; reduces origin load
- Cons: limited flexibility; might not see user identity
Choosing the Right Algorithm
| Algorithm | Best For | Pros | Cons |
|---|---|---|---|
| Fixed Window | Quick “good enough” controls | Simple, cheap | Boundary burst |
| Sliding Log | Strict correctness | Most accurate | Memory heavy |
| Sliding Counter | Practical correctness at scale | Low memory, smooth | Approximate |
| Leaky Bucket | Smoothing bursts to protect downstream | Stable rate output | Queue latency |
| Token Bucket | APIs that allow bursts | Most flexible | More complex state |
Common Pitfalls (and Fixes)
| Pitfall | Cause | Fix |
|---|---|---|
| Limits too global | One client starves others | Limit per tenant/user, not just per IP |
| Thundering herd after 429 | Clients retry immediately | Return Retry-After, use exponential backoff + jitter |
| Race conditions | Non-atomic updates | Use atomic ops (Lua/transactions) |
| Clock skew | Distributed time disagreement | Use server time; prefer monotonic time; avoid client timestamps |
| Rate limiting only at origin | Abuse reaches your servers first | Limit at edge (CDN/WAF) + origin as backup |
Putting It Together: Recommended Defaults
A good baseline for many APIs:
- Edge: coarse per-IP limits (stop obvious abuse)
- Origin: token bucket per API key / user
- Critical endpoints: stricter limits on login, password reset, search
- Retries: return `429` + `Retry-After` and enforce backoff (see the client sketch below)
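A client-side sketch of that retry discipline: honor `Retry-After` when present (assuming the seconds form), otherwise use exponential backoff with full jitter. The `send_request` callable and the response shape (`status_code`, `headers`) are assumptions for illustration:

```python
import random
import time

def request_with_backoff(send_request, max_attempts=5, base_delay_s=0.5, cap_s=30):
    for attempt in range(max_attempts):
        resp = send_request()
        if resp.status_code != 429:
            return resp
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)            # server-provided wait, in seconds
        else:
            # Full jitter: random delay in [0, min(cap, base * 2^attempt)]
            delay = random.uniform(0, min(cap_s, base_delay_s * (2 ** attempt)))
        time.sleep(delay)
    return resp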
If you need fairness across tenants:
- enforce per-tenant limits + global cap
- optionally weighted by plan tier
Conclusion
Rate limiting looks simple, but real systems require careful thought about bursts, fairness, distributed correctness, and client retry behavior. Token bucket is often the default winner for APIs, but fixed and sliding windows can be great depending on your requirements. The best implementations are layered: edge + origin, cheap controls + stricter per-identity enforcement.
Key Takeaways
- Rate limiting is both an algorithm choice and a policy choice
- Token bucket is usually the most practical API default
- Sliding window log is strict but expensive; sliding counter is a great compromise
- Distributed implementations need atomicity, skew awareness, and retry discipline

