Abdullah Al Mamun
Architecture
RedisDistributed SystemsAd Tech

Designing Budget Pacing in CPM Ad Systems

Budget pacing sounds simple — spend X dollars by end of day. In practice, it's a distributed systems problem with race conditions, hot keys, and consistency trade-offs at every layer.

AA

Abdullah Al Mamun

March 15, 2024 · 9 min read

The Problem with "Just Decrement a Counter"

When I first looked at our budget pacing implementation, it was a single PostgreSQL row with a SELECT FOR UPDATE. At 500 RPS, this was fine. At 8,000 RPS, it was a serialization bottleneck that added 40ms to every auction.

The naive fix — move it to Redis with INCR — introduces a race condition. Between the GET (check remaining budget) and the INCR (decrement), another goroutine can win the same auction. You end up over-delivering.

The Atomic Solution

Redis Lua scripts execute atomically. A single round-trip, no race conditions:

local remaining = tonumber(redis.call('GET', KEYS[1]))
if remaining == nil or remaining <= 0 then
  return 0
end
redis.call('DECRBY', KEYS[1], ARGV[1])
return 1

This script checks the budget and decrements it in one atomic operation. If the budget is exhausted, it returns 0 and the auction is skipped.

The Hot Key Problem

High-spend advertisers create hot keys. A single Redis key receiving 10,000 writes/second becomes a bottleneck — Redis is single-threaded per key.

The solution is key sharding. Instead of one key per advertiser, use N shards:

budget:{advertiser_id}:{shard_id}

Each auction request picks a shard randomly. Budget queries sum across all shards. This distributes write load while keeping reads consistent.

Consistency Trade-offs

We accepted eventual consistency in budget reporting. The Redis state is the source of truth for serving decisions. A background job reconciles Redis state to PostgreSQL every 30 seconds for billing.

This means budget reports can lag by up to 30 seconds. Finance accepted this trade-off in exchange for the latency improvement.

What I'd Do Differently

The sliding window counter we use can front-load spend — if traffic is heavy in the morning, the budget can exhaust by noon. A token bucket implementation would distribute spend more smoothly throughout the day.

I'd also explore Redis Streams for the reconciliation pipeline instead of a polling job — it would give us a durable, replayable audit trail of every budget event.