How to Set a Latency Budget: SLOs, Error Budgets, and Allocation Frameworks
A latency budget is the total acceptable response time for a user-facing request, allocated across every service in the chain. This guide walks through setting one, from the SLA down to per-service targets, with the error budget math that makes it work.
What Is a Latency Budget?
A latency budget answers the question: "How fast does this request need to be, and how do we divide that time across all the services involved?" It is the performance equivalent of a financial budget. Every service gets an allocation, and the total must stay within the constraint.
Without a budget, teams optimize locally. The authentication team makes auth fast. The database team makes queries fast. But nobody owns the total end-to-end latency. A budget creates shared accountability and clear targets.
Starting from the SLA
Your latency budget starts with your SLA. If you promise customers that 99.9% of requests complete in under 200ms, that is your P99.9 budget. Work backwards from there.
| Industry | Typical P99 SLA | P99.9 SLA | Notes |
|---|---|---|---|
| Ecommerce | 200ms | 500ms | Page load SLA, not time-to-first-byte |
| Fintech / Payments | 100ms | 250ms | Transaction processing latency |
| SaaS Platform | 300ms | 800ms | API response time for core endpoints |
| Gaming | 50ms | 100ms | Server tick rate and network round-trip |
| AdTech / RTB | 50ms | 100ms | Bid response deadline is typically 100ms |
Budget Allocation: 200ms Example
Here is a practical budget for a 200ms P99 SLA across a typical 6-service request chain. The key principle: allocate budget in proportion to each component's complexity, and reserve roughly 25% of the total (50ms here) for network overhead and variance.
| Component | Budget (P99) | Actual P99 | Headroom |
|---|---|---|---|
| API Gateway | 10ms | 8ms | +2ms |
| Authentication | 15ms | 12ms | +3ms |
| Business Logic | 50ms | 45ms | +5ms |
| Database | 40ms | 38ms | +2ms |
| Cache Lookup | 5ms | 3ms | +2ms |
| Rendering | 30ms | 28ms | +2ms |
| Network Overhead | 50ms | 42ms | +8ms |
| Total | 200ms | 176ms | +24ms |
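Database headroom is the tightest in relative terms: 2ms, or 5% of its 40ms allocation. If query latency grows with data volume, this component is likely to breach its budget first, so monitor it proactively. The Total row simply adds the per-component P99s; percentiles do not add exactly, so confirm the true end-to-end P99 with distributed tracing.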
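The proportional rule can be sketched in a few lines of Python. The complexity weights are hypothetical, picked so that splitting the 150ms left after a 25% reserve reproduces the budgets in the table above; in practice you would derive them from profiling or call-graph complexity.

```python
def allocate_budget(total_ms: float, weights: dict,
                    reserve_fraction: float = 0.25) -> dict:
    """Split an end-to-end latency budget across services by weight.

    A fixed fraction is held back for network overhead and variance;
    the rest is divided in proportion to each service's weight.
    """
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    reserve = total_ms * reserve_fraction
    distributable = total_ms - reserve
    weight_sum = sum(weights.values())
    allocation = {name: distributable * w / weight_sum
                  for name, w in weights.items()}
    allocation["Network Overhead"] = reserve
    return allocation


# Hypothetical complexity weights, chosen to reproduce the table above.
weights = {
    "API Gateway": 2, "Authentication": 3, "Business Logic": 10,
    "Database": 8, "Cache Lookup": 1, "Rendering": 6,
}

for name, ms in allocate_budget(200, weights).items():
    print(f"{name:<16} {ms:5.1f}ms")
```

Changing one weight rebalances every other allocation automatically, which keeps the total pinned to the SLA instead of letting per-team targets drift.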
Error Budgets and Latency
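An error budget is the complement of your SLO: 1 minus the target. If your SLO is 99.9% of requests under 200ms, your error budget is the 0.1% of requests that are allowed to exceed 200ms. Over a 30-day window with steady traffic, this translates to real time:
| SLO target | Error budget (share of requests) | Equivalent time per 30 days |
|---|---|---|
| 99.0% | 1% | 7.2 hours |
| 99.5% | 0.5% | 3.6 hours |
| 99.9% | 0.1% | 43.2 minutes |
| 99.99% | 0.01% | 4.32 minutes |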
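The arithmetic behind these figures is easy to script. This sketch assumes a 30-day window and steady traffic; the 100-million-request volume in the usage lines is a hypothetical figure for illustration.

```python
from typing import Optional


def error_budget(slo: float, window_days: float = 30.0,
                 total_requests: Optional[int] = None) -> dict:
    """Translate an SLO target (e.g. 0.999) into its error budget.

    Returns the allowed fraction of slow requests, the equivalent minutes
    in the window assuming steady traffic, and, if a request count is
    given, how many requests may exceed the latency threshold.
    """
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    fraction = 1 - slo  # the budget is the complement of the SLO
    budget = {
        "fraction": fraction,
        "minutes": fraction * window_days * 24 * 60,
    }
    if total_requests is not None:
        # round() absorbs floating-point noise from computing 1 - slo
        budget["slow_requests"] = round(fraction * total_requests)
    return budget


# Hypothetical traffic: 100 million requests per 30-day window.
for slo in (0.99, 0.995, 0.999, 0.9999):
    b = error_budget(slo, total_requests=100_000_000)
    print(f"{slo:.2%}: {b['minutes']:7.1f} min, "
          f"{b['slow_requests']:>9,} slow requests allowed")
```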
Under a typical SRE error-budget policy, once the budget is consumed, feature releases pause and the team focuses on reliability work until the budget recovers. This is the core SRE principle: reliability is a feature, and it has a measurable budget.
When Budgets Break
Single service exceeds allocation
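If the database's P99 rises from 38ms to 80ms, it overshoots its 40ms allocation by 40ms and, adding up the table's P99s, pushes the end-to-end total from 176ms to 218ms: 18ms over the 200ms budget. Upstream services start timing out or returning degraded results.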
Fix: Per-service P99 alerting. Alert at 80% of each component's budget (32ms for the 40ms database budget). At its current 38ms P99, the database already sits above that warning line.
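A per-component check with an 80% warning line and a 100% breach line might look like this sketch. The budgets and current actuals come from the allocation table; the green/amber/red labels are an assumption borrowed from the dashboard convention, and the regressed snapshot models the 80ms database scenario described above.

```python
BUDGET_MS = {
    "API Gateway": 10, "Authentication": 15, "Business Logic": 50,
    "Database": 40, "Cache Lookup": 5, "Rendering": 30,
    "Network Overhead": 50,
}
WARN_AT = 0.8  # warn at 80% of a component's budget


def status(actual_ms: float, budget_ms: float) -> str:
    """green below the warning line, amber from 80% to 100%, red over budget."""
    if actual_ms > budget_ms:
        return "red"
    if actual_ms >= WARN_AT * budget_ms:
        return "amber"
    return "green"


def check(actual_ms: dict) -> tuple:
    """Per-component status plus the summed end-to-end P99 and its overrun."""
    statuses = {name: status(actual_ms[name], b) for name, b in BUDGET_MS.items()}
    total = sum(actual_ms[name] for name in BUDGET_MS)
    overrun = max(0, total - sum(BUDGET_MS.values()))
    return statuses, total, overrun


actual = {
    "API Gateway": 8, "Authentication": 12, "Business Logic": 45,
    "Database": 38, "Cache Lookup": 3, "Rendering": 28,
    "Network Overhead": 42,
}
regressed = {**actual, "Database": 80}  # the database regression scenario

for label, snapshot in (("today", actual), ("regressed", regressed)):
    statuses, total, overrun = check(snapshot)
    print(label, statuses["Database"], f"total={total}ms", f"over={overrun}ms")
```

With the table's current actuals, every component except Cache Lookup already sits at or above 80% of its allocation, so this check reports them all as amber; a per-component warning fraction may be worth considering.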
Cascading timeouts
Service A waits for Service B, which waits for Service C. When C slows down, B's timeout fires, but A has already consumed most of its own budget waiting. The error propagates upward.
Fix: Make every downstream timeout shorter than the timeout of the caller waiting on it, leaving each caller time to handle the failure: C's timeout < B's timeout < A's timeout.
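One way to enforce that ordering is deadline propagation: the entry point computes an absolute deadline, and each hop derives its downstream timeout from the time left minus a safety margin. This asyncio sketch is an illustration of that technique, not a prescribed design; the three services, the 30ms margin, and the fallback strings are hypothetical.

```python
import asyncio

MARGIN_S = 0.03  # hypothetical slack each caller keeps to handle a failure


def time_left(deadline: float) -> float:
    """Seconds remaining before an absolute deadline on the event-loop clock."""
    return max(deadline - asyncio.get_running_loop().time(), 0.0)


async def service_c(delay_s: float) -> str:
    # Leaf service; delay_s simulates how slow it currently is.
    await asyncio.sleep(delay_s)
    return "c-result"


async def service_b(deadline: float, c_delay_s: float) -> str:
    # C's deadline is B's minus a margin, so C's timeout < B's timeout.
    try:
        return await asyncio.wait_for(service_c(c_delay_s),
                                      timeout=time_left(deadline - MARGIN_S))
    except asyncio.TimeoutError:
        return "b-fallback"  # answer degraded while A is still waiting


async def service_a(budget_s: float, c_delay_s: float) -> str:
    deadline = asyncio.get_running_loop().time() + budget_s
    # B's deadline is A's minus a margin, so B's timeout < A's timeout.
    b_deadline = deadline - MARGIN_S
    try:
        return await asyncio.wait_for(service_b(b_deadline, c_delay_s),
                                      timeout=time_left(b_deadline))
    except asyncio.TimeoutError:
        return "a-fallback"


print(asyncio.run(service_a(0.2, c_delay_s=0.01)))  # C is fast
print(asyncio.run(service_a(0.2, c_delay_s=5.0)))   # C is slow
```

With a 200ms budget, A gives B about 170ms and B gives C about 140ms. When C is slow, B times out first and returns its fallback while A still has time to use it, instead of all three timing out together.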
Retry storms
When a service slows, callers retry, multiplying load. If every caller retries up to 3 times, each failing request becomes as many as 4 attempts, so load on the slow service can quadruple, making it even slower: a positive feedback loop.
Fix: Exponential backoff with jitter. Circuit breakers that open after N consecutive failures. Retry budgets per caller.
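A minimal single-threaded sketch of the first two fixes follows: "full jitter" backoff (a random delay between zero and an exponentially growing, capped ceiling) and a circuit breaker that opens after N consecutive failures. The default delays, threshold, and cool-down are illustrative assumptions, and per-caller retry budgets are not shown.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Defaults below (50ms base, 2s cap, 5 failures, 30s cool-down) are
# illustrative assumptions, not recommendations.


def backoff_delay(attempt: int, base_s: float = 0.05, cap_s: float = 2.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter backoff: uniform in [0, min(cap_s, base_s * 2**attempt)]."""
    rng = rng or random.Random()
    return rng.uniform(0, min(cap_s, base_s * 2 ** attempt))


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; after `reset_s` seconds
    it lets calls through again (half-open) until the next failure."""

    def __init__(self, threshold: int = 5, reset_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold = threshold
        self.reset_s = reset_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.reset_s

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # (re)open and restart the cool-down


def call_with_retries(fn: Callable[[], T], breaker: CircuitBreaker,
                      max_retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Make up to 1 + max_retries attempts, backing off between them."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        if not breaker.allow():
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            breaker.record_failure()
            if attempt == max_retries:
                raise
            sleep(backoff_delay(attempt))
        else:
            breaker.record_success()
            return result
    raise AssertionError("unreachable")
```

Jitter spreads retries out so callers do not hit the recovering service in synchronized waves, and an open breaker fails fast instead of adding load.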
Monitoring Your Budget
OpenTelemetry
Distributed tracing across the full request chain. See exactly which service consumed how much of the latency budget per request. Essential for debugging P99 spikes.
Prometheus Histograms
Record per-service latency in histograms, then query percentiles (P50, P90, P95, P99) from them with histogram_quantile(). Set alerts at 80% and 100% of each service's budget.
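Prometheus histograms store cumulative bucket counts, not percentiles, so histogram_quantile() estimates a percentile by interpolating linearly inside the bucket that contains the target rank. A typical query has the shape `histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))`, where the metric name is a hypothetical example. The Python sketch below roughly mirrors that interpolation to show what the number means; it simplifies edge cases, and the bucket counts are made up.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Buckets are sorted by upper bound and end with (math.inf, total),
    like Prometheus `le` buckets. Simplified: assumes non-negative bounds
    and monotonically increasing counts.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower, below = 0.0, 0  # lowest bucket is assumed to start at 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper):
                return lower  # rank is in the +Inf bucket: use the last finite bound
            if count == below:
                return lower  # empty bucket (only possible when q == 0)
            # Interpolate linearly inside the bucket that contains the rank.
            return lower + (upper - lower) * (rank - below) / (count - below)
        lower, below = upper, count
    return lower


# Hypothetical database latency histogram: (le in seconds, cumulative count).
db_buckets = [(0.01, 200), (0.025, 600), (0.04, 900), (0.05, 950),
              (0.1, 995), (math.inf, 1000)]

for label, q in (("P50", 0.5), ("P90", 0.9), ("P99", 0.99)):
    print(f"{label}: {histogram_quantile(q, db_buckets) * 1000:.1f}ms")
```

Because the estimate interpolates within a bucket, its accuracy depends on where the bucket boundaries sit. A boundary at the 40ms database budget (0.04 here) lets you read the share of requests over budget straight from the counts: 100 of 1,000.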
Grafana Dashboards
Visualize budget consumption in real time. Show each service's actual P99 against its budget as a gauge or bar chart. Red/amber/green thresholds.
SLO Alerting (Burn Rate)
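Instead of alerting on every spike, alert when the error budget is being consumed too quickly. Burn rate is the observed share of slow requests divided by the share the SLO allows: a burn rate of 10x means that, if the current rate continues, you will exhaust a 30-day budget in 3 days.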
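The burn-rate arithmetic can be sketched as follows. The traffic numbers are hypothetical. The 14.4x figure is the commonly cited one-hour paging threshold from Google's SRE Workbook, which corresponds to spending 2% of a 30-day budget in one hour.

```python
import math


def burn_rate(slow_requests: int, total_requests: int, slo: float) -> float:
    """Observed share of slow requests divided by the share the SLO allows."""
    if total_requests == 0:
        return 0.0
    return (slow_requests / total_requests) / (1 - slo)


def days_to_exhaustion(rate: float, window_days: float = 30.0) -> float:
    """Days until a full budget is gone if this burn rate persists."""
    return math.inf if rate <= 0 else window_days / rate


def budget_spent(rate: float, hours: float, window_days: float = 30.0) -> float:
    """Fraction of the window's budget consumed by `hours` at `rate`."""
    return rate * hours / (window_days * 24)


# Hypothetical last hour: 1,000,000 requests, 10,000 slower than 200ms,
# measured against a 99.9% SLO.
rate = burn_rate(10_000, 1_000_000, slo=0.999)
print(f"burn rate {rate:.1f}x, budget gone in {days_to_exhaustion(rate):.1f} days")
print(f"14.4x for one hour spends {budget_spent(14.4, 1):.0%} of the budget")
```

Alerting on a high burn rate over a short window catches fast incidents, while a lower threshold over a longer window catches slow leaks that a single spike alert would miss.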