How to Set a Latency Budget: SLOs, Error Budgets, and Allocation Frameworks

A latency budget is the total acceptable response time for a user-facing request, allocated across every service in the chain. This guide walks through setting one, from the SLA down to per-service targets, with the error budget math that makes it work.

What Is a Latency Budget?

A latency budget answers the question: "How fast does this request need to be, and how do we divide that time across all the services involved?" It is the performance equivalent of a financial budget. Every service gets an allocation, and the total must stay within the constraint.

Without a budget, teams optimize locally. The authentication team makes auth fast. The database team makes queries fast. But nobody owns the total end-to-end latency. A budget creates shared accountability and clear targets.

Starting from the SLA

Your latency budget starts with your SLA. If you promise customers that 99.9% of requests complete in under 200ms, that is your P99.9 budget. Work backwards from there.

| Industry | Typical P99 SLA | P99.9 SLA | Notes |
|---|---|---|---|
| Ecommerce | 200ms | 500ms | Page load SLA, not time-to-first-byte |
| Fintech / Payments | 100ms | 250ms | Transaction processing latency |
| SaaS Platform | 300ms | 800ms | API response time for core endpoints |
| Gaming | 50ms | 100ms | Server tick rate and network round-trip |
| AdTech / RTB | 50ms | 100ms | Bid response deadline is typically 100ms |

Budget Allocation: 200ms Example

Here is a practical budget for a 200ms P99 SLA across a typical 6-service request chain. The key principle: allocate budget proportionally to complexity, and always leave 25% for network overhead and variance.

| Component | Budget (P99) | Actual P99 | Headroom | Status |
|---|---|---|---|---|
| API Gateway | 10ms | 8ms | +2ms | OK |
| Authentication | 15ms | 12ms | +3ms | OK |
| Business Logic | 50ms | 45ms | +5ms | OK |
| Database | 40ms | 38ms | +2ms | OK |
| Cache Lookup | 5ms | 3ms | +2ms | OK |
| Rendering | 30ms | 28ms | +2ms | OK |
| Network Overhead | 50ms | 42ms | +8ms | OK |
| Total | 200ms | 176ms | +24ms | |

Database headroom is tight (2ms). If queries grow with data volume, this component will breach budget first. Monitor proactively.
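The headroom and status columns above can be computed mechanically. Here is a minimal Python sketch that mirrors the example's numbers; the component names, the 2ms "tight" threshold, and the status labels are illustrative choices, not a standard:

```python
# Budget-vs-actual check for a per-service latency allocation.
# Numbers mirror the 200ms example table; thresholds are illustrative.
BUDGETS_MS = {
    "api_gateway": 10, "authentication": 15, "business_logic": 50,
    "database": 40, "cache_lookup": 5, "rendering": 30, "network_overhead": 50,
}
ACTUAL_P99_MS = {
    "api_gateway": 8, "authentication": 12, "business_logic": 45,
    "database": 38, "cache_lookup": 3, "rendering": 28, "network_overhead": 42,
}

def budget_report(budgets, actuals):
    """Return (rows, (total_budget, total_actual)) with per-service headroom."""
    rows = []
    for name, budget in budgets.items():
        actual = actuals[name]
        headroom = budget - actual
        # Flag services that are over budget, or within 2ms of it.
        status = "OVER" if headroom < 0 else ("TIGHT" if headroom <= 2 else "OK")
        rows.append((name, budget, actual, headroom, status))
    return rows, (sum(budgets.values()), sum(actuals.values()))

rows, (total_budget, total_actual) = budget_report(BUDGETS_MS, ACTUAL_P99_MS)
for name, budget, actual, headroom, status in rows:
    print(f"{name:18} {budget:4}ms {actual:4}ms {headroom:+3}ms  {status}")
print(f"{'total':18} {total_budget:4}ms {total_actual:4}ms {total_budget - total_actual:+3}ms")
```

Running this against the example flags the database (and the other +2ms components) as tight before they actually breach.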

Error Budgets and Latency

An error budget is the inverse of your SLO. If your SLO is 99.9% of requests under 200ms, your error budget is 0.1% of requests that are allowed to exceed 200ms. This translates to real time:

| SLO target | Error budget | Annual | Monthly |
|---|---|---|---|
| 99.0% | 1.0% | 3.65 days | 7.3 hours |
| 99.5% | 0.5% | 1.83 days | 3.65 hours |
| 99.9% | 0.1% | 8.76 hours | 43.8 min |
| 99.99% | 0.01% | 52.6 min | 4.38 min |

When your error budget is consumed, feature development pauses and the team focuses entirely on reliability improvements. This is the core SRE principle: reliability is a feature, and it has a measurable budget.

When Budgets Break

Single service exceeds allocation

If the database goes from 38ms to 80ms at P99, it exceeds its 40ms allocation by 40ms and pushes the end-to-end P99 from 176ms to 218ms, 18ms over the total budget. Upstream services start timing out or returning degraded results.

Fix: Per-service P99 alerting. Alert at 80% of budget (32ms for the 40ms database budget).
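Deriving the alert thresholds is one line per service. A sketch, using the example's allocations (the 80% fraction is the article's suggestion; component names are illustrative):

```python
# Warn at 80% of each service's latency budget, before it is fully consumed.
BUDGETS_MS = {
    "api_gateway": 10, "authentication": 15, "business_logic": 50,
    "database": 40, "cache_lookup": 5, "rendering": 30,
}
ALERT_FRACTION = 0.8

alert_thresholds_ms = {name: budget * ALERT_FRACTION
                       for name, budget in BUDGETS_MS.items()}
# The database alert fires when its P99 crosses 32ms, well before the
# 40ms allocation is breached.
```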

Cascading timeouts

Service A waits for Service B, which waits for Service C. When C slows down, B's timeout fires, but A has already consumed most of its own budget waiting. The error propagates upward.

Fix: Set timeouts at each level to be less than the upstream timeout. C's timeout < B's timeout < A's timeout.
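One way to enforce this ordering is to derive each hop's timeout from the caller's, subtracting a fixed margin so the caller always has time left to handle the failure. A minimal sketch; the chain names and the 10ms per-hop margin are illustrative assumptions:

```python
def derive_timeouts(total_budget_ms: float, chain: list[str],
                    per_hop_margin_ms: float = 10.0) -> dict[str, float]:
    """Walk the call chain outermost-first, shrinking the deadline each hop.

    Guarantees each downstream timeout fires before its caller's, leaving
    the caller room to serialize an error, log, and return.
    """
    timeouts = {}
    remaining = total_budget_ms
    for service in chain:
        timeouts[service] = remaining
        remaining -= per_hop_margin_ms
    return timeouts

# A -> B -> C: C's timeout < B's timeout < A's timeout
print(derive_timeouts(200, ["service_a", "service_b", "service_c"]))
```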

Retry storms

When a service slows, callers retry, multiplying load. If every caller makes three attempts instead of one, load on the slow service triples, making it even slower: a positive feedback loop.

Fix: Exponential backoff with jitter. Circuit breakers that open after N consecutive failures. Retry budgets per caller.
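A minimal sketch of the first two fixes: "full jitter" exponential backoff (a random delay up to an exponentially growing cap) wrapped around a bounded retry loop. The base, cap, and attempt count are illustrative defaults:

```python
import random
import time

def backoff_with_jitter(attempt: int, base_ms: float = 50,
                        cap_ms: float = 2000) -> float:
    """'Full jitter': sleep a random amount up to the exponential cap.

    Randomizing the delay spreads retries out so callers do not
    re-converge on the struggling service in synchronized waves.
    """
    return random.uniform(0, min(cap_ms, base_ms * 2 ** attempt))

def call_with_retries(fn, max_attempts: int = 3):
    """Call fn, retrying with jittered backoff; re-raise on final failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_with_jitter(attempt) / 1000)
```

Circuit breakers and per-caller retry budgets sit on top of this: the breaker stops calling entirely after N consecutive failures, and a retry budget caps the fraction of traffic that may be retries.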

Monitoring Your Budget

OpenTelemetry

Distributed tracing across the full request chain. See exactly which service consumed how much of the latency budget per request. Essential for debugging P99 spikes.

Prometheus Histograms

Track per-service latency at multiple percentiles (P50, P90, P95, P99). Set alerts at 80% and 100% of budget. Use histogram_quantile() for percentile queries.

Grafana Dashboards

Visualize budget consumption in real time. Show each service's actual P99 against its budget as a gauge or bar chart. Red/amber/green thresholds.

SLO Alerting (Burn Rate)

Instead of alerting on every spike, alert when the error budget is being consumed too quickly. A burn rate of 10x means you will exhaust the monthly budget in 3 days at the current rate.
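The burn-rate arithmetic is simple enough to sketch directly (assuming a 30-day budget window, as in the monthly figures above):

```python
def time_to_exhaustion_days(burn_rate: float, window_days: float = 30) -> float:
    """At burn_rate x the sustainable rate, the budget lasts window / rate.

    A burn rate of 1.0 means consuming exactly the budget over the window;
    10.0 means consuming it ten times too fast.
    """
    return window_days / burn_rate

# A 10x burn rate exhausts a 30-day budget in 3 days.
print(time_to_exhaustion_days(10))
```

Multi-window burn-rate alerts typically pair a fast window (page at high burn rates) with a slow window (ticket at low but sustained burn rates).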