How to Set a Latency Budget: SLOs, Error Budgets, and Allocation Frameworks

A latency budget is the total acceptable response time for a user-facing request, allocated across every service in the chain. This guide walks through setting one, from the SLA down to per-service targets, with the error budget math that makes it work.

What Is a Latency Budget?

A latency budget answers the question: "How fast does this request need to be, and how do we divide that time across all the services involved?" It is the performance equivalent of a financial budget. Every service gets an allocation, and the total must stay within the constraint.

Without a budget, teams optimize locally. The authentication team makes auth fast. The database team makes queries fast. But nobody owns the total end-to-end latency. A budget creates shared accountability and clear targets.

Starting from the SLA

Your latency budget starts with your SLA. If you promise customers that 99.9% of requests complete in under 200ms, that is your P99.9 budget. Work backwards from there.

| Industry | Typical P99 SLA | P99.9 SLA | Notes |
|---|---|---|---|
| Ecommerce | 200ms | 500ms | Page load SLA, not time-to-first-byte |
| Fintech / Payments | 100ms | 250ms | Transaction processing latency |
| SaaS Platform | 300ms | 800ms | API response time for core endpoints |
| Gaming | 50ms | 100ms | Server tick rate and network round-trip |
| AdTech / RTB | 50ms | 100ms | Bid response deadline is typically 100ms |

Budget Allocation: 200ms Example

Here is a practical budget for a 200ms P99 SLA across a typical 6-service request chain. The key principle: allocate budget proportionally to complexity, and always leave 25% for network overhead and variance.

| Component | Budget (P99) | Actual P99 | Headroom | Status |
|---|---|---|---|---|
| API Gateway | 10ms | 8ms | +2ms | OK |
| Authentication | 15ms | 12ms | +3ms | OK |
| Business Logic | 50ms | 45ms | +5ms | OK |
| Database | 40ms | 38ms | +2ms | OK |
| Cache Lookup | 5ms | 3ms | +2ms | OK |
| Rendering | 30ms | 28ms | +2ms | OK |
| Network Overhead | 50ms | 42ms | +8ms | OK |
| Total | 200ms | 176ms | +24ms | |

Database headroom is tight (2ms). If queries grow with data volume, this component will breach budget first. Monitor proactively.
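The headroom and status columns above can be computed mechanically. Here is a minimal Python sketch that mirrors the example's numbers; the component names, the 2ms "tight" threshold, and the status labels are illustrative choices, not a standard:

```python
# Budget-vs-actual check for a per-service latency allocation.
# Numbers mirror the 200ms example table; thresholds are illustrative.
BUDGETS_MS = {
    "api_gateway": 10, "authentication": 15, "business_logic": 50,
    "database": 40, "cache_lookup": 5, "rendering": 30, "network_overhead": 50,
}
ACTUAL_P99_MS = {
    "api_gateway": 8, "authentication": 12, "business_logic": 45,
    "database": 38, "cache_lookup": 3, "rendering": 28, "network_overhead": 42,
}

def budget_report(budgets, actuals):
    """Return (rows, (total_budget, total_actual)) with per-service headroom."""
    rows = []
    for name, budget in budgets.items():
        actual = actuals[name]
        headroom = budget - actual
        # Flag services that are over budget, or within 2ms of it.
        status = "OVER" if headroom < 0 else ("TIGHT" if headroom <= 2 else "OK")
        rows.append((name, budget, actual, headroom, status))
    return rows, (sum(budgets.values()), sum(actuals.values()))

rows, (total_budget, total_actual) = budget_report(BUDGETS_MS, ACTUAL_P99_MS)
for name, budget, actual, headroom, status in rows:
    print(f"{name:18} {budget:4}ms {actual:4}ms {headroom:+3}ms  {status}")
print(f"{'total':18} {total_budget:4}ms {total_actual:4}ms {total_budget - total_actual:+3}ms")
```

Running this against the example flags the database (and the other +2ms components) as tight before they actually breach.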

Error Budgets and Latency

An error budget is the inverse of your SLO. If your SLO is 99.9% of requests under 200ms, your error budget is 0.1% of requests that are allowed to exceed 200ms. This translates to real time:

| SLO target | Error budget | Annual | Monthly |
|---|---|---|---|
| 99.0% | 1.0% | 3.65 days | 7.3 hours |
| 99.5% | 0.5% | 1.83 days | 3.65 hours |
| 99.9% | 0.1% | 8.76 hours | 43.8 min |
| 99.99% | 0.01% | 52.6 min | 4.38 min |

When your error budget is consumed, feature development pauses and the team focuses entirely on reliability improvements. This is the core SRE principle: reliability is a feature, and it has a measurable budget.

When Budgets Break

Single service exceeds allocation

If the database goes from 38ms to 80ms at P99, it exceeds its 40ms allocation by 40ms and pushes the end-to-end P99 from 176ms to 218ms, 18ms over the total budget. Upstream services start timing out or returning degraded results.

Fix: Per-service P99 alerting. Alert at 80% of budget (32ms for the 40ms database budget).
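Deriving the alert thresholds is one line per service. A sketch, using the example's allocations (the 80% fraction is the article's suggestion; component names are illustrative):

```python
# Warn at 80% of each service's latency budget, before it is fully consumed.
BUDGETS_MS = {
    "api_gateway": 10, "authentication": 15, "business_logic": 50,
    "database": 40, "cache_lookup": 5, "rendering": 30,
}
ALERT_FRACTION = 0.8

alert_thresholds_ms = {name: budget * ALERT_FRACTION
                       for name, budget in BUDGETS_MS.items()}
# The database alert fires when its P99 crosses 32ms, well before the
# 40ms allocation is breached.
```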

Cascading timeouts

Service A waits for Service B, which waits for Service C. When C slows down, B's timeout fires, but A has already consumed most of its own budget waiting. The error propagates upward.

Fix: Set timeouts at each level to be less than the upstream timeout. C's timeout < B's timeout < A's timeout.
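One way to enforce this ordering is to derive each hop's timeout from the caller's, subtracting a fixed margin so the caller always has time left to handle the failure. A minimal sketch; the chain names and the 10ms per-hop margin are illustrative assumptions:

```python
def derive_timeouts(total_budget_ms: float, chain: list[str],
                    per_hop_margin_ms: float = 10.0) -> dict[str, float]:
    """Walk the call chain outermost-first, shrinking the deadline each hop.

    Guarantees each downstream timeout fires before its caller's, leaving
    the caller room to serialize an error, log, and return.
    """
    timeouts = {}
    remaining = total_budget_ms
    for service in chain:
        timeouts[service] = remaining
        remaining -= per_hop_margin_ms
    return timeouts

# A -> B -> C: C's timeout < B's timeout < A's timeout
print(derive_timeouts(200, ["service_a", "service_b", "service_c"]))
```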

Retry storms

When a service slows, callers retry, multiplying load. If every caller makes three attempts instead of one, load on the slow service triples, making it even slower: a positive feedback loop.

Fix: Exponential backoff with jitter. Circuit breakers that open after N consecutive failures. Retry budgets per caller.
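A minimal sketch of the first two fixes: "full jitter" exponential backoff (a random delay up to an exponentially growing cap) wrapped around a bounded retry loop. The base, cap, and attempt count are illustrative defaults:

```python
import random
import time

def backoff_with_jitter(attempt: int, base_ms: float = 50,
                        cap_ms: float = 2000) -> float:
    """'Full jitter': sleep a random amount up to the exponential cap.

    Randomizing the delay spreads retries out so callers do not
    re-converge on the struggling service in synchronized waves.
    """
    return random.uniform(0, min(cap_ms, base_ms * 2 ** attempt))

def call_with_retries(fn, max_attempts: int = 3):
    """Call fn, retrying with jittered backoff; re-raise on final failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_with_jitter(attempt) / 1000)
```

Circuit breakers and per-caller retry budgets sit on top of this: the breaker stops calling entirely after N consecutive failures, and a retry budget caps the fraction of traffic that may be retries.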

Monitoring Your Budget

OpenTelemetry

Distributed tracing across the full request chain. See exactly which service consumed how much of the latency budget per request. Essential for debugging P99 spikes.

Prometheus Histograms

Track per-service latency at multiple percentiles (P50, P90, P95, P99). Set alerts at 80% and 100% of budget. Use histogram_quantile() for percentile queries.

Grafana Dashboards

Visualize budget consumption in real time. Show each service's actual P99 against its budget as a gauge or bar chart. Red/amber/green thresholds.

SLO Alerting (Burn Rate)

Instead of alerting on every spike, alert when the error budget is being consumed too quickly. A burn rate of 10x means you will exhaust the monthly budget in 3 days at the current rate.
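The burn-rate arithmetic is simple enough to sketch directly (assuming a 30-day budget window, as in the monthly figures above):

```python
def time_to_exhaustion_days(burn_rate: float, window_days: float = 30) -> float:
    """At burn_rate x the sustainable rate, the budget lasts window / rate.

    A burn rate of 1.0 means consuming exactly the budget over the window;
    10.0 means consuming it ten times too fast.
    """
    return window_days / burn_rate

# A 10x burn rate exhausts a 30-day budget in 3 days.
print(time_to_exhaustion_days(10))
```

Multi-window burn-rate alerts typically pair a fast window (page at high burn rates) with a slow window (ticket at low but sustained burn rates).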