What is the difference between SLO, SLA, and SLI for voice agents?

An SLI is the number you watch, such as task completion rate for eligible booking calls. An SLO is the internal target for that number, such as 90% task completion over 30 days, while an SLA is the customer-facing promise or contract that may include remedies if missed. Teams should usually define SLOs before SLAs because internal targets are easier to tune than customer commitments.

How do you calculate a voice agent error budget?

Calculate the error budget as (1 - SLO target) multiplied by eligible events in the measurement window. A 99.5% task-completion SLO across 100,000 eligible calls allows 500 agent-caused failed task attempts in that window before the team is out of budget. The key is to define eligible events and bad events before doing the math.

What is an eligible event in a voice agent SLO?

An eligible event is a call, turn, or task attempt that should count toward the SLO. In practice, an eligible booking call might exclude test calls, abusive calls, or caller hangups before the greeting, while still counting real customer attempts where the agent had enough information to complete the task.

Which SLIs should a production voice agent track first?

Start with four SLIs: call connection success, turn response latency, task completion, and escalation correctness. Hamming's analysis of production voice agent calls shows these signals are easier to operationalize than broad satisfaction scores because they map directly to caller-visible failure modes.

What is a burn-rate alert for voice agent reliability?

A burn-rate alert tells you how quickly the voice agent is consuming its error budget compared with the allowed bad-event rate. A burn rate above 1 means the current failure rate would miss the SLO if sustained, while a 10x fast burn on task completion usually deserves immediate incident review.

Should voice agent SLOs use calls, turns, or tasks as the unit?

Use the unit that matches the promise being measured: calls for connection success, turns for latency, and tasks for business outcomes. Hamming recommends avoiding one blended reliability score because a call can connect successfully while still failing the caller's task.

How often should voice agent SLOs be reviewed?

Review SLO health weekly during rollout and monthly once targets are stable. Revisit the target after major prompt, model, provider, routing, or workflow changes, especially if the dashboard shows budget burn without matching user pain or user pain without budget burn.

What should happen when a voice agent error budget is exhausted?

When the budget is exhausted, pause risky prompt, model, routing, provider, and workflow changes unless they reduce budget burn or fix security issues. Hamming recommends using the exhausted-budget period to inspect failed calls, add regression tests, and fund reliability work before returning to normal feature velocity.

How does Hamming help teams implement voice agent SLOs?

Hamming gives teams the production call analysis, trace evidence, evaluation results, and regression-test workflow needed to define and monitor voice-agent SLIs. Teams can use those signals to build SLO dashboards, investigate budget burn, and turn recurring production failures into test coverage.

Voice Agent SLOs: Define Error Budgets and Reliability Dashboards

Q: What is a voice agent SLO?

A voice agent SLO is a measurable reliability target for the caller experience over a fixed window, such as 99.5% of eligible booking calls completing without an agent-caused failure over 30 days. Hamming recommends measuring user-visible outcomes like connection success, response latency, task completion, and escalation correctness rather than only infrastructure uptime.

If your voice agent is "up" but callers cannot finish their task, your uptime SLO is telling the wrong story. Voice agent SLOs need to measure whether conversations work: users connect, the agent responds quickly enough, the right intent is handled, and the business outcome completes without a bad escalation.

This guide turns production voice-agent metrics into service-level objectives, error budgets, burn-rate alerts, and a reliability dashboard your engineering and operations teams can actually use.

TL;DR: An SLI is the number you watch. An SLO is the target you expect that number to meet. An SLA is the promise you make to a customer. For voice agents, the useful targets are caller-visible outcomes: connection success, response latency, task completion, and escalation correctness.

Quick filter: If your release review says "all systems green" while fallback rate, interruption rate, or task completion is getting worse, you need voice-agent SLOs.

Methodology Note: The SLO templates and dashboard patterns in this guide are based on Hamming's analysis of production voice agent calls across 10K+ voice agents (2025-2026). Hamming's platform has 10M+ minutes protected.
Treat the starter targets as a baseline, not a universal contract. Calibrate them to call volume, risk, user expectations, and whether the voice agent handles regulated or revenue-critical tasks.

Last Updated: May 2026

Related Guides:

Voice Agent Monitoring KPIs - production metrics and alert thresholds that feed SLOs
How to Monitor Voice Agent Outages in Real Time - outage detection signals that become fast-burn alerts
Testing Voice Agents for Production Reliability - release testing before SLO-impacting changes
Voice Agent Observability Tracing - traces that explain why an SLO burned
Voice Agent Incident Response Runbook - response playbooks when error budget is at risk
Voice Agent Dashboard Template - dashboard layout patterns for operators and executives

SLO vs SLA vs SLI

These terms are easy to mix up because they are usually discussed together. In practice, they answer three different questions:

Term	What It Means	Question It Answers	Voice Agent Scenario
SLI: Service-Level Indicator	The number or signal you watch.	"What are we measuring?"	"Task completion rate for eligible booking calls."
SLO: Service-Level Objective	The internal target for that number.	"What level counts as reliable enough?"	"90% of eligible booking calls complete without an agent-caused failure over 30 days."
SLA: Service-Level Agreement	The customer-facing promise, usually in a contract, that may include credits, remedies, or escalation terms if missed.	"What have we promised customers?"	"Hamming will meet the contracted availability or support-response commitment for the customer's production workspace."

Think of an SLI as the instrument reading, the SLO as the line you draw on the dashboard, and the SLA as the promise you are willing to put in front of a customer.

For this guide, the important distinction is that most voice-agent teams should define internal SLOs before they put anything into an SLA. SLOs help engineering, product, and operations agree on what "reliable enough" means. SLAs are customer-facing commitments, so they should be narrower, easier to prove, and reviewed with legal and customer-facing teams.

Working model: SLI = measurement. SLO = target. SLA = customer promise.

Here is the same idea as a sequence:

Pick the SLI: task completion rate for eligible booking calls.
Set the SLO: 90% of those calls should complete successfully over 30 days.
Track the error budget: the remaining 10% is the failure room before the target is missed.
Put only the narrowest, most provable commitments into an SLA.

What Is a Voice Agent SLO?

A voice agent SLO is a reliability target for the caller experience over a fixed window. It turns a vague expectation like "the agent should work" into a concrete line: this many calls, turns, or task attempts must go well.

Definition: A voice agent SLO is a target for a voice-specific service-level indicator, such as "99.5% of eligible appointment-booking calls complete without an agent-caused failure over 30 days."

The important shift is the unit. Traditional SLOs often measure requests. Voice agents need SLOs over calls, turns, intents, tasks, and handoffs because that is where users feel failure. A database request can succeed while the caller still gets stuck in a fallback loop.

Traditional service SLO	Voice agent SLO equivalent	Why the voice version matters
HTTP availability	Call connection success	A connected call is the first user-visible availability event.
API latency	Time to first agent response and turn latency	A technically fast backend can still produce awkward pauses.
Request success rate	Task completion rate	A call can return 200s while the caller fails to complete the job.
Error rate	Agent-caused bad-call rate	Misclassified intent, bad transfer, or hallucinated answer should consume budget.
Dependency uptime	Critical-flow completion	Users care whether billing, booking, or support resolution worked end to end.

Google's SRE workbook defines an error budget as the unreliability allowed by an SLO. For voice agents, that budget should be spent on user-visible bad events, not only system exceptions.

The Voice Agent SLO Starter Kit

Start with four SLOs. More can come later, but these four catch the most important reliability gaps without turning the dashboard into a wall of disconnected charts.

SLO	Good event	Bad event	Starting target	Owner
Connection success	Caller reaches the intended voice agent and receives the greeting	Failed connection, wrong route, dead air before greeting	99.5% of eligible calls	Platform or telephony owner
Response latency	Agent responds within the agreed turn-taking window	An eligible turn's response latency exceeds the threshold	95% of turns under 1.2 seconds	Voice runtime owner
Task completion	Caller completes the primary task without agent-caused failure	Task abandoned, wrong workflow, unresolved fallback loop	90% of eligible task attempts	Product owner plus agent owner
Escalation correctness	Escalation happens when required, with context preserved	Missed escalation, unnecessary transfer, or lost handoff context	97% of audited escalation decisions	Operations owner

These are starting targets. A healthcare triage flow may need tighter escalation correctness than a retail order-status bot. A low-risk FAQ agent may accept a lower task-completion target while the team learns.
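
As an illustration, the starter kit can be written down as data so that dashboards, alerts, and reviews read the same targets. This is a minimal sketch: the `SLOSpec` class, its field names, and the `STARTER_KIT` list are hypothetical, and only the targets and owners come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLOSpec:
    """One starter SLO. Field names are illustrative, not a Hamming schema."""
    name: str
    unit: str      # what a single counted event is
    target: float  # fraction of eligible events that must be good
    owner: str

    @property
    def allowed_bad_rate(self) -> float:
        # The error budget expressed as a rate: whatever the target leaves over.
        return 1 - self.target


STARTER_KIT = [
    SLOSpec("connection_success", "eligible call", 0.995, "platform or telephony owner"),
    SLOSpec("response_latency", "eligible turn (under 1.2 s)", 0.95, "voice runtime owner"),
    SLOSpec("task_completion", "eligible task attempt", 0.90, "product owner plus agent owner"),
    SLOSpec("escalation_correctness", "audited escalation decision", 0.97, "operations owner"),
]

for spec in STARTER_KIT:
    print(f"{spec.name}: target {spec.target:.1%}, allowed bad rate {spec.allowed_bad_rate:.1%}")
```

Keeping the unit next to the target makes it harder to accidentally blend calls, turns, and tasks into one score.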

The wrong move is to set every target to 99.99% because it looks professional. Google Cloud's SLO documentation warns that useful SLOs should not be higher than necessary or meaningful to users. For voice agents, unrealistic SLOs create permanent failure noise and train teams to ignore the dashboard.

How to Calculate Voice Agent Error Budgets

An error budget is the amount of failure room implied by the SLO. If the SLO says 99.5% of calls must be good, the error budget is the remaining 0.5%.

Before the formula, define two inputs:

Eligible event: a call, turn, or task attempt that should count toward the SLO.
Bad event: an eligible event that fails the rule you agreed on.
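
The two definitions can be sketched as predicates over a call record. The `Call` fields below are hypothetical flags chosen for illustration; the exclusions (test calls, abusive calls, hangups before the greeting) follow the eligible-event examples given in this guide, and a real spec would list its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """Hypothetical call record; real field names will differ by platform."""
    is_test: bool = False
    is_abusive: bool = False
    hung_up_before_greeting: bool = False
    agent_caused_failure: bool = False


def is_eligible(call: Call) -> bool:
    # A call counts toward the SLO only when no agreed exclusion applies.
    return not (call.is_test or call.is_abusive or call.hung_up_before_greeting)


def is_bad(call: Call) -> bool:
    # A bad event is an eligible call that fails the agreed rule.
    return is_eligible(call) and call.agent_caused_failure


calls = [
    Call(),                                         # eligible, good
    Call(agent_caused_failure=True),                # eligible, bad
    Call(is_test=True, agent_caused_failure=True),  # excluded: test traffic
    Call(hung_up_before_greeting=True),             # excluded: no real attempt
]
eligible = sum(is_eligible(c) for c in calls)
bad = sum(is_bad(c) for c in calls)
print(eligible, bad)
```

Writing exclusions as code forces the team to agree on them before the budget math starts.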

The core math is:

Error budget = (1 - SLO target) x eligible events in the window
Budget consumed = bad events in the window
Budget remaining = error budget - budget consumed
Burn rate = current bad-event rate / allowed bad-event rate

If a production voice agent handles 100,000 eligible booking calls in 30 days and has a 99.5% task-completion SLO, the error budget is 500 agent-caused failed task attempts. That means the team can tolerate up to 500 bad booking attempts before the SLO is missed.

Input	Value
Eligible calls	100,000
SLO target	99.5% good calls
Allowed bad-call rate	0.5%
30-day error budget	500 bad calls
Bad calls so far	380
Budget remaining	120 bad calls

If the agent starts failing 2% of eligible calls, it is burning budget at 4x the allowed rate. Sustained long enough, that rate will miss the SLO even if the service never crashes.
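
The worked example above can be reproduced in a few lines. This is a sketch of the arithmetic only; the function names are illustrative, and the sample of 1,000 recent calls with 20 failures is an assumed way of observing a 2% bad-call rate.

```python
def error_budget(slo_target: float, eligible_events: int) -> float:
    """Allowed number of bad events in the window."""
    return (1 - slo_target) * eligible_events


def burn_rate(bad_events: int, eligible_events: int, slo_target: float) -> float:
    """Current bad-event rate divided by the allowed bad-event rate."""
    if eligible_events <= 0:
        raise ValueError("need at least one eligible event")
    return (bad_events / eligible_events) / (1 - slo_target)


budget = error_budget(0.995, 100_000)  # roughly 500 bad calls for the 30-day window
remaining = budget - 380               # roughly 120 bad calls left
# Assumed recent sample: 20 bad calls out of 1,000 eligible calls (a 2% rate).
rate = burn_rate(bad_events=20, eligible_events=1_000, slo_target=0.995)

print(round(budget), round(remaining), round(rate, 2))
```

Floating-point arithmetic leaves tiny residues in these values, so round them before displaying or comparing.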

Voice-agent error budget: the number of user-visible bad calls, turns, or workflow attempts your team can tolerate in a window before reliability work should outrank risky feature changes.

For more raw metric definitions, use the voice agent evaluation metrics guide and the post-call analytics metrics dictionary as the measurement layer. SLOs sit one level above those metrics and decide which misses count against reliability.

Which Measurements Should Feed a Voice Agent SLO?

An SLI is the measurement behind the SLO. The best SLIs are boring, user-visible, and hard to game. If a customer would not notice the failure, it usually should not be your first SLO.

Connection and Availability SLIs

Use these when the voice agent must be reachable.

SLI	Formula	Count as bad when
Call connection success	Successful agent-connected calls / eligible inbound calls	Call fails, routes to the wrong agent, or greeting never plays
First-audio success	Calls with greeting audio delivered / connected calls	Caller hears dead air or malformed greeting
Synthetic critical-flow success	Passing synthetic calls / scheduled synthetic calls	Synthetic call cannot complete the target path

Synthetic calls matter because voice-agent traffic is often spiky. Google's SRE alerting guidance notes that low-traffic services need special treatment; otherwise, real users become your only monitoring signal. For voice systems, synthetic calls should cover the flows where failure is expensive.

Latency SLIs

Use latency SLOs for conversational feel, not just backend speed.

SLI	Formula	Starting target
Time to first word	Calls where first agent audio starts within target / connected calls	95% under 1.5 seconds
Turn latency	Turns where response starts within target / eligible turns	95% under 1.2 seconds
Tool-dependent turn latency	Tool turns under target / eligible tool turns	95% under 2.5 seconds

Pair this with OpenTelemetry for voice agents, because SLO dashboards tell you the user impact while traces tell you whether the burn came from ASR, LLM, TTS, tool calls, or a downstream API.
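
A latency SLI of this shape is a ratio of good turns to eligible turns, not an average. The sketch below assumes per-turn latencies are already available as seconds; the function name and sample values are illustrative.

```python
def turn_latency_sli(latencies_s: list[float], threshold_s: float = 1.2) -> float:
    """Fraction of eligible turns whose response started under the threshold."""
    if not latencies_s:
        raise ValueError("no eligible turns in the window")
    good = sum(1 for latency in latencies_s if latency < threshold_s)
    return good / len(latencies_s)


# Illustrative sample: four of five turns start under 1.2 seconds.
sample = [0.6, 0.9, 1.1, 1.4, 0.8]
sli = turn_latency_sli(sample)
meets_target = sli >= 0.95  # starting target from the table above

print(sli, meets_target)
```

Counting turns against a threshold keeps the SLI aligned with the error-budget math, where each slow turn is one bad event.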

Quality SLIs

Quality SLIs should count outcomes, not vibes.

SLI	Formula	Count as bad when
Task completion	Completed target tasks / eligible task attempts	The agent causes abandonment, wrong action, or unresolved loop
Intent handling accuracy	Correct first major intent / audited eligible calls	Intent classification sends the call down the wrong path
Prompt compliance	Compliant evaluated turns / evaluated turns	The agent violates an instruction that matters to the user or business
ASR quality	Turns under WER threshold / evaluated turns	Word error rate crosses the flow-specific threshold

For ASR-specific targets, see the ASR accuracy evaluation guide. Do not use one universal word-error target across every voice agent. A noisy field-service call and a quiet desktop support call have different baselines.
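
For reference, word error rate is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below computes it with a standard dynamic-programming edit distance and then derives the turn-level SLI from the table; the 0.2 threshold and the sample transcripts are assumptions for illustration, not recommended values.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def asr_quality_sli(turns: list[tuple[str, str]], wer_threshold: float) -> float:
    """Share of evaluated turns whose WER does not cross the threshold."""
    if not turns:
        raise ValueError("no evaluated turns")
    good = sum(1 for ref, hyp in turns if word_error_rate(ref, hyp) <= wer_threshold)
    return good / len(turns)


turns = [
    ("book a table for two", "book table for too"),     # one deletion, one substitution
    ("cancel my appointment", "cancel my appointment"),  # exact match
]
print(asr_quality_sli(turns, wer_threshold=0.2))
```

Production pipelines usually normalize casing, punctuation, and numerals before scoring; that step is omitted here.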

Escalation and Safety SLIs

Escalation errors are expensive because they turn automation into customer frustration.

SLI	Formula	Count as bad when
Required escalation recall	Required escalations completed / calls requiring escalation	The agent should transfer but does not
Correct non-escalation rate	Correct non-escalations / calls not requiring escalation	The agent transfers when it should resolve
Context-preserved handoff	Escalations with summary and required fields / escalations	Human receives missing or wrong context

For regulated or high-risk workflows, escalation correctness may be the most important SLO even if it has lower volume than latency or task completion.
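
The first two escalation formulas can be computed from audited calls that carry two labels: whether escalation was required and whether it happened. This is a sketch under that assumption; `AuditedCall` and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditedCall:
    escalation_required: bool  # label from a human or automated audit
    escalated: bool            # what the agent actually did


def required_escalation_recall(calls: list[AuditedCall]) -> Optional[float]:
    """Required escalations completed / calls requiring escalation."""
    required = [c for c in calls if c.escalation_required]
    if not required:
        return None  # nothing to measure in this window
    return sum(c.escalated for c in required) / len(required)


def correct_non_escalation_rate(calls: list[AuditedCall]) -> Optional[float]:
    """Correct non-escalations / calls not requiring escalation."""
    not_required = [c for c in calls if not c.escalation_required]
    if not not_required:
        return None
    return sum(not c.escalated for c in not_required) / len(not_required)


audited = (
    [AuditedCall(True, True)] * 3 + [AuditedCall(True, False)]      # one missed escalation
    + [AuditedCall(False, False)] * 4 + [AuditedCall(False, True)]  # one unnecessary transfer
)
print(required_escalation_recall(audited), correct_non_escalation_rate(audited))
```

Returning `None` for an empty denominator avoids reporting a misleading 0% or 100% when no calls of that kind were audited.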

How to Build a Voice Agent Reliability Dashboard

A good dashboard answers four questions in this order:

Are callers currently affected?
Which SLO is burning?
Which flow, agent version, provider, or dependency is responsible?
Should we keep shipping, pause changes, or start incident response?

Dashboard row	Panels	Decision it supports
SLO health	Current compliance, 30-day budget remaining, forecasted budget exhaustion	Are we within reliability policy?
Burn-rate alerts	Fast burn, slow burn, budget consumed by flow	Is this urgent or slow drift?
Flow breakdown	Task completion, latency, escalation correctness by intent and route	Which customer journey is affected?
Pipeline attribution	ASR, LLM, TTS, tool-call, telephony, and CRM latency/error slices	Which subsystem should investigate?
Release overlay	Agent version, prompt version, model/provider change, config deploys	Did a recent change start the burn?
Review queue	Top failed calls, traces, audio snippets, regression-test candidates	What should humans inspect first?

The voice agent dashboard template covers layout mechanics. For SLOs, add two panels that generic dashboards often miss: budget remaining and burn-rate forecast.

Budget exhaustion forecast = remaining budget / current bad-event rate
Release gate = block risky changes when budget remaining is low and burn rate is above policy threshold
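
A minimal sketch of both panels follows. It assumes the "current bad-event rate" in the forecast is expressed as bad events per day, so the result is a number of days, and it borrows the 20% budget floor and the burn-rate threshold of 1 from the freeze state in this guide's release policy. The function and parameter names are illustrative.

```python
def days_until_exhaustion(budget_remaining: float, bad_events_per_day: float) -> float:
    """Forecast how many days the remaining budget lasts at the current pace."""
    if bad_events_per_day <= 0:
        return float("inf")  # no bad events means no projected exhaustion
    return max(budget_remaining, 0) / bad_events_per_day


def release_gate_blocks(
    budget_remaining_fraction: float,
    burn_rate: float,
    low_budget: float = 0.20,      # assumed policy threshold
    burn_threshold: float = 1.0,   # assumed policy threshold
) -> bool:
    """Block risky changes when budget is low AND burn is above policy."""
    return budget_remaining_fraction < low_budget and burn_rate > burn_threshold


# 120 bad calls of budget left, currently losing an assumed 40 per day.
print(days_until_exhaustion(120, 40))
print(release_gate_blocks(budget_remaining_fraction=0.15, burn_rate=2.0))
```

Requiring both conditions keeps the gate from blocking releases late in a window when the budget is low but burn has already returned to normal.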

The dashboard should not be a compliance artifact that someone checks once a month. It should be the first page an on-call engineer opens when a production voice agent feels wrong.

Burn-Rate Alerts for Voice Agents

Burn rate measures how quickly the agent is consuming its error budget. Google Cloud's burn-rate documentation describes a burn rate above 1 as a sign that, if sustained, the service would miss its SLO for the compliance period.

Voice agents need two classes of burn alerts:

Alert type	Scenario	Page?	Why
Fast burn	Task-completion bad-event rate is 10x budget for 15 minutes	Yes, if user impact is material	A release or provider issue may be breaking active calls.
Slow burn	Escalation correctness is 1.5x budget for 24 hours	Usually ticket or Slack	Quality drift needs ownership but may not need immediate paging.
Synthetic failure	Three consecutive critical-path synthetic calls fail	Yes during business hours	Low live traffic can hide a real outage.
Budget floor	Less than 20% of 30-day budget remains	No page by itself	Use as a release-risk signal.

Fast-burn alerts should be tied to user-visible impact. Slow-burn alerts should create work, not noise. For detailed outage response mechanics, pair these alerts with the voice agent incident response runbook.
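
The four alert rows can be expressed as one evaluation function. The thresholds come from the table above; how the burn rates are computed over 15-minute and 24-hour windows, and the alert names, are left as assumptions of this sketch.

```python
def classify_burn(
    burn_rate_15m: float,
    burn_rate_24h: float,
    budget_remaining_fraction: float,
    consecutive_synthetic_failures: int,
) -> list[str]:
    """Return the alert types that currently fire, in table order."""
    alerts = []
    if burn_rate_15m >= 10:                  # fast burn: page if impact is material
        alerts.append("fast_burn")
    if burn_rate_24h >= 1.5:                 # slow burn: ticket or Slack
        alerts.append("slow_burn")
    if consecutive_synthetic_failures >= 3:  # synthetic failure: page in business hours
        alerts.append("synthetic_failure")
    if budget_remaining_fraction < 0.20:     # budget floor: release-risk signal only
        alerts.append("budget_floor")
    return alerts


print(classify_burn(12.0, 0.5, 0.60, 0))
print(classify_burn(1.0, 2.0, 0.10, 3))
```

Routing (page, ticket, or dashboard flag) is deliberately kept out of the function so the same classification can feed different on-call policies.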

Release Policy: What Happens When the Budget Burns?

An SLO without a policy is just a chart. Before the next incident, decide what happens when budget is low.

Budget state	Release policy	Reliability action
Healthy: more than 50% budget remains and burn rate is normal	Ship normally	Keep monitoring and add regression coverage for major changes
Watch: 20-50% budget remains or slow burn persists	Require owner review for risky changes	Investigate top budget consumers and schedule fixes
Freeze: under 20% budget remains and burn rate is above 1	Pause risky prompt, model, routing, or provider changes	Focus on reliability fixes, rollback candidates, and regression tests
Exhausted: budget below 0	Only ship incident fixes, security fixes, or changes that reduce burn	Run postmortem, update SLO definition if it failed to capture user pain

This policy should not punish teams for finding reliability issues. Google's SRE error-budget policy frames budget exhaustion as permission to focus on reliability when the data says reliability matters more than feature velocity.
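
One way to make the budget states unambiguous is to encode the table as a function. This sketch treats "slow burn persists" as a burn rate above 1, and it classifies the case the table leaves open (under 20% remaining with burn at or below 1) as watch; both are assumptions to adjust to your own policy.

```python
def budget_state(remaining_fraction: float, burn_rate: float) -> str:
    """Map budget remaining (as a fraction of the window's budget) and burn rate to a state."""
    if remaining_fraction < 0:
        return "exhausted"  # only incident, security, or burn-reducing changes ship
    if remaining_fraction < 0.20 and burn_rate > 1:
        return "freeze"     # pause risky prompt, model, routing, or provider changes
    if remaining_fraction <= 0.50 or burn_rate > 1:
        return "watch"      # owner review for risky changes
    return "healthy"        # ship normally


for remaining, burn in [(0.80, 0.5), (0.35, 0.5), (0.10, 2.0), (-0.05, 3.0)]:
    print(remaining, burn, budget_state(remaining, burn))
```

Evaluating the most restrictive state first means a call to this function never returns a looser policy than the table implies.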

For voice agents, "risky change" includes more than code:

Prompt updates
Model/provider changes
ASR language or acoustic model changes
TTS voice and latency configuration
Routing and transfer policy changes
Tool-call schema or timeout changes
Knowledge-base retrieval changes

Tie the release gate to the actual failing SLO. If only the Spanish billing flow is burning budget, the team may still be able to ship unrelated English FAQ improvements. If global connection success is burning, pause broadly.

30-Day Rollout Checklist

Day	Work	Output
1-3	Pick the top 3-5 user journeys by volume, revenue, or risk	Eligible-event definitions
4-7	Define good and bad events for each journey	SLI spec with exclusions
8-10	Backtest 30-60 days of production calls if available	Baseline and proposed targets
11-14	Build SLO dashboard with budget remaining and burn rate	Operator and exec views
15-18	Add synthetic calls for low-volume critical flows	Synthetic SLI coverage
19-22	Configure fast-burn and slow-burn alerts	Paging and ticket policy
23-26	Write release gates and ownership policy	Budget-state playbook
27-30	Run a review with product, engineering, and operations	Approved SLO v1

The first version will be imperfect. That is fine. SLOs improve when teams compare the target to real user pain, postmortems, and release decisions.

Common Mistakes

Mistake 1: Using Infrastructure Uptime as the Main SLO

Mistake 2: Counting Every Failed Call Against the Agent

Mistake 3: Setting One Target Across Every Flow

Mistake 4: Alerting on Every Metric Instead of Budget Burn

Sumanyu Sharma

Related Resources

How to Monitor Voice Agent Outages in Real Time

Voice Agent Dead Air Detection: Root Causes and Fixes

Voice Agent Hallucination Detection Guide