If your voice agent is "up" but callers cannot finish their task, your uptime SLO is telling the wrong story. Voice agent SLOs need to measure whether conversations work: users connect, the agent responds quickly enough, the right intent is handled, and the business outcome completes without a bad escalation.
This guide turns production voice-agent metrics into service-level objectives, error budgets, burn-rate alerts, and a reliability dashboard your engineering and operations teams can actually use.
TL;DR: An SLI is the number you watch. An SLO is the target you expect that number to meet. An SLA is the promise you make to a customer. For voice agents, the useful targets are caller-visible outcomes: connection success, response latency, task completion, and escalation correctness.
Quick filter: If your release review says "all systems green" while fallback rate, interruption rate, or task completion is getting worse, you need voice-agent SLOs.
Methodology Note: The SLO templates and dashboard patterns in this guide are based on Hamming's analysis of 4M+ production voice agent calls across 10K+ voice agents (2025-2026). Use the starter targets as a starting point, not a universal contract. Calibrate them to call volume, risk, user expectations, and whether the voice agent handles regulated or revenue-critical tasks.
Last Updated: May 2026
Related Guides:
- Voice Agent Monitoring KPIs — production metrics and alert thresholds that feed SLOs
- How to Monitor Voice Agent Outages in Real Time — outage detection signals that become fast-burn alerts
- Testing Voice Agents for Production Reliability — release testing before SLO-impacting changes
- Voice Agent Observability Tracing — traces that explain why an SLO burned
- Voice Agent Incident Response Runbook — response playbooks when error budget is at risk
- Voice Agent Dashboard Template — dashboard layout patterns for operators and executives
SLO vs SLA vs SLI
These terms are easy to mix up because they are usually discussed together. In practice, they answer three different questions:
| Term | What It Means | Question It Answers | Voice Agent Scenario |
|---|---|---|---|
| SLI: Service-Level Indicator | The number or signal you watch. | "What are we measuring?" | "Task completion rate for eligible booking calls." |
| SLO: Service-Level Objective | The internal target for that number. | "What level counts as reliable enough?" | "90% of eligible booking calls complete without an agent-caused failure over 30 days." |
| SLA: Service-Level Agreement | The customer-facing promise, usually in a contract, that may include credits, remedies, or escalation terms if missed. | "What have we promised customers?" | "Hamming will meet the contracted availability or support-response commitment for the customer's production workspace." |
Think of an SLI as the instrument reading, the SLO as the line you draw on the dashboard, and the SLA as the promise you are willing to put in front of a customer.
For this guide, the important distinction is that most voice-agent teams should define internal SLOs before they put anything into an SLA. SLOs help engineering, product, and operations agree on what "reliable enough" means. SLAs are customer-facing commitments, so they should be narrower, easier to prove, and reviewed with legal and customer-facing teams.
Working model: SLI = measurement. SLO = target. SLA = customer promise.
Here is the same idea as a sequence:
- Pick the SLI: task completion rate for eligible booking calls.
- Set the SLO: 90% of those calls should complete successfully over 30 days.
- Track the error budget: the remaining 10% is the failure room before the target is missed.
- Put only the narrowest, most provable commitments into an SLA.
What Is a Voice Agent SLO?
A voice agent SLO is a reliability target for the caller experience over a fixed window. It turns a vague expectation like "the agent should work" into a concrete line: this many calls, turns, or task attempts must go well.
Definition: A voice agent SLO is a target for a voice-specific service-level indicator, such as "99.5% of eligible appointment-booking calls complete without an agent-caused failure over 30 days."
The important shift is the unit. Traditional SLOs often measure requests. Voice agents need SLOs over calls, turns, intents, tasks, and handoffs because that is where users feel failure. A database request can succeed while the caller still gets stuck in a fallback loop.
| Traditional service SLO | Voice agent SLO equivalent | Why the voice version matters |
|---|---|---|
| HTTP availability | Call connection success | A connected call is the first user-visible availability event. |
| API latency | Time to first agent response and turn latency | A technically fast backend can still produce awkward pauses. |
| Request success rate | Task completion rate | A call can return 200s while the caller fails to complete the job. |
| Error rate | Agent-caused bad-call rate | Misclassified intent, bad transfer, or hallucinated answer should consume budget. |
| Dependency uptime | Critical-flow completion | Users care whether billing, booking, or support resolution worked end to end. |
Google's SRE workbook defines an error budget as the unreliability allowed by an SLO. For voice agents, that budget should be spent on user-visible bad events, not only system exceptions.
The Voice Agent SLO Starter Kit
Start with four SLOs. More can come later, but these four catch the most important reliability gaps without turning the dashboard into a wall of disconnected charts.
| SLO | Good event | Bad event | Starting target | Owner |
|---|---|---|---|---|
| Connection success | Caller reaches the intended voice agent and receives the greeting | Failed connection, wrong route, dead air before greeting | 99.5% of eligible calls | Platform or telephony owner |
| Response latency | Agent response starts within the agreed turn-taking window | Response to an eligible turn starts later than the threshold | 95% of turns under 1.2 seconds | Voice runtime owner |
| Task completion | Caller completes the primary task without agent-caused failure | Task abandoned, wrong workflow, unresolved fallback loop | 90% of eligible task attempts | Product owner plus agent owner |
| Escalation correctness | Escalation happens when required, with context preserved | Missed escalation, unnecessary transfer, or lost handoff context | 97% of audited escalation decisions | Operations owner |
These are starting targets. A healthcare triage flow may need tighter escalation correctness than a retail order-status bot. A low-risk FAQ agent may accept a lower task-completion target while the team learns.
The wrong move is to set every target to 99.99% because it looks professional. Google Cloud's SLO documentation warns that useful SLOs should not be higher than necessary or meaningful to users. For voice agents, unrealistic SLOs create permanent failure noise and train teams to ignore the dashboard.
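One way to keep these targets reviewable is to store them as data instead of dashboard settings. The sketch below is a minimal illustration in Python. The `SloSpec` class, its field names, the 30-day default window, and the tighter triage override are all assumptions made for the example, not part of any Hamming API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SloSpec:
    """Hypothetical container for one SLO definition."""
    name: str
    target: float          # fraction of eligible events that must be good
    unit: str              # what counts as one eligible event
    owner: str
    window_days: int = 30  # assumed window; the table does not fix one per row

    @property
    def allowed_bad_fraction(self) -> float:
        """Share of eligible events that may be bad before the SLO is missed."""
        return 1.0 - self.target


# Starter targets copied from the table above.
STARTER_SLOS = {
    spec.name: spec
    for spec in [
        SloSpec("connection_success", 0.995, "eligible call", "platform or telephony"),
        SloSpec("response_latency", 0.95, "eligible turn", "voice runtime"),
        SloSpec("task_completion", 0.90, "eligible task attempt", "product plus agent"),
        SloSpec("escalation_correctness", 0.97, "audited escalation decision", "operations"),
    ]
}

# Illustrative per-flow override for a high-risk flow.
# The 0.99 value is a placeholder, not a recommendation from this guide.
triage_escalation = replace(STARTER_SLOS["escalation_correctness"], target=0.99)
```

Keeping the override as a copy of the starter spec makes it obvious in review which flows deviate from the defaults and by how much.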
How to Calculate Voice Agent Error Budgets
An error budget is the amount of failure room implied by the SLO. If the SLO says 99.5% of calls must be good, the error budget is the remaining 0.5%.
Before the formula, define two inputs:
- Eligible event: a call, turn, or task attempt that should count toward the SLO.
- Bad event: an eligible event that fails the rule you agreed on.
The core math is:
Error budget = (1 - SLO target) x eligible events in the window
Budget consumed = bad events in the window
Budget remaining = error budget - budget consumed
Burn rate = current bad-event rate / allowed bad-event rate
If a production voice agent handles 100,000 eligible booking calls in 30 days under a 99.5% task-completion SLO, the error budget is 0.5% of 100,000, or 500 agent-caused failed task attempts (counting one task attempt per call). The team can tolerate up to 500 bad booking attempts in the window before the SLO is missed.
| Input | Value |
|---|---|
| Eligible calls | 100,000 |
| SLO target | 99.5% good calls |
| Allowed bad-call rate | 0.5% |
| 30-day error budget | 500 bad calls |
| Bad calls so far | 380 |
| Budget remaining | 120 bad calls |
If the agent starts failing 2% of eligible calls, it is burning budget at 4x the allowed rate. At that pace a full 30-day budget would be gone in about 7.5 days, so a sustained 4x burn misses the SLO even if the service never crashes.
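The formulas and the worked example above can be sketched in a few lines of Python. The function names are hypothetical, and the 1,000-call sample used for the burn rate is an assumed measurement window chosen to reproduce the 2% failure case.

```python
def error_budget(slo_target: float, eligible_events: int) -> float:
    """Allowed number of bad events in the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    return (1.0 - slo_target) * eligible_events


def burn_rate(bad_events: int, eligible_events: int, slo_target: float) -> float:
    """Observed bad-event rate divided by the allowed bad-event rate."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be a fraction between 0 and 1")
    if eligible_events == 0:
        return 0.0  # no traffic means nothing was burned in this sample
    observed_rate = bad_events / eligible_events
    allowed_rate = 1.0 - slo_target
    return observed_rate / allowed_rate


# Worked example: 100,000 eligible calls, 99.5% target, 380 bad calls so far.
budget = error_budget(0.995, 100_000)   # roughly 500 bad calls
remaining = budget - 380                 # roughly 120 bad calls

# A sample where 2% of eligible calls fail burns at roughly 4x the allowed rate.
current_burn = burn_rate(bad_events=20, eligible_events=1_000, slo_target=0.995)
```

The results are floating-point values, so compare them with a small tolerance instead of exact equality when wiring them into alerts.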
Voice-agent error budget: the number of user-visible bad calls, turns, or workflow attempts your team can tolerate in a window before reliability work should outrank risky feature changes.
For the raw metric definitions, use the voice agent evaluation metrics guide and the post-call analytics metrics dictionary as the measurement layer. SLOs sit one level above those metrics and decide which misses count against reliability.
Which Measurements Should Feed a Voice Agent SLO?
An SLI is the measurement behind the SLO. The best SLIs are boring, user-visible, and hard to game. If a customer would not notice the failure, it usually should not be your first SLO.
Connection and Availability SLIs
Use these when the voice agent must be reachable.
| SLI | Formula | Count as bad when |
|---|---|---|
| Call connection success | Successful agent-connected calls / eligible inbound calls | Call fails, routes to the wrong agent, or greeting never plays |
| First-audio success | Calls with greeting audio delivered / connected calls | Caller hears dead air or malformed greeting |
| Synthetic critical-flow success | Passing synthetic calls / scheduled synthetic calls | Synthetic call cannot complete the target path |
Synthetic calls matter because voice-agent traffic is often spiky, and quiet periods can leave too few live calls to evaluate an SLI. Google's SRE alerting guidance notes that low-traffic services need special treatment; without it, real users become your only monitoring signal. For voice systems, synthetic calls should cover the flows where failure is expensive.
Latency SLIs
Use latency SLOs for conversational feel, not just backend speed.
| SLI | Formula | Starting target |
|---|---|---|
| Time to first word | Calls where first agent audio starts within target / connected calls | 95% under 1.5 seconds |
| Turn latency | Turns where response starts within target / eligible turns | 95% under 1.2 seconds |
| Tool-dependent turn latency | Tool turns under target / eligible tool turns | 95% under 2.5 seconds |
Pair this with OpenTelemetry for voice agents, because SLO dashboards tell you the user impact while traces tell you whether the burn came from ASR, LLM, TTS, tool calls, or a downstream API.
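A latency SLI of this shape is a ratio of good turns to eligible turns, not a percentile read off a chart. The sketch below shows the calculation under simple assumptions: latencies are already measured in seconds per eligible turn, and the function name and sample values are made up for illustration.

```python
def latency_sli(latencies_s: list[float], threshold_s: float) -> float:
    """Fraction of eligible turns whose response started under the threshold."""
    if not latencies_s:
        raise ValueError("no eligible turns in the window")
    good_turns = sum(1 for latency in latencies_s if latency < threshold_s)
    return good_turns / len(latencies_s)


# Hypothetical turn latencies (seconds) from one batch of eligible turns.
turn_latencies = [0.6, 0.8, 0.9, 1.0, 1.1, 1.1, 1.15, 1.3, 0.7, 2.4]

sli = latency_sli(turn_latencies, threshold_s=1.2)
meets_slo = sli >= 0.95  # starter target: 95% of turns under 1.2 seconds
```

In this sample two of ten turns miss the threshold, so the SLI is 0.8 and the batch does not meet the 95% target. Counting turns this way lets the latency SLO share the same error-budget math as the other SLOs.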
Quality SLIs
Quality SLIs should count outcomes, not vibes.
| SLI | Formula | Count as bad when |
|---|---|---|
| Task completion | Completed target tasks / eligible task attempts | The agent causes abandonment, wrong action, or unresolved loop |
| Intent handling accuracy | Correct first major intent / audited eligible calls | Intent classification sends the call down the wrong path |
| Prompt compliance | Compliant evaluated turns / evaluated turns | The agent violates an instruction that matters to the user or business |
| ASR quality | Turns under WER threshold / evaluated turns | Word error rate crosses the flow-specific threshold |
For ASR-specific targets, see the ASR accuracy evaluation guide. Do not use one universal word-error target across every voice agent. A noisy field-service call and a quiet desktop support call have different baselines.
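For the ASR quality row, each evaluated turn needs a word error rate before it can be counted as good or bad. The sketch below uses the standard word-level edit distance divided by the reference length. Lowercasing and whitespace tokenization are simplifying assumptions, as are the function names and the 0.25 threshold; a production pipeline would normally apply its own text normalization and a flow-specific threshold.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # Dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def asr_quality_sli(pairs: list[tuple[str, str]], wer_threshold: float) -> float:
    """Fraction of evaluated turns whose WER does not cross the threshold."""
    if not pairs:
        raise ValueError("no evaluated turns")
    good = sum(1 for ref, hyp in pairs if word_error_rate(ref, hyp) <= wer_threshold)
    return good / len(pairs)


# Hypothetical (reference, ASR output) pairs for three evaluated turns.
evaluated_turns = [
    ("book a table for two", "book a table for you"),
    ("cancel my appointment", "cancel appointment"),
    ("what is my balance", "what is my balance"),
]
asr_sli = asr_quality_sli(evaluated_turns, wer_threshold=0.25)
```

Passing the threshold as a parameter keeps the per-flow calibration explicit, which matches the advice against a single universal word-error target.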
Escalation and Safety SLIs
Escalation errors are expensive because they turn automation into customer frustration.
| SLI | Formula | Count as bad when |
|---|---|---|
| Required escalation recall | Required escalations completed / calls requiring escalation | The agent should transfer but does not |
| Correct non-escalation rate | Correct non-escalations / calls not requiring escalation | The agent transfers when it should resolve |
| Context-preserved handoff | Escalations with summary and required fields / escalations | Human receives missing or wrong context |
For regulated or high-risk workflows, escalation correctness may be the most important SLO even if it has lower volume than latency or task completion.
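The three formulas in the table above can be computed from the same set of audited calls. The sketch below assumes a hypothetical `AuditedCall` record in which a reviewer has marked whether escalation was required, whether it happened, and whether the handoff carried the summary and required fields. The field and key names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AuditedCall:
    """Hypothetical audit record for one call."""
    escalation_required: bool
    escalated: bool
    context_complete: bool = False  # only meaningful when escalated is True


def _ratio(good: int, total: int) -> Optional[float]:
    # None signals "no eligible events", which is different from a 0% score.
    return good / total if total else None


def escalation_slis(calls: list[AuditedCall]) -> dict[str, Optional[float]]:
    required = [c for c in calls if c.escalation_required]
    not_required = [c for c in calls if not c.escalation_required]
    escalated = [c for c in calls if c.escalated]
    return {
        "required_escalation_recall": _ratio(
            sum(c.escalated for c in required), len(required)),
        "correct_non_escalation": _ratio(
            sum(not c.escalated for c in not_required), len(not_required)),
        "context_preserved_handoff": _ratio(
            sum(c.context_complete for c in escalated), len(escalated)),
    }


audited = [
    AuditedCall(True, True, True),
    AuditedCall(True, True, False),    # escalated, but context was lost
    AuditedCall(True, False),          # missed escalation
    AuditedCall(False, False),
    AuditedCall(False, False),
    AuditedCall(False, False),
    AuditedCall(False, True, True),    # unnecessary transfer
]
slis = escalation_slis(audited)
```

Returning `None` for an empty denominator avoids reporting a perfect or failing score for a window in which no calls were audited.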
How to Build a Voice Agent Reliability Dashboard
A good dashboard answers four questions in this order:
- Are callers currently affected?
- Which SLO is burning?
- Which flow, agent version, provider, or dependency is responsible?
- Should we keep shipping, pause changes, or start incident response?
| Dashboard row | Panels | Decision it supports |
|---|---|---|
| SLO health | Current compliance, 30-day budget remaining, forecasted budget exhaustion | Are we within reliability policy? |
| Burn-rate alerts | Fast burn, slow burn, budget consumed by flow | Is this urgent or slow drift? |
| Flow breakdown | Task completion, latency, escalation correctness by intent and route | Which customer journey is affected? |
| Pipeline attribution | ASR, LLM, TTS, tool-call, telephony, and CRM latency/error slices | Which subsystem should investigate? |
| Release overlay | Agent version, prompt version, model/provider change, config deploys | Did a recent change start the burn? |
| Review queue | Top failed calls, traces, audio snippets, regression-test candidates | What should humans inspect first? |
The voice agent dashboard template covers layout mechanics. For SLOs, add two panels that generic dashboards often miss: budget remaining and burn-rate forecast.
Budget exhaustion forecast (days) =
remaining budget (bad events) / current bad events per day
Release gate =
block risky changes when budget remaining is low
and burn rate is above policy threshold
The dashboard should not be a compliance artifact that someone checks once a month. It should be the first page an on-call engineer opens when a production voice agent feels wrong.
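The forecast and release-gate panels can be backed by two small functions. This is a sketch under stated assumptions: "current bad-event rate" is read as bad events per day so the forecast comes out in days, and the default thresholds (20% budget remaining, burn rate above 1) are borrowed from the Freeze row of the release policy in this guide. The function and parameter names are hypothetical.

```python
def days_until_exhaustion(budget_remaining: float, bad_events_per_day: float) -> float:
    """Days until the error budget runs out at the current pace."""
    if budget_remaining <= 0:
        return 0.0               # budget is already spent
    if bad_events_per_day <= 0:
        return float("inf")      # nothing is burning right now
    return budget_remaining / bad_events_per_day


def release_gate_blocks(
    budget_remaining_fraction: float,
    burn_rate: float,
    *,
    low_budget_fraction: float = 0.20,   # assumed policy threshold
    burn_threshold: float = 1.0,         # assumed policy threshold
) -> bool:
    """True when risky changes should be blocked."""
    return (budget_remaining_fraction < low_budget_fraction
            and burn_rate > burn_threshold)


# 120 bad calls of budget left, losing about 30 per day.
forecast_days = days_until_exhaustion(budget_remaining=120, bad_events_per_day=30)

# 24% of budget left at a 4x burn: not blocked under these thresholds.
blocked_now = release_gate_blocks(budget_remaining_fraction=0.24, burn_rate=4.0)

# 15% of budget left at a 1.5x burn: blocked.
blocked_later = release_gate_blocks(budget_remaining_fraction=0.15, burn_rate=1.5)
```

The first gate example shows a limitation of an "and" rule: a fast burn with a little more than 20% remaining still passes, so pair the gate with burn-rate alerts instead of relying on it alone.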
Burn-Rate Alerts for Voice Agents
Burn rate measures how quickly the agent is consuming its error budget. Google Cloud's burn-rate documentation describes a burn rate above 1 as a sign that, if sustained, the service would miss its SLO for the compliance period.
Voice agents need two classes of burn alerts:
| Alert type | Scenario | Page? | Why |
|---|---|---|---|
| Fast burn | Task-completion bad-event rate is 10x budget for 15 minutes | Yes, if user impact is material | A release or provider issue may be breaking active calls. |
| Slow burn | Escalation correctness is 1.5x budget for 24 hours | Usually ticket or Slack | Quality drift needs ownership but may not need immediate paging. |
| Synthetic failure | Three consecutive critical-path synthetic calls fail | Yes during business hours | Low live traffic can hide a real outage. |
| Budget floor | Less than 20% of 30-day budget remains | No page by itself | Use as a release-risk signal. |
Fast-burn alerts should be tied to user-visible impact. Slow-burn alerts should create work, not noise. For detailed outage response mechanics, pair these alerts with the voice agent incident response runbook.
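The alert table can be expressed as a single evaluation step per SLO. The sketch below is a simplified illustration: it assumes burn rates have already been computed over 15-minute and 24-hour windows, applies all four rules to one SLO, and ignores the table's qualifiers about material user impact and business hours. The function name, parameter names, and action labels are hypothetical.

```python
def evaluate_alerts(
    burn_15m: float,
    burn_24h: float,
    recent_synthetic_passes: list[bool],
    budget_remaining_fraction: float,
) -> list[tuple[str, str]]:
    """Return (alert_type, action) pairs that fire for the current readings."""
    alerts = []
    if burn_15m >= 10.0:
        alerts.append(("fast_burn", "page"))
    if burn_24h >= 1.5:
        alerts.append(("slow_burn", "ticket"))
    # Three consecutive critical-path synthetic failures.
    last_three = recent_synthetic_passes[-3:]
    if len(last_three) == 3 and not any(last_three):
        alerts.append(("synthetic_failure", "page"))
    if budget_remaining_fraction < 0.20:
        alerts.append(("budget_floor", "release_risk"))
    return alerts


# A release breaks active calls: fast and slow burn fire, synthetics fail.
incident = evaluate_alerts(12.0, 2.0, [True, False, False, False], 0.5)

# Quiet day with little budget left: only the release-risk signal fires.
quiet = evaluate_alerts(0.5, 0.8, [True, True, False], 0.15)
```

Returning every rule that fires, instead of only the most severe one, keeps the slow-burn ticket and the release-risk signal visible during an incident.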
Release Policy: What Happens When the Budget Burns?
An SLO without a policy is just a chart. Before the next incident, decide what happens when budget is low.
| Budget state | Release policy | Reliability action |
|---|---|---|
| Healthy: more than 50% budget remains and burn rate is at or below 1 | Ship normally | Keep monitoring and add regression coverage for major changes |
| Watch: 20-50% budget remains or slow burn persists | Require owner review for risky changes | Investigate top budget consumers and schedule fixes |
| Freeze: under 20% budget remains and burn rate is above 1 | Pause risky prompt, model, routing, or provider changes | Focus on reliability fixes, rollback candidates, and regression tests |
| Exhausted: budget below 0 | Only ship incident fixes, security fixes, or changes that reduce burn | Run postmortem, update SLO definition if it failed to capture user pain |
This policy should not punish teams for finding reliability issues. Google's SRE error-budget policy frames budget exhaustion as permission to focus on reliability when the data says reliability matters more than feature velocity.
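The budget states in the table can be encoded as a small classifier so the dashboard and the release tooling agree on the current state. This is one possible reading of the table, with assumptions called out in comments: exactly zero remaining budget is treated as exhausted, a "normal" burn rate is taken to mean at or below 1, and a low budget with a burn rate at or below 1 (a case the table does not cover) falls back to watch.

```python
def budget_state(remaining_fraction: float, burn_rate: float) -> str:
    """Map budget remaining and burn rate to a release-policy state."""
    if remaining_fraction <= 0:
        # Assumption: exactly zero counts as exhausted.
        return "exhausted"
    if remaining_fraction < 0.20 and burn_rate > 1.0:
        return "freeze"
    if remaining_fraction <= 0.50 or burn_rate > 1.0:
        # Assumption: any burn above 1 is "not normal", and a low budget
        # that is not actively burning is treated as watch.
        return "watch"
    return "healthy"


states = {
    "plenty left, calm": budget_state(0.80, 0.5),
    "plenty left, burning": budget_state(0.80, 1.5),
    "mid budget": budget_state(0.35, 0.5),
    "low and burning": budget_state(0.15, 2.0),
    "low but calm": budget_state(0.15, 0.5),
    "overspent": budget_state(-0.10, 3.0),
}
```

Evaluating the state per SLO and per flow, instead of once globally, lets the gate pause only the changes that touch the burning flow.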
For voice agents, "risky change" includes more than code:
- Prompt updates
- Model/provider changes
- ASR language or acoustic model changes
- TTS voice and latency configuration
- Routing and transfer policy changes
- Tool-call schema or timeout changes
- Knowledge-base retrieval changes
Tie the release gate to the actual failing SLO. If only the Spanish billing flow is burning budget, the team may still be able to ship unrelated English FAQ improvements. If global connection success is burning, pause broadly.
Common Mistakes
Mistake 1: Using Infrastructure Uptime as the Main SLO
Server uptime is necessary, but it is not enough. A voice agent can have healthy infrastructure while users repeat themselves, hit fallback loops, or abandon calls.
Use infrastructure uptime as a dependency SLO. Use task completion, escalation correctness, and latency as customer-experience SLOs.
Mistake 2: Counting Every Failed Call Against the Agent
Not every bad call is an agent-caused reliability miss. You can exclude test calls, abuse, caller hangups before the greeting, and known external outages, but only when each exclusion is explicit and auditable.
The exclusion policy matters because vague exclusions make the SLO easy to game. The dashboard should show both raw failures and budget-counting failures.
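One way to keep exclusions auditable is to accept only reasons from a fixed allowlist and to report raw and budget-counting failures side by side. The sketch below assumes failed calls arrive as dictionaries with an optional `exclusion` field. The field name, the reason codes, and the function name are hypothetical.

```python
from collections import Counter

# Explicit allowlist, mirroring the exclusions named above.
ALLOWED_EXCLUSIONS = {
    "test_call",
    "abuse",
    "hangup_before_greeting",
    "external_outage",
}


def summarize_failures(failed_calls: list[dict]) -> dict:
    """Split failed calls into excluded and budget-counting failures."""
    excluded = Counter()
    budget_counting = 0
    for call in failed_calls:
        reason = call.get("exclusion")
        if reason in ALLOWED_EXCLUSIONS:
            excluded[reason] += 1
        else:
            # Missing or unrecognized reasons still count against the budget.
            budget_counting += 1
    return {
        "raw_failures": len(failed_calls),
        "budget_counting_failures": budget_counting,
        "excluded_by_reason": dict(excluded),
    }


failed = [
    {"id": 1},
    {"id": 2, "exclusion": "test_call"},
    {"id": 3, "exclusion": "caller_was_rude"},   # not on the allowlist
    {"id": 4, "exclusion": "external_outage"},
    {"id": 5, "exclusion": None},
]
summary = summarize_failures(failed)
```

Because unrecognized reasons count by default, adding a new exclusion requires a visible change to the allowlist instead of a free-text label on a call.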
Mistake 3: Setting One Target Across Every Flow
Password reset, appointment booking, fraud escalation, and general FAQs should not share one task-completion target. Segment by flow and risk level.
Use the production reliability testing guide to decide which flows deserve strict release gates and which can start with observation-only SLOs.
Mistake 4: Alerting on Every Metric Instead of Budget Burn
Metric alerts create fatigue when every layer pages separately. Budget alerts compress the question: are users losing more reliability than we agreed to spend?
Keep detailed alerts for diagnosis, but make burn-rate alerts the signal that decides incident urgency.
30-Day Rollout Checklist
Use this as the first implementation pass.
| Day | Work | Output |
|---|---|---|
| 1-3 | Pick the top 3-5 user journeys by volume, revenue, or risk | Eligible-event definitions |
| 4-7 | Define good and bad events for each journey | SLI spec with exclusions |
| 8-10 | Backtest 30-60 days of production calls if available | Baseline and proposed targets |
| 11-14 | Build SLO dashboard with budget remaining and burn rate | Operator and exec views |
| 15-18 | Add synthetic calls for low-volume critical flows | Synthetic SLI coverage |
| 19-22 | Configure fast-burn and slow-burn alerts | Paging and ticket policy |
| 23-26 | Write release gates and ownership policy | Budget-state playbook |
| 27-30 | Run a review with product, engineering, and operations | Approved SLO v1 |
The first version will be imperfect. That is fine. SLOs improve when teams compare the target to real user pain, postmortems, and release decisions.

