What is barge-in for a voice agent?

Barge-in means the caller can interrupt the agent's audio playback before the agent finishes speaking. In production, Hamming treats barge-in as one class of interruption and tests at least 5 scenarios: true correction, short backchannel, background noise, DTMF input, and silence timeout recovery.

Which events should I log for voice agent interruptions?

Log user speech start and stop, agent speech start and stop, interruption candidate, interruption decision, playback action, transcript result, and recovery outcome. Hamming recommends storing these event families with a stable call ID and turn ID so QA can replay the specific moment that changed the conversation.

How do I test voice agent backchannel detection?

Test backchannel detection with short acknowledgments such as yes, okay, uh-huh, and mm-hmm while the agent is speaking. Hamming recommends verifying that these utterances do not cancel critical audio unless the workflow explicitly treats them as confirmation, and then repeating the same test with real corrections that should interrupt.

What metrics show that barge-in is broken?

Watch false interruption rate, missed interruption rate, resume success rate, repeated-user-speech rate, silence-after-interruption, and task completion after interruption. Hamming recommends reviewing the top 20 interrupted calls after every major prompt, model, voice, or turn-detection change because aggregate latency can stay green while callers are being cut off.

How should I tune voice agent turn detection?

Tune turn detection by workflow risk: increase patience for long account numbers, medical descriptions, and elderly callers, then use faster endpointing for short command-and-control flows. Hamming recommends changing one knob at a time, running a fixed interruption test suite, and comparing false positives against missed interruptions before shipping.

Should every voice agent allow interruption?

No. Payment instructions, legal disclosures, emergency disclaimers, and consent prompts may need non-interruptible or DTMF-only segments, while general support conversations usually need speech interruption. Hamming recommends documenting the policy per message type rather than applying one global rule across every turn.

How does Hamming help with voice agent interruption handling?

Hamming helps teams test and monitor interruption behavior across production-like calls, including barge-in, latency, task completion, and recovery paths. Teams can use Hamming to turn repeated interruption failures into regression tests and track whether a change improves the caller experience. Hamming's platform has protected 10M+ minutes of calls across 10K+ voice agents.

Voice Agent Interruption Handling: Barge-In, Backchannels, and Turn Detection

Q: What is voice agent interruption handling?

Voice agent interruption handling is the policy that decides what happens when a caller speaks, presses a key, or triggers a command while the agent is talking. Hamming recommends tracking it as a full event lifecycle, not a single barge-in toggle, because the right behavior depends on whether the input is a correction, backchannel, noise, DTMF, or safety escalation.

A voice agent can be fast and still feel rude. The dashboard says P95 turn latency is healthy, but callers hear the agent cut them off mid-account-number, ignore a correction, or restart after every "uh-huh."

That is why voice agent interruption handling needs its own runbook. Barge-in is not a single setting. It is a policy that decides when caller input should stop agent audio, when it should be treated as a backchannel, when it should be ignored as noise, and what evidence should be logged so QA can replay the decision later.

If you run fewer than 50 production calls a week, keep this simple. Review interrupted calls manually, pick a conservative default, and add 5-10 regression tests. This guide is for teams with enough call volume that interruption failures hide inside aggregate latency, fallback, and completion metrics.

Voice agent interruption handling is the policy and instrumentation layer that decides what happens when a caller speaks, presses DTMF, or triggers a command while the agent is speaking. A production-ready policy records the caller input, agent speech state, interruption decision, playback action, transcript result, and recovery outcome.

Quick filter: If you cannot answer "did the caller intentionally interrupt, or did we fire on noise/backchannel?" from one call record, your interruption handling is not observable enough yet.

TL;DR: Build interruption handling as a runbook, not a toggle:

Classify the input: true correction, backchannel, accidental noise, DTMF, silence timeout, or safety escalation.

Log the lifecycle: user speech state, agent speech state, interruption candidate, decision, playback action, transcript outcome, and recovery result.

Test both sides: false positives cut the agent off; false negatives force callers to wait or repeat themselves.

Tune by workflow risk: legal disclosures, payment steps, and urgent support paths need different policies than open-ended support chat.

Methodology Note: This runbook is based on Hamming's analysis of production voice agent calls across 10K+ voice agents (2025-2026), with 10M+ minutes protected on Hamming's platform. We've tested agents built on LiveKit, Pipecat, ElevenLabs, Retell, Vapi, and custom-built solutions. The runbook also draws on public provider documentation from LiveKit, OpenAI, Twilio, Amazon Nova, Dialogflow CX, and Agora to ground the turn-detection settings and event samples.

Last Updated: May 2026

Related Guides:

Voice AI Latency: What's Fast, What's Slow, and How to Fix It - latency thresholds that interact with turn-taking
Voice Agent Analytics and Post-Call Metrics - formulas for interruption rate, containment, and task completion
Voice Agent Observability Tracing - trace the ASR, LLM, tool, and TTS path around an interrupted turn
OpenTelemetry for AI Voice Agents - span and event modeling for voice pipelines
IVR and Voice Agent Log Correlation - preserve call IDs across IVR, telephony, and agent sessions
Debugging Voice Agents - investigate missed intents and fallback spikes
Testing LiveKit Voice Agents - platform-specific test setup for LiveKit agents
Voice Agent SLOs and Error Budgets - turn interruption failures into reliability targets

What Is Voice Agent Interruption Handling?

Voice agent interruption handling answers one question: when a caller does something while the agent is speaking, should the agent stop, keep talking, pause and resume, or route the input somewhere else?

The answer changes by context. A caller saying "wait, that's the wrong address" should interrupt. A caller saying "yeah" while listening usually should not. A keypad press during an IVR-like prompt may be intentional DTMF. A loud keyboard click should not cancel TTS.

Caller Input During Agent Speech	Usually Means	Default Action	Evidence to Keep
"No, I meant Friday"	True correction	Stop playback, accept new turn, preserve partial agent transcript	speech duration, transcript, agent playback position
"uh-huh" or "okay"	Backchannel	Continue or briefly acknowledge without cancelling critical audio	utterance text, confidence, backchannel decision
DTMF key press	Menu or confirmation action	Stop or route based on prompt policy	digit class, prompt state, expected menu options
Short noise or echo	False interruption	Resume playback from safe point	audio energy, no transcript, resume decision
Long silence	No input or hesitation	Reprompt, wait, or escalate depending on step	silence duration, timeout policy, next action
"I need a human"	Safety or escalation interruption	Stop playback and route to handoff logic	intent, transcript, escalation outcome
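The table above can be read as a classify-then-act lookup. The following Python sketch is illustrative only: the word lists, the `CallerInput` shape, and the action names are assumptions made for this example, not a provider API. A production system would likely rely on a model or provider-side backchannel detection rather than exact string matching, and silence timeouts would be handled by a separate timer rather than by this function.

```python
from dataclasses import dataclass
from enum import Enum


class InputClass(Enum):
    CORRECTION = "true_correction"
    BACKCHANNEL = "backchannel"
    DTMF = "dtmf"
    NOISE = "noise"
    ESCALATION = "escalation"


# Hypothetical word lists, chosen only to mirror the examples in the table.
BACKCHANNELS = {"uh-huh", "mm-hmm", "okay", "ok", "yeah", "yes", "right"}
ESCALATION_PHRASES = ("human", "representative")


@dataclass
class CallerInput:
    kind: str             # "speech" or "dtmf"
    transcript: str = ""  # empty when ASR produced no text
    digit: str = ""       # set only for DTMF input


def classify(inp: CallerInput) -> InputClass:
    """Map caller input during agent speech to one of the table's classes."""
    if inp.kind == "dtmf":
        return InputClass.DTMF
    text = inp.transcript.strip().lower().strip(".,!?")
    if not text:
        return InputClass.NOISE  # audio energy but no transcript
    if any(phrase in text for phrase in ESCALATION_PHRASES):
        return InputClass.ESCALATION
    if text in BACKCHANNELS:
        return InputClass.BACKCHANNEL
    # Crude fallback: any other speech is treated as a real interruption.
    return InputClass.CORRECTION


# Default actions from the table, expressed as hypothetical action names.
DEFAULT_ACTION = {
    InputClass.CORRECTION: "stop_playback_and_accept_turn",
    InputClass.BACKCHANNEL: "continue_playback",
    InputClass.DTMF: "route_by_prompt_policy",
    InputClass.NOISE: "resume_playback",
    InputClass.ESCALATION: "stop_playback_and_handoff",
}
```

For example, `DEFAULT_ACTION[classify(CallerInput("speech", "uh-huh"))]` evaluates to `"continue_playback"`, while the same call with "No, I meant Friday" yields `"stop_playback_and_accept_turn"`.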

LiveKit's turn-detection docs split the problem into detection modes, endpointing delay, adaptive interruption handling, and VAD. OpenAI's Realtime VAD docs expose server VAD and semantic VAD settings such as threshold, prefix padding, silence duration, eagerness, and response interruption.

Those provider knobs are useful. They are not the runbook.

Working rule: Turn detection decides when the system thinks speech started or ended. Interruption handling decides what the agent does with that signal while the agent is already speaking.

Why Barge-In Fails in Production

The most common failure is treating barge-in as a boolean. Turn it on and callers can interrupt. Turn it off and they cannot.

Production is messier than that.

Failure Mode	What the Caller Feels	Root Cause	First Check
False barge-in	Agent keeps stopping for no reason	Noise, echo, short backchannel, overly sensitive VAD	audio energy, transcript presence, false interruption events
Missed correction	Caller has to wait, repeat, or hang up	Interruption disabled, threshold too strict, buffered audio dropped	agent speech state, input reporting policy
Premature endpointing	Agent answers before caller is done	Silence threshold too short for the workflow	pause duration, partial transcript, phrase completion
Backchannel confusion	"okay" becomes a new task	No semantic/backchannel policy	utterance length, words, confidence, next action
Lost recovery	Agent stops, then forgets what it already said	Playback truncation not reflected in conversation history	heard-audio boundary, transcript truncation
No evidence	QA cannot prove what happened	Missing event taxonomy and call IDs	interruption event lifecycle

Twilio's ConversationRelay docs show why this needs precision: interruptible controls whether caller input stops TTS playback, while reportInputDuringAgentSpeech controls whether the application receives input while the agent is talking. Those are different decisions. A system can listen without stopping playback, or stop playback without preserving enough application context.

Google Dialogflow CX exposes a similar separation at a different layer: advanced speech settings include end-of-speech sensitivity, smart endpointing, no-speech timeout, barge-in, and partial response cancellation. Amazon Nova Sonic's turn-taking docs make the latency tradeoff explicit with sensitivity levels that wait roughly 1.5, 1.75, or 2.0 seconds before responding.

The practical lesson is boring but important: the best policy is not "always interrupt." It is "interrupt when the user's intent is more important than the current audio, and prove that decision in the logs."

What Events Should a Voice Agent Log for Interruptions?

If you only log the final transcript, you will miss the interruption decision. The evidence is in the timing: when the user started speaking, where the agent was in playback, what the detector decided, and whether the agent recovered.

Use this event taxonomy as the starting point.

Event	Required Fields	Why It Matters
`user.speech_started`	call ID, turn ID, timestamp, audio source, VAD confidence	Shows when the interruption candidate began
`user.speech_stopped`	duration, transcript status, silence duration	Separates real speech from noise
`agent.speech_started`	response ID, playback start, message type	Shows whether the agent was interruptible
`agent.speech_interrupted`	playback position, reason, heard text boundary	Reconstructs what the caller actually heard
`interruption.candidate_detected`	mode, threshold, speech duration, words detected	Explains why the detector fired
`interruption.decision_made`	decision, policy version, confidence, reason	Proves whether the app chose stop, continue, resume, or escalate
`interruption.recovered`	resume position, new user turn ID, task state	Shows whether the conversation repaired cleanly
`interruption.false_positive`	timeout, no transcript, resume behavior	Counts noise/backchannel mistakes separately
`silence.timeout`	elapsed silence, prompt state, next action	Handles no-input paths without mixing them into barge-in
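The "heard text boundary" field on `agent.speech_interrupted` is easiest to understand with a small example. The sketch below assumes the TTS layer can supply per-word timings. That is an assumption (not every provider exposes them), and `WordTiming` and `heard_text` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    text: str
    start_ms: int
    end_ms: int


def heard_text(words: list[WordTiming], playback_position_ms: int) -> str:
    """Return the words whose audio finished before playback was stopped."""
    return " ".join(w.text for w in words if w.end_ms <= playback_position_ms)


# Hypothetical timings for one agent response.
response = [
    WordTiming("Your", 0, 200),
    WordTiming("balance", 200, 700),
    WordTiming("is", 700, 850),
    WordTiming("42", 850, 1300),
    WordTiming("dollars", 1300, 1900),
]

# Playback stopped at 1840 ms, so "dollars" was cut off mid-word.
truncated = heard_text(response, 1840)
```

Writing the truncated text, rather than the full planned response, into conversation history is one way to keep the agent from assuming the caller heard audio that never played.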

Twilio's ConversationRelay Insights event reference includes speech events, latency events, interaction events such as interrupt, and an interrupt payload type. Agora's turn-information API exposes turn starts, interrupted turn endings, ignored turns, silence timeouts, and latency segments. Those are useful samples of the evidence families to normalize, even if your runtime uses a different provider.

Here is a normalized event envelope you can adapt:

{
  "eventName": "voice.interruption.decision_made",
  "eventVersion": "2026-05-20",
  "occurredAt": "2026-05-20T15:42:18.231Z",
  "canonicalCallId": "call_01JZ9W2M7K",
  "turnId": "turn_0007",
  "agentResponseId": "response_0006",
  "traceId": "9f7c2d4f0f3a4c1e8e4d2a5b7c6f9012",
  "agentSpeech": {
    "state": "speaking",
    "messageType": "billing_summary",
    "interruptible": true,
    "playbackPositionMs": 1840
  },
  "callerInput": {
    "type": "speech",
    "speechDurationMs": 420,
    "transcriptText": "no I meant Friday",
    "isBackchannel": false
  },
  "decision": {
    "action": "stop_agent_audio_and_accept_user_turn",
    "policyVersion": "interruption-policy-2026-05-20",
    "reason": "caller_correction_detected",
    "confidence": 0.87
  },
  "recovery": {
    "agentTranscriptTruncatedAtMs": 1840,
    "newTurnCommitted": true,
    "taskStatePreserved": true
  }
}

Keep raw transcripts and audio in the right evidence store. For broad dashboards, store pointers, policy versions, and redaction state. The IVR and voice agent log correlation runbook explains how to keep provider IDs and call context attached across the call path.
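One way to keep the envelope honest is to reject events that lack the fields QA needs for replay. The Python sketch below is a minimal check, not a schema standard: the choice of which paths count as required is an assumption made for this example, and `missing_fields` is a hypothetical helper name.

```python
# Paths treated as required in this sketch (an assumption, not a standard).
REQUIRED_PATHS = [
    ("canonicalCallId",),
    ("turnId",),
    ("occurredAt",),
    ("agentSpeech", "playbackPositionMs"),
    ("decision", "action"),
    ("decision", "policyVersion"),
]


def missing_fields(event: dict) -> list[str]:
    """Return dotted paths of required fields that are absent from the event."""
    missing = []
    for path in REQUIRED_PATHS:
        node = event
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


complete = {
    "canonicalCallId": "call_01JZ9W2M7K",
    "turnId": "turn_0007",
    "occurredAt": "2026-05-20T15:42:18.231Z",
    "agentSpeech": {"playbackPositionMs": 1840},
    "decision": {
        "action": "stop_agent_audio_and_accept_user_turn",
        "policyVersion": "interruption-policy-2026-05-20",
    },
}

# Same event without a turn ID or a decision block.
incomplete = {k: v for k, v in complete.items() if k not in ("turnId", "decision")}
```

Here `missing_fields(complete)` returns an empty list, while `missing_fields(incomplete)` reports `turnId`, `decision.action`, and `decision.policyVersion`.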

How to Choose the Right Interruption Policy

The policy should be per message type, not global. A caller should be able to correct an appointment date. They should not accidentally skip a required disclosure because they breathed loudly near the phone.

Message or Flow Type	Recommended Policy	Why
Greeting	Speech + DTMF interruption allowed after a short grace period	Callers already know why they called
Menu prompt	DTMF and speech allowed, with expected option validation	IVR-style flows depend on early selection
Account number or long entity capture	Patient endpointing, avoid early response	Callers pause while reading numbers
Legal, consent, or payment disclosure	Non-interruptible or DTMF-only until required content plays	The system may need proof that audio was delivered
Open-ended support answer	Adaptive speech interruption with backchannel detection	Callers correct or narrow their request
Long tool wait message	Allow interruption and cancellation	Caller may want a human or a different path
Escalation handoff	Always allow human-transfer intent	Safety and customer frustration outrank current audio
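A per-message-type policy can be sketched as a lookup table rather than a single global flag. The Python below is illustrative: the message type keys, the `InterruptionPolicy` fields, the 500 ms grace period, and the fallback policy are all assumptions for this example and should be replaced with values from your own flows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterruptionPolicy:
    speech_interrupts: bool
    dtmf_interrupts: bool
    grace_period_ms: int = 0  # ignore input this early in playback


# Hypothetical policy table loosely following the rows above.
POLICIES = {
    "greeting": InterruptionPolicy(True, True, grace_period_ms=500),
    "menu_prompt": InterruptionPolicy(True, True),
    "legal_disclosure": InterruptionPolicy(False, True),  # DTMF-only variant
    "support_answer": InterruptionPolicy(True, True),
    "escalation_handoff": InterruptionPolicy(True, True),
}

# Assumed fallback for unknown message types: DTMF only.
DEFAULT_POLICY = InterruptionPolicy(False, True)


def should_stop_playback(message_type: str, input_kind: str,
                         playback_position_ms: int) -> bool:
    """Decide whether caller input may stop the current agent audio."""
    policy = POLICIES.get(message_type, DEFAULT_POLICY)
    if playback_position_ms < policy.grace_period_ms:
        return False
    if input_kind == "speech":
        return policy.speech_interrupts
    return policy.dtmf_interrupts
```

With these example values, speech 200 ms into a greeting does not stop playback, speech at 800 ms does, and speech during a legal disclosure never does. Whether to buffer the ignored input is a separate decision that this sketch does not model.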

This is where a voice agent's conversational policy meets reliability. If you track voice agent SLOs, interruption handling should feed at least two reliability signals: task completion after interruption and escalation correctness after interruption.

Provider settings should map to that policy rather than replace it:

Provider Surface	Useful Knob	What to Decide First
LiveKit Agents	turn detection mode, endpointing delay, interruption mode, false interruption resume	Is this flow realtime-model driven, STT pipeline driven, or manually controlled?
OpenAI Realtime	server VAD vs semantic VAD, threshold, prefix padding, silence duration, eagerness, interrupt response	Should the model decide turn completion, or should the app own it?
Twilio ConversationRelay	interruptible, report input during agent speech, interrupt sensitivity, speech timeout, backchannel handling	Do you need to receive caller input without stopping TTS?
Dialogflow CX	end-of-speech sensitivity, smart endpointing, no-speech timeout, barge-in	Which flows can be interrupted at agent, flow, page, or fulfillment level?
Amazon Nova Sonic	endpointing sensitivity	Are you optimizing for fast Q&A or patient, complex turns?
Agora Conversational AI	interrupted, ignored, silence timeout, latency segments	Do you have post-call turn records that explain the outcome?

We used to think the right answer was mostly latency tuning: shorten the silence window, make the agent snappier, reduce dead air. That helps, but it is not enough. The hard part is distinguishing a correction from a backchannel, then preserving the state needed to recover.

How to Test Barge-In, Backchannels, and Silence Timeouts

Do not test "interruption works" as one scenario. Split false positives from false negatives.

Test Case	Setup	Expected Result	Failure Signal
True correction	Agent reads a date; caller says "no, Friday" after 1 second	Agent stops, accepts correction, preserves task context	Caller repeats same correction or agent continues old path
Short backchannel	Caller says "yeah" during a support explanation	Agent continues or acknowledges without losing place	Agent cancels answer and treats "yeah" as new intent
Background noise	Keyboard click or side speech during agent answer	Agent continues, logs no transcript or false interruption	Playback stops without meaningful caller transcript
DTMF during prompt	Caller presses 2 while menu audio plays	Agent routes to option 2 and logs digit	Digit ignored or transcript path handles it as speech
Legal disclosure	Caller speaks during non-interruptible message	Agent continues required audio, optionally buffers input	Required message is skipped
Long account number	Caller pauses in the middle of a number	Agent waits, does not respond early	Agent interrupts before entity is complete
Silence timeout	Caller says nothing after a question	Agent reprompts or escalates according to policy	Timeout counted as user interruption or hidden in latency
Escalation interrupt	Caller says "human" while agent is explaining	Agent stops and starts handoff path	Agent finishes explanation first

For each test, capture the same fields:

Test guardrail =
  interruption decision is correct
  AND playback action is correct
  AND transcript state is correct
  AND task state is preserved
  AND recovery result is correct
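The guardrail is a plain conjunction, so it translates directly into code. The sketch below assumes each test run has already been scored into five booleans; `InterruptionTestResult` and its field names are hypothetical and exist only to mirror the five conditions.

```python
from dataclasses import asdict, dataclass


@dataclass
class InterruptionTestResult:
    decision_correct: bool
    playback_action_correct: bool
    transcript_state_correct: bool
    task_state_preserved: bool
    recovery_correct: bool


def passes_guardrail(result: InterruptionTestResult) -> bool:
    """A test passes only if every one of the five checks holds."""
    return all(asdict(result).values())


def failed_checks(result: InterruptionTestResult) -> list[str]:
    """Name the checks that failed, in declaration order, for the test report."""
    return [name for name, ok in asdict(result).items() if not ok]


# Example: the agent stopped correctly but lost the task state afterwards.
lost_state = InterruptionTestResult(True, True, True, False, True)
```

Reporting `failed_checks` alongside the pass/fail result makes it clear whether a regression came from the decision itself or from the recovery path.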

The Testing LiveKit Voice Agents guide is a good companion if your runtime is LiveKit. For broader release policy, use Testing Voice Agents for Production Reliability to decide which scenarios block deployment.

How to Tune Thresholds Without Breaking Latency

Tuning interruption handling is a balancing problem. Lower thresholds make the agent feel responsive, but they create false interruptions. Higher thresholds reduce false positives, but callers feel trapped.

Start with a scorecard, not a vibe check.

Metric	What It Measures	Watch For
False interruption rate	Agent stopped without meaningful caller input	Noise, echo, backchannel confusion
Missed interruption rate	Caller tried to interrupt but agent kept speaking	Threshold too high, reporting disabled, non-interruptible segment too broad
Resume success rate	Agent resumes cleanly after false interruption	Broken playback state or transcript truncation
Repeated user speech rate	Caller repeats the same correction	Missed interruption or poor recovery
Silence after interruption	Dead air after agent stops	State machine did not commit next action
Task completion after interruption	Outcome quality for interrupted calls	Recovery path is worse than uninterrupted path
Escalation after interruption	Handoff rate after interruption	User frustration or correct safety routing
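The first three rows of the scorecard can be computed from labeled interruption records. The Python sketch below makes several assumptions: each record carries a QA label for whether the caller intended to interrupt, the field names are hypothetical, and the denominators (agent stops, intended interruptions, false stops) are one reasonable choice rather than a fixed definition.

```python
def rate(numerator: int, denominator: int) -> float:
    """Safe ratio that returns 0.0 when there is nothing to measure."""
    return numerator / denominator if denominator else 0.0


def scorecard(records: list[dict]) -> dict[str, float]:
    """Compute interruption rates from labeled records (hypothetical fields)."""
    stops = [r for r in records if r["agent_stopped"]]
    intended = [r for r in records if r["caller_intended_interrupt"]]
    false_stops = [r for r in stops if not r["caller_intended_interrupt"]]
    missed = [r for r in intended if not r["agent_stopped"]]
    resumed = [r for r in false_stops if r["resumed_cleanly"]]
    return {
        "false_interruption_rate": rate(len(false_stops), len(stops)),
        "missed_interruption_rate": rate(len(missed), len(intended)),
        "resume_success_rate": rate(len(resumed), len(false_stops)),
    }


# Four labeled records: one correct stop, two false stops, one missed correction.
sample = [
    {"agent_stopped": True, "caller_intended_interrupt": True, "resumed_cleanly": False},
    {"agent_stopped": True, "caller_intended_interrupt": False, "resumed_cleanly": True},
    {"agent_stopped": False, "caller_intended_interrupt": True, "resumed_cleanly": False},
    {"agent_stopped": True, "caller_intended_interrupt": False, "resumed_cleanly": False},
]
scores = scorecard(sample)
```

In this sample, two of three agent stops were false, one of two intended interruptions was missed, and one of two false stops resumed cleanly.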

Then tune one thing at a time:

1. Pick one workflow, such as appointment rescheduling or billing lookup.
2. Freeze a test set with true corrections, backchannels, noise, long entities, and silence timeouts.
3. Change one setting: threshold, silence duration, endpointing sensitivity, backchannel policy, or non-interruptible segment.
4. Run the same test set and compare false positives against missed interruptions.
5. Review the top 20 production interrupted calls after release.

For analytics, connect these signals to the voice agent metrics dictionary and voice agent dashboard template. For root cause analysis, the voice agent observability tracing guide and OpenTelemetry guide show where to attach stage timings and trace IDs.

Tuning rule: optimize for the caller-visible mistake, not the provider knob. A 300 ms silence change is good only if it reduces bad outcomes without increasing false interruptions in the flows that matter.
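The tuning rule can be expressed as a release gate over two runs of the same frozen test set. This Python sketch is one possible reading of the rule, under stated assumptions: results are summarized per flow as counts, `bad_outcomes` and `false_interruptions` are hypothetical field names, and "the flows that matter" is passed in explicitly as `guarded_flows`.

```python
def accept_change(baseline: dict, candidate: dict, guarded_flows: list[str]) -> bool:
    """Accept a setting change only if bad outcomes drop overall and false
    interruptions do not increase in any guarded flow.

    Both inputs map flow name -> {"bad_outcomes": int, "false_interruptions": int}.
    """
    bad_before = sum(m["bad_outcomes"] for m in baseline.values())
    bad_after = sum(m["bad_outcomes"] for m in candidate.values())
    if bad_after >= bad_before:
        return False
    return all(
        candidate[flow]["false_interruptions"] <= baseline[flow]["false_interruptions"]
        for flow in guarded_flows
    )


# Hypothetical results before and after shortening the silence window.
before = {
    "billing_lookup": {"bad_outcomes": 9, "false_interruptions": 3},
    "payment_disclosure": {"bad_outcomes": 2, "false_interruptions": 1},
}
after = {
    "billing_lookup": {"bad_outcomes": 5, "false_interruptions": 3},
    "payment_disclosure": {"bad_outcomes": 2, "false_interruptions": 4},
}
```

In this example the change reduces bad outcomes overall but is rejected when `payment_disclosure` is a guarded flow, because false interruptions rose there.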

Rollout Checklist

Before shipping a new interruption policy, make the release owner answer these questions:

One unresolved tension is that the best policy is sometimes less "natural" than a demo. A demo agent that interrupts instantly feels impressive for 3 minutes. A production healthcare, finance, or support agent has to be patient enough for real callers, noisy rooms, account numbers, accents, DTMF, and compliance language.

That is the bar. Do not ship the demo behavior. Ship the behavior you can test, explain, and replay.

Sumanyu Sharma
