Voice Agent Interruption Handling: Barge-In, Backchannels, and Turn Detection

Sumanyu Sharma
Founder & CEO, Voice AI QA Pioneer

Has stress-tested 4M+ voice agent calls to find where they break.

May 20, 2026 | Updated May 20, 2026 | 14 min read

A voice agent can be fast and still feel rude. The dashboard says P95 turn latency is healthy, but callers hear the agent cut them off mid-account-number, ignore a correction, or restart after every "uh-huh."

That is why voice agent interruption handling needs its own runbook. Barge-in is not a single setting. It is a policy that decides when caller input should stop agent audio, when it should be treated as a backchannel, when it should be ignored as noise, and what evidence should be logged so QA can replay the decision later.

If you run fewer than 50 production calls a week, keep this simple. Review interrupted calls manually, pick a conservative default, and add 5-10 regression tests. This guide is for teams with enough call volume that interruption failures hide inside aggregate latency, fallback, and completion metrics.

Voice agent interruption handling is the policy and instrumentation layer that decides what happens when a caller speaks, presses DTMF, or triggers a command while the agent is speaking. A production-ready policy records the caller input, agent speech state, interruption decision, playback action, transcript result, and recovery outcome.

Quick filter: If you cannot answer "did the caller intentionally interrupt, or did we fire on noise/backchannel?" from one call record, your interruption handling is not observable enough yet.

TL;DR: Build interruption handling as a runbook, not a toggle:

  • Classify the input: true correction, backchannel, accidental noise, DTMF, silence timeout, or safety escalation.
  • Log the lifecycle: user speech state, agent speech state, interruption candidate, decision, playback action, transcript outcome, and recovery result.
  • Test both sides: false positives cut the agent off; false negatives force callers to wait or repeat themselves.
  • Tune by workflow risk: legal disclosures, payment steps, and urgent support paths need different policies than open-ended support chat.

Methodology Note: This runbook is based on Hamming's analysis of 4M+ production voice agent calls across 10K+ voice agents (2025-2026). We've tested agents built on LiveKit, Pipecat, ElevenLabs, Retell, Vapi, and custom-built solutions.

It also uses public provider documentation from LiveKit, OpenAI, Twilio, Amazon Nova, Dialogflow CX, and Agora to ground the turn-detection and event samples.

Last Updated: May 2026

What Is Voice Agent Interruption Handling?

Voice agent interruption handling answers one question: when a caller does something while the agent is speaking, should the agent stop, keep talking, pause and resume, or route the input somewhere else?

The answer changes by context. A caller saying "wait, that's the wrong address" should interrupt. A caller saying "yeah" while listening usually should not. A keypad press during an IVR-like prompt may be intentional DTMF. A loud keyboard click should not cancel TTS.

| Caller Input During Agent Speech | Usually Means | Default Action | Evidence to Keep |
| --- | --- | --- | --- |
| "No, I meant Friday" | True correction | Stop playback, accept new turn, preserve partial agent transcript | speech duration, transcript, agent playback position |
| "uh-huh" or "okay" | Backchannel | Continue or briefly acknowledge without cancelling critical audio | utterance text, confidence, backchannel decision |
| DTMF key press | Menu or confirmation action | Stop or route based on prompt policy | digit class, prompt state, expected menu options |
| Short noise or echo | False interruption | Resume playback from safe point | audio energy, no transcript, resume decision |
| Long silence | No input or hesitation | Reprompt, wait, or escalate depending on step | silence duration, timeout policy, next action |
| "I need a human" | Safety or escalation interruption | Stop playback and route to handoff logic | intent, transcript, escalation outcome |

LiveKit's turn-detection docs split the problem into detection modes, endpointing delay, adaptive interruption handling, and VAD. OpenAI's Realtime VAD docs expose server VAD and semantic VAD settings such as threshold, prefix padding, silence duration, eagerness, and response interruption.

Those provider knobs are useful. They are not the runbook.

Working rule: Turn detection decides when the system thinks speech started or ended. Interruption handling decides what the agent does with that signal while the agent is already speaking.
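
To make the interruption-handling half of that rule concrete, here is a minimal sketch that classifies caller input arriving during agent speech and looks up the default action from the table earlier in this section. The word lists, class names, and action strings are illustrative assumptions, not part of any provider API; a production system would likely replace the keyword matching with a trained or semantic classifier, and long silence would be handled by a separate timeout path.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InputClass(Enum):
    CORRECTION = "true_correction"
    BACKCHANNEL = "backchannel"
    DTMF = "dtmf"
    NOISE = "noise"
    ESCALATION = "escalation"


# Hypothetical word lists, used only to keep the sketch self-contained.
BACKCHANNELS = {"uh-huh", "mm-hmm", "okay", "ok", "yeah", "yes", "right"}
ESCALATION_PHRASES = ("human", "representative")

# Default actions paraphrased from the table above.
DEFAULT_ACTION = {
    InputClass.CORRECTION: "stop_playback_and_accept_turn",
    InputClass.BACKCHANNEL: "continue_playback",
    InputClass.DTMF: "route_by_prompt_policy",
    InputClass.NOISE: "resume_from_safe_point",
    InputClass.ESCALATION: "stop_playback_and_hand_off",
}


@dataclass
class CallerInput:
    transcript: Optional[str] = None  # None when STT produced nothing
    dtmf_digit: Optional[str] = None  # set when a key was pressed


def classify(inp: CallerInput) -> InputClass:
    """Classify input that arrived while the agent was speaking."""
    if inp.dtmf_digit is not None:
        return InputClass.DTMF
    text = (inp.transcript or "").strip().lower().strip(".,!?")
    if not text:
        return InputClass.NOISE  # audio energy without a transcript
    if any(phrase in text for phrase in ESCALATION_PHRASES):
        return InputClass.ESCALATION
    if text in BACKCHANNELS:
        return InputClass.BACKCHANNEL
    return InputClass.CORRECTION  # any other speech is treated as a real turn


kind = classify(CallerInput(transcript="No, I meant Friday"))
print(kind.value, "->", DEFAULT_ACTION[kind])
```

Treating every unrecognized utterance as a correction biases the sketch toward stopping the agent, which may be the wrong default for non-interruptible segments.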

Why Barge-In Fails in Production

The most common failure is treating barge-in as a boolean. Turn it on and callers can interrupt. Turn it off and they cannot.

Production is messier than that.

| Failure Mode | What the Caller Feels | Root Cause | First Check |
| --- | --- | --- | --- |
| False barge-in | Agent keeps stopping for no reason | Noise, echo, short backchannel, overly sensitive VAD | audio energy, transcript presence, false interruption events |
| Missed correction | Caller has to wait, repeat, or hang up | Interruption disabled, threshold too strict, buffered audio dropped | agent speech state, input reporting policy |
| Premature endpointing | Agent answers before caller is done | Silence threshold too short for the workflow | pause duration, partial transcript, phrase completion |
| Backchannel confusion | "okay" becomes a new task | No semantic/backchannel policy | utterance length, words, confidence, next action |
| Lost recovery | Agent stops, then forgets what it already said | Playback truncation not reflected in conversation history | heard-audio boundary, transcript truncation |
| No evidence | QA cannot prove what happened | Missing event taxonomy and call IDs | interruption event lifecycle |

Twilio's ConversationRelay docs show why this needs precision: interruptible controls whether caller input stops TTS playback, while reportInputDuringAgentSpeech controls whether the application receives input while the agent is talking. Those are different decisions. A system can listen without stopping playback, or stop playback without preserving enough application context.
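
As a provider-neutral illustration of why those are two decisions rather than one, the sketch below models them as independent booleans and names the four resulting behaviors. The class, field names, and descriptions are assumptions made for this example; they do not reproduce Twilio's attribute semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpeechInputPolicy:
    stops_playback: bool  # caller input cancels the agent's TTS
    reports_to_app: bool  # application still receives the caller input


def describe(policy: AgentSpeechInputPolicy) -> str:
    """Name the caller-visible behavior for each combination."""
    if policy.stops_playback and policy.reports_to_app:
        return "barge-in: audio stops and the app sees the input"
    if policy.stops_playback:
        return "blind stop: audio stops but the app has no input to act on"
    if policy.reports_to_app:
        return "listen-only: audio continues and the app can buffer or classify"
    return "locked: audio continues and the input is dropped"


for stops in (True, False):
    for reports in (True, False):
        print(stops, reports, describe(AgentSpeechInputPolicy(stops, reports)))
```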

Google Dialogflow CX exposes a similar separation at a different layer: advanced speech settings include end-of-speech sensitivity, smart endpointing, no-speech timeout, barge-in, and partial response cancellation. Amazon Nova Sonic's turn-taking docs make the latency tradeoff explicit with sensitivity levels that wait roughly 1.5, 1.75, or 2.0 seconds before responding.

The practical lesson is boring but important: the best policy is not "always interrupt." It is "interrupt when the user's intent is more important than the current audio, and prove that decision in the logs."

What Events Should a Voice Agent Log for Interruptions?

If you only log the final transcript, you will miss the interruption decision. The evidence is in the timing: when the user started speaking, where the agent was in playback, what the detector decided, and whether the agent recovered.

Use this event taxonomy as the starting point.

| Event | Required Fields | Why It Matters |
| --- | --- | --- |
| user.speech_started | call ID, turn ID, timestamp, audio source, VAD confidence | Shows when the interruption candidate began |
| user.speech_stopped | duration, transcript status, silence duration | Separates real speech from noise |
| agent.speech_started | response ID, playback start, message type | Shows whether the agent was interruptible |
| agent.speech_interrupted | playback position, reason, heard text boundary | Reconstructs what the caller actually heard |
| interruption.candidate_detected | mode, threshold, speech duration, words detected | Explains why the detector fired |
| interruption.decision_made | decision, policy version, confidence, reason | Proves whether the app chose stop, continue, resume, or escalate |
| interruption.recovered | resume position, new user turn ID, task state | Shows whether the conversation repaired cleanly |
| interruption.false_positive | timeout, no transcript, resume behavior | Counts noise/backchannel mistakes separately |
| silence.timeout | elapsed silence, prompt state, next action | Handles no-input paths without mixing them into barge-in |

Twilio's Conversation Relay Insights event reference includes speech events, latency events, interaction events such as interrupt, and an interrupt payload type. Agora's turn-information API exposes turn starts, interrupted turn endings, ignored turns, silence timeouts, and latency segments. Those are useful samples of the evidence families to normalize even if your runtime uses a different provider.

Here is a normalized event envelope you can adapt:

{
  "eventName": "voice.interruption.decision_made",
  "eventVersion": "2026-05-20",
  "occurredAt": "2026-05-20T15:42:18.231Z",
  "canonicalCallId": "call_01JZ9W2M7K",
  "turnId": "turn_0007",
  "agentResponseId": "response_0006",
  "traceId": "9f7c2d4f0f3a4c1e8e4d2a5b7c6f9012",
  "agentSpeech": {
    "state": "speaking",
    "messageType": "billing_summary",
    "interruptible": true,
    "playbackPositionMs": 1840
  },
  "callerInput": {
    "type": "speech",
    "speechDurationMs": 420,
    "transcriptText": "no I meant Friday",
    "isBackchannel": false
  },
  "decision": {
    "action": "stop_agent_audio_and_accept_user_turn",
    "policyVersion": "interruption-policy-2026-05-20",
    "reason": "caller_correction_detected",
    "confidence": 0.87
  },
  "recovery": {
    "agentTranscriptTruncatedAtMs": 1840,
    "newTurnCommitted": true,
    "taskStatePreserved": true
  }
}
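
One way to keep that envelope honest is to reject events that are missing the fields QA needs for replay. The following is a minimal sketch, assuming the field names from the sample above; which fields count as required is a judgment call for your own schema, and `REQUIRED_PATHS` and `missing_fields` are hypothetical names.

```python
from typing import Any

# Assumed minimum set of fields, taken from the sample envelope above.
REQUIRED_PATHS = [
    ("eventName",),
    ("occurredAt",),
    ("canonicalCallId",),
    ("turnId",),
    ("agentSpeech", "playbackPositionMs"),
    ("callerInput", "type"),
    ("decision", "action"),
    ("decision", "policyVersion"),
    ("recovery", "newTurnCommitted"),
]


def missing_fields(event: dict[str, Any]) -> list[str]:
    """Return dotted paths of required fields that are absent from the event."""
    missing = []
    for path in REQUIRED_PATHS:
        node: Any = event
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


partial = {
    "eventName": "voice.interruption.decision_made",
    "canonicalCallId": "call_01JZ9W2M7K",
}
print(missing_fields(partial))
```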

Keep raw transcripts and audio in the right evidence store. For broad dashboards, store pointers, policy versions, and redaction state. The IVR and voice agent log correlation runbook explains how to keep provider IDs and call context attached across the call path.

How to Choose the Right Interruption Policy

The policy should be per message type, not global. A caller should be able to correct an appointment date. They should not accidentally skip a required disclosure because they breathed loudly near the phone.

| Message or Flow Type | Recommended Policy | Why |
| --- | --- | --- |
| Greeting | Speech + DTMF interruption allowed after a short grace period | Callers already know why they called |
| Menu prompt | DTMF and speech allowed, with expected option validation | IVR-style flows depend on early selection |
| Account number or long entity capture | Patient endpointing, avoid early response | Callers pause while reading numbers |
| Legal, consent, or payment disclosure | Non-interruptible or DTMF-only until required content plays | The system may need proof that audio was delivered |
| Open-ended support answer | Adaptive speech interruption with backchannel detection | Callers correct or narrow their request |
| Long tool wait message | Allow interruption and cancellation | Caller may want a human or a different path |
| Escalation handoff | Always allow human-transfer intent | Safety and customer frustration outrank current audio |
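
A per-message-type policy like the one in the table can be expressed as a small lookup rather than a global flag. This is a sketch under stated assumptions: the message-type keys, the 500 ms grace period, and the `may_interrupt` helper are hypothetical, and the modes are a simplification of the table (adaptive backchannel detection is left out).

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SPEECH_AND_DTMF = "speech_and_dtmf"
    DTMF_ONLY = "dtmf_only"
    NON_INTERRUPTIBLE = "non_interruptible"


@dataclass(frozen=True)
class MessagePolicy:
    mode: Mode
    grace_period_ms: int = 0  # ignore input this early in playback


# Hypothetical message-type keys and grace period; adjust to your flows.
POLICIES = {
    "greeting": MessagePolicy(Mode.SPEECH_AND_DTMF, grace_period_ms=500),
    "menu_prompt": MessagePolicy(Mode.SPEECH_AND_DTMF),
    "legal_disclosure": MessagePolicy(Mode.DTMF_ONLY),
    "payment_disclosure": MessagePolicy(Mode.NON_INTERRUPTIBLE),
    "support_answer": MessagePolicy(Mode.SPEECH_AND_DTMF),
    "tool_wait": MessagePolicy(Mode.SPEECH_AND_DTMF),
}


def may_interrupt(message_type: str, input_kind: str, playback_ms: int) -> bool:
    """Decide whether this input may stop the current agent audio.

    Raises KeyError for an undocumented message type, so every new
    message has to be given an explicit policy.
    """
    policy = POLICIES[message_type]
    if playback_ms < policy.grace_period_ms:
        return False
    if policy.mode is Mode.NON_INTERRUPTIBLE:
        return False
    if policy.mode is Mode.DTMF_ONLY:
        return input_kind == "dtmf"
    return input_kind in ("speech", "dtmf")


print(may_interrupt("legal_disclosure", "speech", playback_ms=5000))
print(may_interrupt("support_answer", "speech", playback_ms=1200))
```

Failing loudly on unknown message types is a deliberate choice in this sketch: it keeps the policy documented per message type instead of silently falling back to one global rule.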

This is where a voice agent's conversational policy meets reliability. If you track voice agent SLOs, interruption handling should feed at least two reliability signals: task completion after interruption and escalation correctness after interruption.

Provider settings should map to that policy rather than replace it:

| Provider Surface | Useful Knob | What to Decide First |
| --- | --- | --- |
| LiveKit Agents | turn detection mode, endpointing delay, interruption mode, false interruption resume | Is this flow realtime-model driven, STT pipeline driven, or manually controlled? |
| OpenAI Realtime | server VAD vs semantic VAD, threshold, prefix padding, silence duration, eagerness, interrupt response | Should the model decide turn completion, or should the app own it? |
| Twilio ConversationRelay | interruptible, report input during agent speech, interrupt sensitivity, speech timeout, backchannel handling | Do you need to receive caller input without stopping TTS? |
| Dialogflow CX | end-of-speech sensitivity, smart endpointing, no-speech timeout, barge-in | Which flows can be interrupted at agent, flow, page, or fulfillment level? |
| Amazon Nova Sonic | endpointing sensitivity | Are you optimizing for fast Q&A or patient, complex turns? |
| Agora Conversational AI | interrupted, ignored, silence timeout, latency segments | Do you have post-call turn records that explain the outcome? |

We used to think the right answer was mostly latency tuning: shorten the silence window, make the agent snappier, reduce dead air. That helps, but it is not enough. The hard part is distinguishing a correction from a backchannel, then preserving the state needed to recover.

How to Test Barge-In, Backchannels, and Silence Timeouts

Do not test "interruption works" as one scenario. Split false positives from false negatives.

| Test Case | Setup | Expected Result | Failure Signal |
| --- | --- | --- | --- |
| True correction | Agent reads a date; caller says "no, Friday" after 1 second | Agent stops, accepts correction, preserves task context | Caller repeats same correction or agent continues old path |
| Short backchannel | Caller says "yeah" during a support explanation | Agent continues or acknowledges without losing place | Agent cancels answer and treats "yeah" as new intent |
| Background noise | Keyboard click or side speech during agent answer | Agent continues, logs no transcript or false interruption | Playback stops without meaningful caller transcript |
| DTMF during prompt | Caller presses 2 while menu audio plays | Agent routes to option 2 and logs digit | Digit ignored or transcript path handles it as speech |
| Legal disclosure | Caller speaks during non-interruptible message | Agent continues required audio, optionally buffers input | Required message is skipped |
| Long account number | Caller pauses in the middle of a number | Agent waits, does not respond early | Agent interrupts before entity is complete |
| Silence timeout | Caller says nothing after a question | Agent reprompts or escalates according to policy | Timeout counted as user interruption or hidden in latency |
| Escalation interrupt | Caller says "human" while agent is explaining | Agent stops and starts handoff path | Agent finishes explanation first |

For each test, check the same five conditions:

Test assertion =
  interruption decision is correct
  AND playback action is correct
  AND transcript state is correct
  AND task state is preserved
  AND recovery result is correct
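
The compound assertion above could be written as a small helper that reports which of the five conditions failed, so a red test says more than "interruption broken." This is a sketch; `TurnOutcome`, its field values, and `failed_checks` are hypothetical names, and how you extract the actual outcome from a call record depends on your runtime.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class TurnOutcome:
    decision: str
    playback_action: str
    transcript_state: str
    task_state_preserved: bool
    recovery_result: str


def failed_checks(expected: TurnOutcome, actual: TurnOutcome) -> list[str]:
    """Return the names of fields that differ; an empty list means the test passes."""
    return [
        f.name
        for f in fields(TurnOutcome)
        if getattr(expected, f.name) != getattr(actual, f.name)
    ]


# Expected outcome for the "true correction" test case.
expected = TurnOutcome(
    decision="stop",
    playback_action="stopped",
    transcript_state="truncated_at_heard_boundary",
    task_state_preserved=True,
    recovery_result="correction_accepted",
)
# Simulated run where the agent kept talking over the correction.
actual = replace(expected, playback_action="continued")
print(failed_checks(expected, actual))
```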

The Testing LiveKit Voice Agents guide is a good companion if your runtime is LiveKit. For broader release policy, use Testing Voice Agents for Production Reliability to decide which scenarios block deployment.

How to Tune Thresholds Without Breaking Latency

Tuning interruption handling is a balancing problem. Lower thresholds make the agent feel responsive, but they create false interruptions. Higher thresholds reduce false positives, but callers feel trapped.

Start with a scorecard, not a vibe check.

| Metric | What It Measures | Watch For |
| --- | --- | --- |
| False interruption rate | Agent stopped without meaningful caller input | Noise, echo, backchannel confusion |
| Missed interruption rate | Caller tried to interrupt but agent kept speaking | Threshold too high, reporting disabled, non-interruptible segment too broad |
| Resume success rate | Agent resumes cleanly after false interruption | Broken playback state or transcript truncation |
| Repeated user speech rate | Caller repeats the same correction | Missed interruption or poor recovery |
| Silence after interruption | Dead air after agent stops | State machine did not commit next action |
| Task completion after interruption | Outcome quality for interrupted calls | Recovery path is worse than uninterrupted path |
| Escalation after interruption | Handoff rate after interruption | User frustration or correct safety routing |
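
Three of these scorecard rates can be computed from labeled call records, as in the sketch below. The record shape and the denominators are assumptions for illustration: false interruptions are counted over all agent stops, missed interruptions over all labeled interruption attempts, and task completion over calls where the agent stopped. Your own definitions may differ, and the intent label would typically come from QA review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InterruptionRecord:
    caller_intended_interrupt: bool  # QA label, not a detector output
    agent_stopped: bool
    task_completed: bool


def ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return None instead of dividing by zero when there is no data."""
    return numerator / denominator if denominator else None


def scorecard(records: list[InterruptionRecord]) -> dict[str, Optional[float]]:
    stops = [r for r in records if r.agent_stopped]
    attempts = [r for r in records if r.caller_intended_interrupt]
    false_stops = sum(1 for r in stops if not r.caller_intended_interrupt)
    missed = sum(1 for r in attempts if not r.agent_stopped)
    completed = sum(1 for r in stops if r.task_completed)
    return {
        "false_interruption_rate": ratio(false_stops, len(stops)),
        "missed_interruption_rate": ratio(missed, len(attempts)),
        "task_completion_after_interruption": ratio(completed, len(stops)),
    }


sample = [
    InterruptionRecord(True, True, True),    # real correction, handled
    InterruptionRecord(True, False, False),  # missed correction
    InterruptionRecord(False, True, True),   # stopped on noise, recovered
    InterruptionRecord(False, True, False),  # stopped on noise, task lost
    InterruptionRecord(True, True, False),   # real correction, poor recovery
]
print(scorecard(sample))
```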

Then tune one thing at a time:

  1. Pick one workflow, such as appointment rescheduling or billing lookup.
  2. Freeze a test set with true corrections, backchannels, noise, long entities, and silence timeouts.
  3. Change one setting: threshold, silence duration, endpointing sensitivity, backchannel policy, or non-interruptible segment.
  4. Run the same test set and compare false positives against missed interruptions.
  5. Review the top 20 production interrupted calls after release.
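
Steps 2 through 4 can be sketched as a frozen test set evaluated under two values of a single setting. The detector here is a deliberately toy stand-in (it fires when caller speech exceeds a minimum duration), and the case names, durations, and thresholds are invented for illustration; the point is the comparison of false positives against missed interruptions, not the detector.

```python
from typing import NamedTuple


class Case(NamedTuple):
    name: str
    speech_ms: int          # caller audio length during agent speech
    should_interrupt: bool  # frozen label from the test set


# Hypothetical frozen test set covering corrections, backchannels, and noise.
FROZEN_CASES = [
    Case("true_correction", 900, True),
    Case("short_correction", 350, True),
    Case("backchannel", 300, False),
    Case("keyboard_click", 120, False),
    Case("side_speech", 700, False),
]


def evaluate(cases: list[Case], min_speech_ms: int) -> tuple[int, int]:
    """Return (false_positives, missed_interruptions) for one setting."""
    false_positives = missed = 0
    for case in cases:
        fired = case.speech_ms >= min_speech_ms  # toy duration-only detector
        if fired and not case.should_interrupt:
            false_positives += 1
        elif not fired and case.should_interrupt:
            missed += 1
    return false_positives, missed


for threshold in (250, 400):
    fp, fn = evaluate(FROZEN_CASES, threshold)
    print(f"min_speech_ms={threshold}: false_positives={fp}, missed={fn}")
```

In this toy set, raising the threshold from 250 ms to 400 ms removes one false positive but starts missing the short correction, which is the tradeoff the fixed test set is meant to expose before release.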

For analytics, connect these signals to the voice agent metrics dictionary and voice agent dashboard template. For root cause analysis, the voice agent observability tracing guide and OpenTelemetry guide show where to attach stage timings and trace IDs.

Tuning rule: optimize for the caller-visible mistake, not the provider knob. A 300 ms silence change is good only if it reduces bad outcomes without increasing false interruptions in the flows that matter.

Rollout Checklist

Before shipping a new interruption policy, make the release owner answer these questions:

  • Which message types are interruptible, DTMF-only, or non-interruptible?
  • Which detector owns turn completion: VAD, STT endpointing, realtime model, manual control, or a contextual turn detector?
  • Are true interruptions, backchannels, noise, long entities, and silence timeouts covered in tests?
  • Does the call record show user speech state, agent speech state, decision, playback action, transcript outcome, and recovery result?
  • Can QA replay the specific interrupted turn with audio, transcript, and trace pointers?
  • Are raw transcripts and audio protected with the same redaction rules as other call evidence?
  • Do dashboards separate false interruptions from missed interruptions?
  • Is task completion after interruption tracked separately from overall task completion?
  • Did the owner review production interrupted calls after the change?
  • Is there a rollback plan if false interruptions spike?

One unresolved tension is that the best policy is sometimes less "natural" than a demo. A demo agent that interrupts instantly feels impressive for 3 minutes. A production healthcare, finance, or support agent has to be patient enough for real callers, noisy rooms, account numbers, accents, DTMF, and compliance language.

That is the bar. Do not ship the demo behavior. Ship the behavior you can test, explain, and replay.

Frequently Asked Questions

Voice agent interruption handling is the policy that decides what happens when a caller speaks, presses a key, or triggers a command while the agent is talking. Hamming recommends tracking it as a full event lifecycle, not a single barge-in toggle, because the right behavior depends on whether the input is a correction, backchannel, noise, DTMF, or safety escalation.

Barge-in means the caller can interrupt the agent's audio playback before the agent finishes speaking. In production, Hamming treats barge-in as one class of interruption and tests at least 5 scenarios: true correction, short backchannel, background noise, DTMF input, and silence timeout recovery.

Log user speech start and stop, agent speech start and stop, interruption candidate, interruption decision, playback action, transcript result, and recovery outcome. Hamming recommends storing these event families with a stable call ID and turn ID so QA can replay the specific moment that changed the conversation.

Test backchannel detection with short acknowledgments such as yes, okay, uh-huh, and mm-hmm while the agent is speaking. Hamming recommends verifying that these utterances do not cancel critical audio unless the workflow explicitly treats them as confirmation, and then repeating the same test with real corrections that should interrupt.

Watch false interruption rate, missed interruption rate, resume success rate, repeated-user-speech rate, silence-after-interruption, and task completion after interruption. Hamming recommends reviewing the top 20 interrupted calls after every major prompt, model, voice, or turn-detection change because aggregate latency can stay green while callers are being cut off.

Tune turn detection by workflow risk: increase patience for long account numbers, medical descriptions, and elderly callers, then use faster endpointing for short command-and-control flows. Hamming recommends changing one knob at a time, running a fixed interruption test suite, and comparing false positives against missed interruptions before shipping.

Not every agent message should be interruptible. Payment instructions, legal disclosures, emergency disclaimers, and consent prompts may need non-interruptible or DTMF-only segments, while general support conversations usually need speech interruption. Hamming recommends documenting the policy per message type rather than applying one global rule across every turn.

Hamming helps teams test and monitor interruption behavior across production-like calls, including barge-in, latency, task completion, and recovery paths. Teams can use Hamming to turn repeated interruption failures into regression tests and track whether a change improves the caller experience across 4M+ production voice agent calls and 10K+ voice agents.

Sumanyu Sharma

Founder & CEO

Previously Head of Data at Citizen, where he helped quadruple the user base. As Senior Staff Data Scientist at Tesla, he grew an AI-powered sales program to hundreds of millions in revenue per year.

Researched AI-powered medical image search at the University of Waterloo, where he graduated with Engineering honors on the dean's list.

“At Hamming, we're taking all of our learnings from Tesla and Citizen to build the future of trustworthy, safe and reliable voice AI agents.”