Debug WebRTC Voice Agents: Complete Checklist & Troubleshooting Guide

Sumanyu Sharma
Founder & CEO, Voice AI QA Pioneer

Has stress-tested 1M+ voice agent calls to find where they break.

January 25, 2026 · 21 min read

TL;DR: Debug WebRTC voice agents using this 3-layer diagnostic approach:

| Layer | What to Check | Key Tools | Production Thresholds (Hamming data) |
|---|---|---|---|
| Network | ICE connection, STUN/TURN, firewall | chrome://webrtc-internals, Wireshark | ICE state: "connected", RTT <150ms |
| Media | RTP packet loss, jitter, audio quality | webrtc-internals stats, getStats API | Packet loss <1%, jitter <20ms |
| Pipeline | STT/LLM/TTS latency, barge-in, turn detection | Component traces, Whisker/Tail | End-to-end P50 ~1.5s, P95 ~5s, barge-in <500ms |

Start at the Network layer. If ICE never reaches "connected", nothing else matters. If RTP stats show packet loss >5%, audio quality degrades before it reaches your AI pipeline. Only debug STT/LLM/TTS after verifying network and media are healthy.


Quick Symptom Lookup

Jump to the section that matches your issue:

| Symptom | Likely Cause | Go To Section |
|---|---|---|
| ICE state stuck on "checking" | Firewall blocking UDP | ICE Connection Failures |
| No audio in either direction | Media connection failed | STUN/TURN Configuration |
| One-way audio | Asymmetric NAT/firewall | One-Way Audio Diagnosis |
| Choppy/robotic voice | Packet loss or jitter | RTP Media Quality |
| 2-5 second response delay | STT endpointing or LLM queuing | Pipeline Latency Breakdown |
| Agent doesn't stop when interrupted | Barge-in detection issue | Barge-In Handling |
| Agent cuts off user mid-sentence | Aggressive endpointing | Turn Detection Issues |

WebRTC Debugging Fundamentals

Voice agents built on WebRTC require debugging across multiple layers that traditional application monitoring doesn't cover: network traversal (ICE/STUN/TURN), media transport (RTP/jitter/packet loss), and the AI pipeline (STT/LLM/TTS latency). Engineers face limited visibility compared to traditional systems—no standard observability for interruption patterns, ASR drift, or orchestration delays.

This guide provides symptom-to-cause diagnostics, structured logging patterns, and framework-specific debugging for LiveKit and Pipecat implementations.

Browser-Based Debugging with chrome://webrtc-internals

chrome://webrtc-internals is the most comprehensive tool for debugging WebRTC connections in real-time.

How to use it:

  1. Open chrome://webrtc-internals in Chrome (v87+) before starting your voice agent session
  2. Start the voice agent call—connection data will populate automatically
  3. Look for the peer connection entry and expand it to see:
    • ICE candidate pairs and connection state
    • Inbound/outbound RTP statistics (packet loss, jitter, RTT)
    • Audio/video track statistics

Critical: Open webrtc-internals before the call starts. Connection establishment data (ICE gathering, candidate exchange) is only captured if the tab is open before the session begins.

Key sections to check:

| Section | What It Shows | What to Look For |
|---|---|---|
| ICE Candidate Pairs | Connection attempts | Selected pair should show "succeeded" |
| Inbound RTP (audio) | Incoming audio stats | packetsLost, jitter, roundTripTime |
| Outbound RTP (audio) | Outgoing audio stats | packetsSent, bytesSent |
| Connection State | ICE state transitions | Should reach "connected" or "completed" |

Firefox alternative: Use about:webrtc for similar functionality, though with fewer features than Chrome.

Production monitoring: chrome://webrtc-internals is impractical outside development. For production, implement client-side event tracing using the getStats() API to collect RTP statistics programmatically.
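
A minimal sketch of that pattern, assuming a hypothetical /rtc-stats collection endpoint on your backend and a long-lived peerConnection and sessionId already in scope:

// Periodically snapshot audio RTP stats and ship them to a logging endpoint
async function reportRtcStats(pc, sessionId) {
  const stats = await pc.getStats();
  const snapshot = { sessionId, timestamp: Date.now() };

  stats.forEach(report => {
    if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      snapshot.packetsReceived = report.packetsReceived;
      snapshot.packetsLost = report.packetsLost;
      snapshot.jitterMs = report.jitter * 1000;
    }
    if (report.type === 'remote-inbound-rtp' && report.kind === 'audio') {
      snapshot.rttMs = report.roundTripTime * 1000;
    }
  });

  navigator.sendBeacon('/rtc-stats', JSON.stringify(snapshot)); // hypothetical endpoint
}

setInterval(() => reportRtcStats(peerConnection, sessionId), 5000);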

Understanding WebRTC Connection Architecture

WebRTC uses the Interactive Connectivity Establishment (ICE) framework to establish peer connections through firewalls:

                       ICE Connection Flow

  Client A                                          Client B
     │                                                 │
     │  1. Gather ICE candidates (host, srflx, relay)  │
     │                                                 │
     │  2. Exchange candidates via signaling           │
     │────────────────────────────────────────────────►│
     │◄────────────────────────────────────────────────│
     │                                                 │
     │  3. Connectivity checks (STUN binding)          │
     │◄───────────────────────────────────────────────►│
     │                                                 │
     │  4. Select best candidate pair                  │
     │═════════════════════════════════════════════════│
     │       Media flows directly or via TURN          │

ICE candidate types:

| Candidate Type | How It's Obtained | When It Works |
|---|---|---|
| host | Local network interface | Same network only |
| srflx (server reflexive) | STUN server response | ~80% of NAT configurations |
| relay | TURN server allocation | Always works (adds latency) |

STUN (Session Traversal Utilities for NAT): Discovers your public IP address and NAT type. Works for most home/office NATs but fails with Symmetric NAT.

TURN (Traversal Using Relays around NAT): Relays all media through the server. Required when Symmetric NAT blocks direct connections (~20% of cases). Adds 20-50ms latency overhead.

SDP Negotiation and Signaling

Session Description Protocol (SDP) describes the multimedia session: codecs, formats, encryption parameters.

Offer/Answer flow:

  1. Caller creates SDP offer with supported codecs
  2. Offer sent via signaling channel (WebSocket, HTTP)
  3. Callee creates SDP answer, accepting/rejecting codecs
  4. Answer sent back via signaling
  5. Both sides set local/remote descriptions
  6. ICE gathering begins after local description is set

Common SDP negotiation failures:

| Error | Cause | Fix |
|---|---|---|
| InvalidStateError | Setting description in wrong state | Check signalingState before calling setLocalDescription/setRemoteDescription |
| Codec mismatch | Answerer doesn't support offered codecs | Ensure both sides support at least one common codec (Opus recommended for audio) |
| ICE gathering never starts | Local description not set | Call setLocalDescription() with the offer/answer |

Signaling state transitions:

stable → have-local-offer → stable (caller, after applying the remote answer)
stable → have-remote-offer → stable (callee, after setting the local answer)

Monitor signalingState changes—unexpected states indicate negotiation problems.
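
A small listener for this, assuming pc is your RTCPeerConnection:

pc.addEventListener('signalingstatechange', () => {
  console.log('signalingState:', pc.signalingState);
  // An unexpected state here means the next setLocalDescription/setRemoteDescription
  // call is likely to throw InvalidStateError
});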


Network and Connectivity Troubleshooting

ICE Connection Failure Diagnosis

Symptom: ICE connection state stuck on "checking" or transitions to "failed"

Diagnostic steps:

  1. Check ICE candidate gathering in webrtc-internals:

    • Look for iceGatheringState → should reach "complete"
    • Check iceCandidates array for all three types (host, srflx, relay)
    • Missing srflx candidates = STUN server unreachable
    • Missing relay candidates = TURN server unreachable or bad credentials
  2. Check ICE connection state transitions:

    new → checking → connected → completed   (success)
    new → checking → failed                  (failure)
  3. Examine candidate pairs:

    • Look for pairs with state "succeeded"
    • If all pairs show "failed", connectivity is blocked
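
A sketch that logs these transitions and counts candidate types as they arrive, assuming pc is your RTCPeerConnection:

const candidateCounts = { host: 0, srflx: 0, relay: 0 };

pc.addEventListener('icegatheringstatechange', () => {
  console.log('iceGatheringState:', pc.iceGatheringState); // should reach "complete"
});

pc.addEventListener('icecandidate', (e) => {
  if (e.candidate) {
    candidateCounts[e.candidate.type] = (candidateCounts[e.candidate.type] || 0) + 1;
  } else {
    // Null candidate = gathering finished; missing srflx/relay points at STUN/TURN
    console.log('Candidate counts:', candidateCounts);
  }
});

pc.addEventListener('iceconnectionstatechange', () => {
  console.log('iceConnectionState:', pc.iceConnectionState);
});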

ICE failure causes and fixes:

| ICE State | Likely Cause | Diagnostic | Fix |
|---|---|---|---|
| Stuck on "gathering" | STUN/TURN unreachable | Check network connectivity to STUN/TURN servers | Verify server URLs, check firewall |
| Stuck on "checking" | All candidates blocked | Check whether any candidate pairs are attempted | Open UDP ports 3478, 5349, 10000-60000 |
| Transitions to "failed" | No successful connectivity check | Check for STUN binding failures | Use TURN as fallback, check credentials |
| Reaches "connected" then "failed" | Connection dropped | Check network stability | Implement ICE restart on failure |
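
For the last row, a minimal ICE-restart sketch using the standard restartIce() API; re-run your offer/answer exchange when the resulting negotiationneeded event fires:

pc.addEventListener('iceconnectionstatechange', () => {
  if (pc.iceConnectionState === 'failed') {
    // Generates fresh ICE credentials; a new offer/answer round is triggered
    // via negotiationneeded, so repeat your signaling exchange from there
    pc.restartIce();
  }
});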

STUN/TURN Configuration Verification

Verify STUN server connectivity:

// Test STUN binding request
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

pc.onicecandidate = (e) => {
  if (e.candidate) {
    console.log('Candidate type:', e.candidate.type);
    // 'srflx' = STUN working
    // 'host' only = STUN not working
  }
};

pc.createDataChannel('test');
pc.createOffer().then(offer => pc.setLocalDescription(offer));

Verify TURN server connectivity:

const pc = new RTCPeerConnection({
  iceServers: [{
    urls: 'turn:your-turn-server.com:3478',
    username: 'your-username',
    credential: 'your-credential'
  }],
  iceTransportPolicy: 'relay' // Force TURN only
});

pc.onicecandidate = (e) => {
  if (e.candidate && e.candidate.type === 'relay') {
    console.log('TURN working: relay candidate gathered');
  }
};

TURN allocation failure causes:

| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid credentials | Check username/credential, verify not expired |
| 403 Forbidden | Credential expired | Refresh TURN credentials (typically expire in 24h) |
| Connection timeout | Server unreachable | Check firewall, try TCP fallback (port 443) |
| No relay candidates | TURN disabled or misconfigured | Verify iceServers configuration |

Wireshark debugging:

# Capture STUN/TURN traffic
wireshark -f "udp port 3478 or udp port 5349 or tcp port 443"

Look for:

  • STUN Binding Request → Binding Response (success)
  • TURN Allocate Request → Allocate Response (success)
  • TURN CreatePermission → CreatePermission Response

Firewall and NAT Traversal Problems

Symptom: Connection works on some networks but fails on others

Corporate firewall diagnosis:

| Behavior | Likely Block | Workaround |
|---|---|---|
| No srflx candidates | UDP 3478 blocked | Use TURN over TCP/443 |
| No relay candidates | TURN ports blocked | Use TURN over TCP/443 (TLS) |
| ICE fails after gathering | Outbound UDP blocked | Configure TURN with transport=tcp |

NAT type diagnosis:

| NAT Type | Direct Connection | STUN Works | TURN Required |
|---|---|---|---|
| Full Cone | Yes | Yes | No |
| Restricted Cone | Sometimes | Yes | Sometimes |
| Port Restricted | Sometimes | Yes | Sometimes |
| Symmetric | No | No | Yes |

Symmetric NAT detection: If srflx candidates gather but ICE connectivity checks fail, suspect Symmetric NAT. Force TURN relay:

const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    { urls: 'turn:your-server.com:443?transport=tcp', username: '...', credential: '...' }
  ]
});

RTP Media Quality Analysis

Understanding RTP Statistics

Once ICE connection succeeds, audio flows via RTP. Use getStats() API to monitor quality:

// Collect RTP statistics every 2 seconds
setInterval(async () => {
  const stats = await peerConnection.getStats();
  stats.forEach(report => {
    if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      console.log('Packets received:', report.packetsReceived);
      console.log('Packets lost:', report.packetsLost);
      console.log('Jitter (ms):', report.jitter * 1000);
    }
    if (report.type === 'remote-inbound-rtp' && report.kind === 'audio') {
      console.log('Round trip time (ms):', report.roundTripTime * 1000);
    }
  });
}, 2000);

Key RTP metrics from webrtc-internals:

| Metric | Field / Formula in webrtc-internals | What It Means |
|---|---|---|
| Packet Loss | packetsLost / (packetsReceived + packetsLost) | Percentage of lost packets |
| Jitter | jitter field (in seconds) | Variation in packet arrival times |
| RTT | roundTripTime or currentRoundTripTime | Network latency (round trip) |
| Bitrate | ΔbytesSent / Δtime | Throughput in kbps |
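
A sketch of computing packet loss percentage and outbound bitrate from successive getStats() samples, assuming peerConnection is in scope:

let previousOutbound = null;

setInterval(async () => {
  const stats = await peerConnection.getStats();
  stats.forEach(report => {
    if (report.kind !== 'audio') return;

    if (report.type === 'inbound-rtp') {
      const total = report.packetsReceived + report.packetsLost;
      const lossPct = total > 0 ? (100 * report.packetsLost) / total : 0;
      console.log(`Inbound loss: ${lossPct.toFixed(2)}%  jitter: ${(report.jitter * 1000).toFixed(1)}ms`);
    }

    if (report.type === 'outbound-rtp') {
      if (previousOutbound) {
        // bytes → bits (×8), divided by elapsed milliseconds gives kbps
        const kbps = (8 * (report.bytesSent - previousOutbound.bytesSent)) /
                     (report.timestamp - previousOutbound.timestamp);
        console.log(`Outbound bitrate: ${kbps.toFixed(1)} kbps`);
      }
      previousOutbound = { bytesSent: report.bytesSent, timestamp: report.timestamp };
    }
  });
}, 2000);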

Voice Quality Thresholds

Production thresholds for voice agents (based on Hamming's analysis of 1M+ calls):

| Metric | Excellent | Good (P50) | Acceptable (P95) | Poor |
|---|---|---|---|---|
| End-to-end latency | <1s | ~1.5s | ~5s | >8s |
| Network RTT | <50ms | <100ms | <200ms | >300ms |
| Jitter | <10ms | <20ms | <50ms | >50ms |
| Packet loss | <0.5% | <1% | <3% | >5% |
| MOS Score | 4.3+ | 4.0+ | 3.5+ | <3.5 |

Impact of threshold violations:

| Violation | User Experience |
|---|---|
| Latency >300ms | Conversation feels delayed, users talk over each other |
| Latency >500ms | Communication becomes disjointed, unusable for real-time |
| Jitter >50ms | Audio becomes choppy, words cut off |
| Packet loss >3% | Robotic voice, missing syllables |
| Packet loss >10% | Unintelligible audio |

Jitter Buffer Analysis

Jitter buffers smooth out packet arrival variations but add latency:

| Buffer Type | Latency Added | Best For |
|---|---|---|
| Fixed (100ms) | 100ms constant | Stable networks, low latency priority |
| Fixed (200ms) | 200ms constant | Moderate jitter tolerance |
| Adaptive (100-500ms) | Variable | Variable network conditions |

Jitter buffer underrun symptoms:

  • Audio plays in bursts with gaps
  • Words cut off mid-syllable
  • Robotic or stuttering speech

Check jitter buffer health in webrtc-internals:

  • Look for jitterBufferDelay and jitterBufferEmittedCount
  • Calculate average delay: jitterBufferDelay / jitterBufferEmittedCount * 1000 (ms)
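
A small helper for that calculation:

async function averageJitterBufferDelay(pc) {
  const stats = await pc.getStats();
  stats.forEach(report => {
    if (report.type === 'inbound-rtp' && report.kind === 'audio' &&
        report.jitterBufferEmittedCount > 0) {
      const avgMs = (report.jitterBufferDelay / report.jitterBufferEmittedCount) * 1000;
      console.log(`Average jitter buffer delay: ${avgMs.toFixed(1)} ms`);
    }
  });
}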

Packet Loss Patterns

Bursty vs. random packet loss:

| Pattern | Appearance | Likely Cause | Fix |
|---|---|---|---|
| Bursty | Consecutive packets lost | Network congestion, buffer overflow | Reduce bitrate, enable FEC |
| Random | Scattered losses | Weak signal, interference | Improve network path, use wired connection |
| Periodic | Regular intervals | Network equipment issue | Check routers, switches |

Check packet loss direction:

  • inbound-rtp.packetsLost = packets lost coming TO you
  • remote-inbound-rtp.packetsLost = packets lost going FROM you

Asymmetric loss points to one-way network issues (different upload/download paths).
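
A quick way to read both counters in one pass:

async function packetLossByDirection(pc) {
  const stats = await pc.getStats();
  let lostInbound = 0;   // lost on the way TO you
  let lostOutbound = 0;  // lost on the way FROM you (reported back by the remote peer)
  stats.forEach(report => {
    if (report.kind !== 'audio') return;
    if (report.type === 'inbound-rtp') lostInbound = report.packetsLost;
    if (report.type === 'remote-inbound-rtp') lostOutbound = report.packetsLost;
  });
  console.log({ lostInbound, lostOutbound });
}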


Voice Agent Pipeline Debugging

STT/LLM/TTS Latency Breakdown

Voice agent response latency accumulates across the pipeline:

        Voice Agent Latency Breakdown (Production Reality)

  User speaks → audio capture (~70ms) → STT (~350ms) → LLM (600-1000ms)
              → TTS (~100ms) → audio playback (~10ms)

  Total: ~1.2-1.6s + network hops (~10ms each × 10 hops = ~100ms)
  Production metrics: P50 ~1.5s, P95 ~5s (Hamming data from 1M+ calls)

Component latency targets (based on Hamming production data from 1M+ calls):

| Component | P50 Reality | P95 Reality | Critical Threshold |
|---|---|---|---|
| Audio capture/buffering | 50-70ms | 100-150ms | >200ms |
| STT (TTFB) | 200-250ms | 400-500ms | >800ms |
| STT (final transcript) | 300-350ms | 600-700ms | >1000ms |
| LLM (first token) | 400-600ms | 1500-2000ms | >3000ms |
| LLM (complete) | 600-1000ms | 2000-3000ms | >5000ms |
| TTS (first byte) | 80-100ms | 150-200ms | >400ms |
| TTS (complete) | 100-150ms | 200-300ms | >500ms |
| End-to-end total | ~1.5s | ~5s | >8s |

Measuring Component-Level Latency

Track these milestones per turn:

const turnMetrics = {
  // Audio
  userSpeechStart: null,      // VAD detects speech
  userSpeechEnd: null,        // VAD detects silence (endpointing)

  // STT
  sttRequestStart: null,      // Audio sent to STT
  sttFirstPartial: null,      // First partial transcript received
  sttFinalTranscript: null,   // Final transcript received

  // LLM
  llmRequestStart: null,      // Prompt sent to LLM
  llmFirstToken: null,        // First token received
  llmComplete: null,          // Full response received

  // TTS
  ttsRequestStart: null,      // Text sent to TTS
  ttsFirstByte: null,         // First audio byte received
  ttsComplete: null,          // Full audio received

  // Playback
  audioPlaybackStart: null,   // Audio playback begins
};

// Calculate latencies
const sttLatency = turnMetrics.sttFinalTranscript - turnMetrics.userSpeechEnd;
const llmLatency = turnMetrics.llmComplete - turnMetrics.llmRequestStart;
const ttsLatency = turnMetrics.ttsFirstByte - turnMetrics.ttsRequestStart;
const turnAroundTime = turnMetrics.audioPlaybackStart - turnMetrics.userSpeechEnd;

Report P50/P95/P99 for each milestone. A single blended latency number hides variance—your P50 might be 600ms while P95 is 2000ms.
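
A minimal percentile helper, assuming you have already collected per-turn latencies into an array (turnAroundTimes here is one entry per turn, built from the turnMetrics above):

function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.max(0, Math.ceil((p / 100) * sorted.length) - 1));
  return sorted[index];
}

console.log({
  p50: percentile(turnAroundTimes, 50),
  p95: percentile(turnAroundTimes, 95),
  p99: percentile(turnAroundTimes, 99)
});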

Barge-In and Interruption Handling

Barge-in = user interrupts while agent is speaking. Critical for natural conversation.

Barge-in requirements:

| Metric | Production P50 | Production P95 | Critical |
|---|---|---|---|
| Detection latency | ~200ms | ~500ms | >800ms |
| Agent stop latency | ~300ms | ~700ms | >1000ms |
| Context retention | 95% | 85% | <80% |
| Recovery rate | >85% | >75% | <70% |

Common barge-in failures:

| Symptom | Cause | Fix |
|---|---|---|
| Agent keeps talking | VAD not detecting speech over TTS audio | Improve echo cancellation, lower VAD threshold |
| Agent stops for background noise | VAD false positives | Increase VAD threshold, add noise filtering |
| Agent stops for "mm-hmm" | Can't distinguish backchannel from interruption | Implement backchannel detection model |
| Agent loses context after interruption | State not preserved | Store partial response, resume gracefully |
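
A sketch of the stop path, with hypothetical vad, ttsPlayer, dialogState, and metrics objects standing in for your framework's equivalents:

// Hypothetical wiring: stop agent speech the moment VAD fires during TTS playback
vad.on('speechStart', () => {
  if (!ttsPlayer.isPlaying()) return;

  const detectedAt = Date.now();
  ttsPlayer.stop();             // halt audio output immediately
  ttsPlayer.clearQueue();       // drop buffered audio so playback can't resume
  dialogState.markInterrupted({ // tell the LLM its last turn was cut short
    spokenUpTo: ttsPlayer.playbackPositionMs
  });
  metrics.record('agent_stop_latency_ms', Date.now() - detectedAt);
});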

Debug barge-in:

// Log barge-in events
voiceAgent.on('bargeIn', (event) => {
  console.log({
    detectionLatency: event.detectedAt - event.userSpeechStart,
    agentStopLatency: event.agentStoppedAt - event.detectedAt,
    agentWasSpeaking: event.agentAudioPosition,
    userTranscript: event.interruptingUtterance
  });
});

Turn Detection and Endpointing Issues

Endpointing = determining when the user finished speaking.

Endpointing tradeoffs:

| Setting | Pros | Cons |
|---|---|---|
| Short silence threshold (300ms) | Fast response | Cuts off mid-thought |
| Long silence threshold (800ms) | Complete utterances | Sluggish feel |
| Phrase endpointing | Natural sentence boundaries | Complexity, model latency |

Common endpointing failures:

| Symptom | Cause | Fix |
|---|---|---|
| Agent responds too early | Silence threshold too short | Increase to 500-700ms |
| Agent cuts off user | Not detecting speech continuation | Use phrase endpointing, longer threshold |
| Long pause before response | Silence threshold too long | Decrease to 400-500ms |
| Inconsistent timing | Static threshold for all contexts | Implement adaptive endpointing |
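
One way to approximate adaptive endpointing, sketched with a hypothetical setEndpointingThreshold API; the heuristic and word list are illustrative only:

// Illustrative heuristic: hold the turn open longer when the partial
// transcript looks unfinished (trailing conjunctions or filler words)
function silenceThresholdMs(partialTranscript) {
  const baseMs = 500;
  const looksUnfinished = /\b(and|but|so|um|uh|to|the)\s*$/i.test(partialTranscript.trim());
  return looksUnfinished ? baseMs + 400 : baseMs;
}

speechRecognizer.setEndpointingThreshold(silenceThresholdMs(latestPartial)); // hypothetical API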

Debug endpointing:

// Log endpointing decisions
speechRecognizer.on('endOfUtterance', (event) => {
  console.log({
    silenceDuration: event.silenceDurationMs,
    threshold: event.configuredThresholdMs,
    transcriptLength: event.transcript.length,
    confidence: event.confidence,
    wasInterrupted: event.interrupted
  });
});

Audio Quality Failure Modes

One-Way Audio Diagnosis

Symptom: One participant hears audio, the other hears nothing.

Diagnostic checklist:

  • Check webrtc-internals for packet counts
    • outbound-rtp.packetsSent > 0? You're sending.
    • inbound-rtp.packetsReceived > 0? You're receiving.
  • Zero inbound + non-zero outbound = remote side not sending or packets blocked
  • Non-zero inbound + zero outbound = local side not sending or packets blocked
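
A getStats() sketch that automates the packet-count check above:

async function audioFlowCheck(pc) {
  const stats = await pc.getStats();
  let sent = 0, received = 0;
  stats.forEach(report => {
    if (report.kind !== 'audio') return;
    if (report.type === 'outbound-rtp') sent = report.packetsSent;
    if (report.type === 'inbound-rtp') received = report.packetsReceived;
  });
  if (sent > 0 && received === 0) console.log('Sending but not receiving: remote not sending or inbound blocked');
  if (sent === 0 && received > 0) console.log('Receiving but not sending: local capture or outbound blocked');
}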

One-way audio causes:

| Pattern | Cause | Fix |
|---|---|---|
| A hears B, B doesn't hear A | A's outbound blocked or B's inbound blocked | Check A's firewall, NAT |
| A doesn't hear B, B hears A | B's outbound blocked or A's inbound blocked | Check B's firewall, NAT |
| Both have packets, still one-way | Codec mismatch | Verify same codec negotiated |
| Intermittent one-way | Network instability | Check for packet loss spikes |

Echo and Feedback Issues

Echo cancellation (AEC3) is critical for voice agents—the agent's TTS audio must not trigger its own VAD.

Echo symptoms:

| Symptom | Cause | Fix |
|---|---|---|
| User hears themselves | AEC not working | Check browser AEC settings, use headphones |
| Agent triggered by its own voice | TTS audio feeding back to microphone | Improve echo cancellation, mute mic during playback |
| Degraded audio after Chrome update | AEC3 experiment changes | Check Chrome flags, try different browser |

Debug echo issues:

// Check audio processing settings
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true
  }
});

// Verify settings applied
const track = stream.getAudioTracks()[0];
console.log(track.getSettings());

Choppy or Robotic Voice

Symptom: Audio plays in bursts, sounds mechanical.

Diagnostic steps:

  1. Check packet loss in webrtc-internals
    • 5% packet loss = severe degradation
    • 10% packet loss = nearly unintelligible
  2. Check jitter buffer underruns
    • Look for gaps in audio level meter
    • Check jitterBufferDelay trending
  3. Check CPU usage
    • High CPU can cause frame drops
    • Check totalProcessingDelay stat

Choppy audio causes and fixes:

| Cause | How to Identify | Fix |
|---|---|---|
| High packet loss | packetsLost increasing | Improve network, enable FEC |
| High jitter | jitter > 50ms | Increase jitter buffer, improve network |
| CPU overload | High CPU in task manager | Reduce processing, disable video |
| Codec issues | Low bitrate, compression artifacts | Increase bitrate, use Opus |

Framework-Specific Debugging

LiveKit Voice Agent Debugging

LiveKit is a real-time framework for building production-grade multimodal voice agents, backed by a WebRTC media server.

LiveKit-specific debugging tools:

| Tool | Purpose | How to Access |
|---|---|---|
| LiveKit CLI | Room inspection, participant stats | livekit-cli room list, livekit-cli room inspect |
| Room Composite | Debug recordings | Enable egress for room recordings |
| Webhook events | Connection lifecycle | Configure webhook endpoint |
| Agent logs | Pipeline debugging | LIVEKIT_LOG_LEVEL=debug |

Debug LiveKit agent pipeline:

# Enable verbose logging in LiveKit agent
import logging
logging.basicConfig(level=logging.DEBUG)

from livekit.agents import JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    # Log connection state
    ctx.room.on("connection_state_changed", lambda state:
        print(f"Connection state: {state}"))

    # Log participant events
    ctx.room.on("participant_connected", lambda p:
        print(f"Participant connected: {p.identity}"))

    # Log track subscriptions
    ctx.room.on("track_subscribed", lambda track, publication, participant:
        print(f"Track subscribed: {track.kind} from {participant.identity}"))

LiveKit connection issues:

| Symptom | Cause | Fix |
|---|---|---|
| Agent doesn't connect | Room token invalid | Check token expiry, room name |
| Audio not received | Track not subscribed | Verify auto-subscribe or manual subscription |
| High latency | Server region | Deploy agent in same region as server |
| Connection drops | Network instability | Implement reconnection logic |

Hamming integration for LiveKit: Hamming offers LiveKit-to-LiveKit WebRTC testing: auto-provisioned rooms, scenario generation from prompts, 50+ quality metrics evaluated in <10 minutes.

Pipecat Pipeline Troubleshooting

Pipecat specializes in real-time voice agent infrastructure with STT/LLM/TTS orchestration.

Pipecat debugging tools:

| Tool | Purpose | Usage |
|---|---|---|
| Whisker | Real-time pipeline debugger | Visualizes frame flow through pipeline |
| Tail | Terminal metrics dashboard | Monitors latency, token usage in real-time |

Common Pipecat issues:

| Symptom | Cause | Fix |
|---|---|---|
| 2-5s response delay | STT endpointing timeout | Adjust vad_parameters.min_silence_duration_ms |
| Delayed response | LLM queuing | Check LLM rate limits, implement streaming |
| Audio cuts out | Frame processor error | Check pipeline error handlers |
| Memory growth | Frame accumulation | Implement proper frame lifecycle |

Debug Pipecat pipeline latency:

import time

class LatencyLogger:
    def __init__(self):
        self.timestamps = {}

    async def log_stt_output(self, frame):
        self.timestamps['stt_complete'] = time.time()
        print(f"STT latency: {self.timestamps['stt_complete'] - self.timestamps.get('speech_end', 0):.3f}s")

    async def log_llm_output(self, frame):
        self.timestamps['llm_first_token'] = time.time()
        print(f"LLM TTFT: {self.timestamps['llm_first_token'] - self.timestamps.get('stt_complete', 0):.3f}s")

    async def log_tts_output(self, frame):
        self.timestamps['tts_first_byte'] = time.time()
        print(f"TTS TTFB: {self.timestamps['tts_first_byte'] - self.timestamps.get('llm_complete', 0):.3f}s")

Pipecat VAD tuning for 2-5s delay issue:

from pipecat.vad.silero import SileroVADAnalyzer

vad = SileroVADAnalyzer(
    min_silence_duration_ms=400,  # Reduce from default 700ms
    speech_pad_ms=100,
    threshold=0.5  # Adjust sensitivity
)

Framework-Agnostic Diagnostic Patterns

Universal debugging approaches for any voice agent framework:

1. Component boundary logging:

// Log at every boundary
const logBoundary = (component, direction, data) => {
  console.log({
    timestamp: Date.now(),
    component,
    direction, // 'in' or 'out'
    dataSize: JSON.stringify(data).length,
    requestId: currentRequestId
  });
};

// Audio → STT
logBoundary('stt', 'in', { audioChunkSize, format });
// STT → LLM
logBoundary('stt', 'out', { transcript, confidence });
logBoundary('llm', 'in', { promptTokens });
// LLM → TTS
logBoundary('llm', 'out', { responseTokens, content });
logBoundary('tts', 'in', { textLength });
// TTS → Audio
logBoundary('tts', 'out', { audioBytes, duration });

2. Request ID correlation:

// Generate at call start, propagate everywhere
const requestId = `call_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;

// Include in all API calls
sttClient.transcribe(audio, { metadata: { requestId } });
llmClient.complete(prompt, { metadata: { requestId } });
ttsClient.synthesize(text, { metadata: { requestId } });

3. Session replay capability: Store full session data for post-mortem analysis:

  • Audio recordings (both directions)
  • Full transcripts with timestamps
  • LLM prompts and responses
  • All component latencies

Structured Logging and Observability

Essential Voice Agent Metrics

Network layer:

| Metric | What It Measures | How to Collect |
|---|---|---|
| ICE connection state | Connection establishment | pc.iceConnectionState events |
| Selected candidate pair | Connection type (direct/relay) | pc.getStats() candidate-pair |
| RTT | Network latency | remote-inbound-rtp.roundTripTime |
| Packet loss % | Network reliability | (packetsLost / (packetsReceived + packetsLost)) * 100 |
| Jitter | Packet timing variance | inbound-rtp.jitter * 1000 (ms) |

Media layer:

| Metric | What It Measures | How to Collect |
|---|---|---|
| Audio level | Volume, silence detection | audio-level stat or Web Audio API |
| Bitrate | Audio quality | Δ bytesSent / Δ time |
| Codec | Negotiated codec | SDP or getStats() |
| Frame drops | Processing issues | Custom counter on frame processor |

Pipeline layer:

| Metric | What It Measures | How to Collect |
|---|---|---|
| STT TTFB | First partial latency | firstPartialTime - audioEndTime |
| STT RTF | Processing speed | processingTime / audioDuration |
| LLM TTFT | First token latency | firstTokenTime - requestTime |
| LLM tokens/sec | Generation speed | totalTokens / generationTime |
| TTS TTFB | First audio byte latency | firstByteTime - requestTime |
| Turn-around time | End-to-end response | agentAudioStart - userSpeechEnd |

Minimal Viable Logging Schema

Session-level:

{
  "session_id": "sess_abc123",
  "start_time": "2026-01-25T10:30:00Z",
  "end_time": "2026-01-25T10:35:00Z",
  "duration_seconds": 300,
  "participant_count": 2,
  "completion_status": "completed",
  "ice_connection_type": "relay",
  "average_rtt_ms": 45,
  "total_packet_loss_percent": 0.3
}

Turn-level:

{
  "session_id": "sess_abc123",
  "turn_index": 5,
  "user_speech_start": "2026-01-25T10:31:15.000Z",
  "user_speech_end": "2026-01-25T10:31:18.500Z",
  "user_transcript": "I need to reschedule my appointment",
  "asr_confidence": 0.94,
  "stt_latency_ms": 180,
  "llm_first_token_ms": 220,
  "llm_complete_ms": 450,
  "tts_first_byte_ms": 85,
  "turn_around_time_ms": 735,
  "agent_response": "I can help you reschedule...",
  "barge_in_occurred": false,
  "tool_calls": [
    {"name": "get_appointments", "success": true, "latency_ms": 45}
  ]
}

Connection-level (sample every 5-10 seconds):

{
  "session_id": "sess_abc123",
  "timestamp": "2026-01-25T10:31:20.000Z",
  "ice_connection_state": "connected",
  "selected_candidate_type": "relay",
  "rtt_ms": 42,
  "jitter_ms": 8,
  "packets_received": 15420,
  "packets_lost": 12,
  "packet_loss_percent": 0.08,
  "audio_level_db": -25.5
}

Trace Correlation Across Components

Implement distributed tracing with OpenTelemetry:

const { trace, context, propagation } = require('@opentelemetry/api');

// Start trace at call initiation
const tracer = trace.getTracer('voice-agent');
const span = tracer.startSpan('voice_agent_call');
const ctx = trace.setSpan(context.active(), span);

// Propagate context to all services
const headers = {};
propagation.inject(ctx, headers);

// STT call
await sttClient.transcribe(audio, { headers });

// LLM call
await llmClient.complete(prompt, { headers });

// TTS call
await ttsClient.synthesize(text, { headers });

span.end();

This enables: "Which component caused the 2s latency spike in session X?"


Symptom-to-Cause Diagnostic Tables

Network Issues Quick Reference

| Symptom | Likely Cause | Diagnostic Steps | Fix |
|---|---|---|---|
| ICE state stuck "checking" | Firewall blocking UDP | Check STUN Binding responses in Wireshark | Try TURN TCP fallback on port 443 |
| No audio either direction | Media connection failed | Verify ICE candidate exchange in webrtc-internals | Ensure TURN server configured |
| One-way audio | Asymmetric NAT/firewall | Check inbound/outbound packet counts | Open UDP ports, use TURN relay |
| TURN allocation failures | Invalid/expired credentials | Check for 401/403 errors | Refresh TURN credentials |
| High RTT (>300ms) | Network congestion or routing | Compare RTT across connection types | Use closer server, improve network path |
| Intermittent disconnects | Network instability | Check ICE restart events | Implement automatic ICE restart |

Audio Quality Issues Quick Reference

| Symptom | Likely Cause | Diagnostic Steps | Fix |
|---|---|---|---|
| Choppy/robotic voice | Packet loss >5% or jitter buffer underruns | Check packetsLost, jitter in webrtc-internals | Improve network, increase jitter buffer |
| Echo or feedback | AEC3 failure or device issue | Test different browser/device | Use headphones, check Chrome flags |
| Audio cuts out intermittently | Network instability or device overload | Monitor packet loss patterns, CPU usage | Reduce processing, improve network |
| Degraded audio quality | Codec bitrate too low | Check selected codec, bitrate stats | Increase bitrate, use Opus codec |
| Latency >500ms | Combined network + jitter buffer + processing | Break down RTT, jitter buffer, STT/LLM/TTS | Optimize each component |

Voice Pipeline Issues Quick Reference

| Symptom | Likely Cause | Diagnostic Steps | Fix |
|---|---|---|---|
| 2-5s delay before agent responds | STT endpointing timeout or LLM queuing | Measure STT final transcript latency, LLM TTFT | Reduce silence threshold, check LLM rate limits |
| Agent cuts off user mid-sentence | Aggressive endpointing or low VAD threshold | Check silence threshold config | Increase to 500-700ms, use phrase endpointing |
| Agent doesn't stop when interrupted | Barge-in detection disabled or latency >200ms | Check VAD processing time, AEC | Improve echo cancellation, tune VAD |
| Frequent false interruptions | Poor echo cancellation or VAD false positives | Check TTS audio levels, VAD triggers | Improve AEC, increase VAD threshold |
| High P95 latency spikes | Service queuing or cold starts | Monitor per-component P95 latencies | Implement service warm-up, scale capacity |

Tooling Ecosystem Overview

Open Source Debugging Tools

| Tool | Purpose | Best For |
|---|---|---|
| chrome://webrtc-internals | WebRTC session inspection | Development debugging, connection issues |
| about:webrtc (Firefox) | Firefox WebRTC debugging | Firefox-specific issues |
| Wireshark | Network packet capture | STUN/TURN/RTP protocol analysis |
| Whisker (Pipecat) | Pipeline frame visualization | Pipecat frame flow debugging |
| Tail (Pipecat) | Terminal metrics dashboard | Real-time Pipecat metrics |
| LiveKit CLI | Room and participant inspection | LiveKit deployment debugging |

Network Analysis Commands

Test STUN connectivity:

# Using turnutils
turnutils_stunclient stun.l.google.com

Test TURN connectivity:

# Using turnutils
turnutils_uclient -u username -w password turn.server.com

Capture WebRTC traffic with Wireshark:

# Filter for STUN protocol
wireshark -f "udp port 3478 or udp port 5349"

# Filter for RTP
wireshark -f "udp portrange 10000-60000"

Commercial Testing Platforms

| Platform | Capabilities |
|---|---|
| Hamming | LiveKit integration, auto-generated test scenarios, 50+ quality metrics, session replay, drift detection |

Testing and Validation Strategies

Automated Testing Checklist

Before deploying voice agent changes:

  • Unit tests: STT/LLM/TTS component mocks
  • Integration tests: Full pipeline with real services
  • Regression tests: Golden call set (50+ recordings)
  • Load tests: Concurrent call capacity
  • Network simulation: Packet loss, jitter injection

Synthetic Test Call Patterns

// Test scenarios to cover
const testScenarios = [
  {
    name: 'clean_audio',
    backgroundNoise: null,
    packetLoss: 0,
    jitter: 0
  },
  {
    name: 'noisy_environment',
    backgroundNoise: 'coffee_shop_-20db',
    packetLoss: 0,
    jitter: 0
  },
  {
    name: 'poor_network',
    backgroundNoise: null,
    packetLoss: 3,  // 3%
    jitter: 30      // 30ms
  },
  {
    name: 'barge_in_test',
    interruptAt: 1500,  // ms into agent response
    expectedStopLatency: 200  // ms
  }
];

Regression Detection Alerts

Set alerts for statistical deviations:

| Metric | Alert Threshold | Window |
|---|---|---|
| Turn-around time P95 | +20% from baseline | 1 hour |
| Barge-in accuracy | -5% from baseline | 1 hour |
| Task completion rate | -10% from baseline | 4 hours |
| Packet loss | >2% sustained | 15 minutes |
| STT WER | +10% from baseline | 1 hour |
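
A sketch of the baseline comparison for an upward-drifting latency metric, with a hypothetical alerting client:

// Flag a regression when the current value exceeds the allowed deviation from baseline
function checkRegression(metric, current, baseline, allowedPct) {
  const limit = baseline * (1 + allowedPct / 100);
  if (current > limit) {
    alerting.send(`${metric} regression: ${current} vs baseline ${baseline} (+${allowedPct}% allowed)`); // hypothetical alerting client
  }
}

checkRegression('turn_around_time_p95_ms', 2400, 1900, 20);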

Conclusion: Debugging Workflow

Prioritized Debugging Steps

When a voice agent has issues, follow this order:

  1. Network layer first

    • Open chrome://webrtc-internals
    • Verify ICE reaches "connected" state
    • Check selected candidate pair type
    • If ICE fails, stop here—fix network first
  2. Media layer second

    • Check RTP packet loss (<1% target)
    • Check jitter (<20ms target)
    • Check RTT (<150ms target)
    • If degraded, fix network or adjust jitter buffer
  3. Pipeline layer third

    • Measure STT latency (TTFB + final)
    • Measure LLM latency (TTFT + complete)
    • Measure TTS latency (TTFB)
    • Identify which component is slowest
  4. Conversation layer fourth

    • Check barge-in detection and response
    • Check endpointing configuration
    • Check turn-taking behavior

Building Observability from Day One

Design for debuggability:

  • Implement structured logging at component boundaries
  • Generate and propagate request IDs through entire pipeline
  • Store session recordings with full transcripts
  • Collect RTP statistics via getStats() API
  • Track latency percentiles (P50/P95/P99), not just averages
  • Implement health checks at each pipeline stage
  • Set up alerts for threshold violations


How Hamming Helps with WebRTC Voice Agent Debugging

Hamming provides specialized tooling for debugging and testing WebRTC voice agents:

  • LiveKit-to-LiveKit Testing: Auto-provisioned rooms, synthetic test calls, real WebRTC connections
  • 50+ Quality Metrics: Latency breakdown, barge-in accuracy, task completion, audio quality
  • Session Replay: Full audio playback with transcripts and component traces
  • Regression Detection: Automated alerts when metrics deviate from baseline
  • Scenario Generation: Auto-generate test cases from prompts, execute in <10 minutes

Instead of manually debugging with chrome://webrtc-internals, get automated visibility into every layer of your voice agent stack.

Debug your voice agents with Hamming →

Frequently Asked Questions

Why do voice agents fail in production when they work locally?

Production networks have NAT, firewalls, and restrictive UDP policies that don't exist in local development. The most common issue is ICE negotiation failing due to blocked STUN/TURN servers. Always test with restricted networks, verify TURN server connectivity, and check chrome://webrtc-internals for ICE state transitions. Based on Hamming's data, 70% of production failures are network-layer issues, not AI pipeline problems.

What's the fastest way to debug a failing WebRTC voice agent call?

Open chrome://webrtc-internals BEFORE starting your session, then check three things in order: (1) ICE connection state - must reach 'connected', (2) candidate pairs - verify srflx or relay candidates exist, (3) RTP stats - confirm packets are being sent/received. If ICE never connects, it's a network issue. If packets aren't flowing, it's a media configuration issue. Only after these pass should you debug the AI pipeline.

What end-to-end latency is realistic for production voice agents?

Based on Hamming's analysis of 1M+ production calls, P50 end-to-end latency is ~1.5 seconds and P95 is ~5 seconds. This is significantly higher than theoretical targets but still provides acceptable user experience. The breakdown is typically: audio capture (70ms) + STT (350ms) + LLM (600-1000ms) + TTS (100ms) + network hops (100ms). Focus on keeping P95 under 5 seconds rather than chasing sub-second P50.

What causes one-way audio and how do I fix it?

One-way audio means ICE connected but RTP packets flow in only one direction. Check webrtc-internals for packet counts: if outbound-rtp.packetsSent > 0 but inbound-rtp.packetsReceived = 0, the remote side isn't sending or packets are blocked by NAT/firewall. This is almost always a TURN server issue - verify both sides can reach the TURN server and credentials are valid.

How do I diagnose and fix choppy or robotic audio?

Monitor three key metrics in webrtc-internals: packetsLost (target <1%), jitter (target <20ms), and jitterBufferDelay. Packet loss >5% causes severe degradation. Common causes: network congestion, insufficient bandwidth, or aggressive firewall throttling. Enable FEC (Forward Error Correction) in Opus codec, increase jitter buffer size, and consider reducing audio bitrate if bandwidth is constrained.

Why is my ICE connection stuck in the 'gathering' state?

ICE stuck in 'gathering' means the browser can't reach STUN/TURN servers. This happens when: (1) STUN server URLs are incorrect or servers are down, (2) Corporate firewall blocks UDP port 3478 for STUN, (3) TURN credentials expired or are invalid. Test STUN connectivity with 'nc -u stun.l.google.com 19302' and always configure TURN as fallback with proper credentials.

How should a voice agent handle barge-in (user interruptions)?

Barge-in requires coordinating VAD, echo cancellation, and pipeline state. When user speech is detected during TTS playback: (1) immediately stop TTS audio, (2) clear audio buffers to prevent overlap, (3) mark context as 'interrupted' for the LLM, (4) increase VAD sensitivity temporarily. Production data shows barge-in detection latency P50 ~200ms and agent stop latency P50 ~300ms are acceptable.

How do I reduce latency in LiveKit or Pipecat voice agents?

Framework-specific latency often comes from suboptimal configuration. For LiveKit: ensure agent and server are in the same region, use auto-subscribe for tracks, and implement proper async handling. For Daily/Pipecat: minimize frame processing overhead, use streaming STT/TTS, and avoid blocking operations in the pipeline. Profile with framework-specific tools (Whisker for Pipecat, LiveKit's built-in metrics).

What's the difference between host, srflx, and relay ICE candidates?

Host candidates are local IP addresses (only work on same network). Srflx (server reflexive) candidates are your public IP from STUN, work for most direct connections. Relay candidates route through TURN servers, adding ~30ms latency but working through any NAT/firewall. In production, 40% of calls require relay candidates. Always configure TURN servers as fallback.

How do I test WebRTC voice agents at scale?

Use automated WebRTC testing platforms like Hamming that simulate real network conditions, generate test scenarios from prompts, and measure 50+ quality metrics. Key capabilities needed: LiveKit-to-LiveKit or Daily-to-Daily connections, network condition simulation (packet loss, jitter), automated speech generation, and latency breakdown by component. Manual chrome://webrtc-internals debugging doesn't scale beyond 10 test cases.

Sumanyu Sharma

Founder & CEO

Previously Head of Data at Citizen, where he helped quadruple the user base. As a Senior Staff Data Scientist at Tesla, he grew an AI-powered sales program to hundreds of millions of dollars in revenue per year.

Researched AI-powered medical image search at the University of Waterloo, where he graduated with Engineering honors on dean's list.

“At Hamming, we're taking all of our learnings from Tesla and Citizen to build the future of trustworthy, safe and reliable voice AI agents.”