Debug WebRTC Voice Agents: Complete Checklist & Troubleshooting Guide

Sumanyu Sharma
Founder & CEO, Voice AI QA Pioneer

Has stress-tested 1M+ voice agent calls to find where they break.

January 25, 2026 · 21 min read

TL;DR: Debug WebRTC voice agents using this 3-layer diagnostic approach:

| Layer | What to Check | Key Tools | Production Thresholds (Hamming data) |
|---|---|---|---|
| Network | ICE connection, STUN/TURN, firewall | chrome://webrtc-internals, Wireshark | ICE state: "connected", RTT <150ms |
| Media | RTP packet loss, jitter, audio quality | webrtc-internals stats, getStats API | Packet loss <1%, jitter <20ms |
| Pipeline | STT/LLM/TTS latency, barge-in, turn detection | Component traces, Whisker/Tail | End-to-end P50 ~1.5s, P95 ~5s, barge-in <500ms |

Start at the Network layer. If ICE never reaches "connected", nothing else matters. If RTP stats show packet loss >5%, audio quality degrades before it reaches your AI pipeline. Only debug STT/LLM/TTS after verifying network and media are healthy.


Quick Symptom Lookup

Jump to the section that matches your issue:

| Symptom | Likely Cause | Go To Section |
|---|---|---|
| ICE state stuck on "checking" | Firewall blocking UDP | ICE Connection Failures |
| No audio in either direction | Media connection failed | STUN/TURN Configuration |
| One-way audio | Asymmetric NAT/firewall | One-Way Audio Diagnosis |
| Choppy/robotic voice | Packet loss or jitter | RTP Media Quality |
| 2-5 second response delay | STT endpointing or LLM queuing | Pipeline Latency Breakdown |
| Agent doesn't stop when interrupted | Barge-in detection issue | Barge-In Handling |
| Agent cuts off user mid-sentence | Aggressive endpointing | Turn Detection Issues |

WebRTC Debugging Fundamentals

Voice agents built on WebRTC require debugging across multiple layers that traditional application monitoring doesn't cover: network traversal (ICE/STUN/TURN), media transport (RTP/jitter/packet loss), and the AI pipeline (STT/LLM/TTS latency). Engineers face limited visibility compared to traditional systems—no standard observability for interruption patterns, ASR drift, or orchestration delays.

This guide provides symptom-to-cause diagnostics, structured logging patterns, and framework-specific debugging for LiveKit and Pipecat implementations.

Browser-Based Debugging with chrome://webrtc-internals

chrome://webrtc-internals is the most comprehensive tool for debugging WebRTC connections in real-time.

How to use it:

  1. Open chrome://webrtc-internals in Chrome (v87+) before starting your voice agent session
  2. Start the voice agent call—connection data will populate automatically
  3. Look for the peer connection entry and expand it to see:
    • ICE candidate pairs and connection state
    • Inbound/outbound RTP statistics (packet loss, jitter, RTT)
    • Audio/video track statistics

Critical: Open webrtc-internals before the call starts. Connection establishment data (ICE gathering, candidate exchange) is only captured if the tab is open before the session begins.

Key sections to check:

| Section | What It Shows | What to Look For |
|---|---|---|
| ICE Candidate Pairs | Connection attempts | Selected pair should show "succeeded" |
| Inbound RTP (audio) | Incoming audio stats | packetsLost, jitter, roundTripTime |
| Outbound RTP (audio) | Outgoing audio stats | packetsSent, bytesSent |
| Connection State | ICE state transitions | Should reach "connected" or "completed" |

Firefox alternative: Use about:webrtc for similar functionality, though with fewer features than Chrome.

Production monitoring: chrome://webrtc-internals is impractical outside development. For production, implement client-side event tracing using the getStats() API to collect RTP statistics programmatically.
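
A minimal sketch of that pattern, assuming a hypothetical /rtc-stats collection endpoint on your backend and a long-lived peerConnection and sessionId already in scope:

// Periodically snapshot audio RTP stats and ship them to a logging endpoint
async function reportRtcStats(pc, sessionId) {
  const stats = await pc.getStats();
  const snapshot = { sessionId, timestamp: Date.now() };

  stats.forEach(report => {
    if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      snapshot.packetsReceived = report.packetsReceived;
      snapshot.packetsLost = report.packetsLost;
      snapshot.jitterMs = report.jitter * 1000;
    }
    if (report.type === 'remote-inbound-rtp' && report.kind === 'audio') {
      snapshot.rttMs = report.roundTripTime * 1000;
    }
  });

  navigator.sendBeacon('/rtc-stats', JSON.stringify(snapshot)); // hypothetical endpoint
}

setInterval(() => reportRtcStats(peerConnection, sessionId), 5000);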

Understanding WebRTC Connection Architecture

WebRTC uses the Interactive Connectivity Establishment (ICE) framework to establish peer connections through firewalls:

                       ICE Connection Flow

  Client A                                          Client B
     │                                                 │
     │  1. Gather ICE candidates (host, srflx, relay)  │
     │                                                 │
     │  2. Exchange candidates via signaling           │
     │────────────────────────────────────────────────►│
     │◄────────────────────────────────────────────────│
     │                                                 │
     │  3. Connectivity checks (STUN binding)          │
     │◄───────────────────────────────────────────────►│
     │                                                 │
     │  4. Select best candidate pair                  │
     │═════════════════════════════════════════════════│
     │       Media flows directly or via TURN          │

ICE candidate types:

| Candidate Type | How It's Obtained | When It Works |
|---|---|---|
| host | Local network interface | Same network only |
| srflx (server reflexive) | STUN server response | ~80% of NAT configurations |
| relay | TURN server allocation | Always works (adds latency) |

STUN (Session Traversal Utilities for NAT): Discovers your public IP address and NAT type. Works for most home/office NATs but fails with Symmetric NAT.

TURN (Traversal Using Relays around NAT): Relays all media through the server. Required when Symmetric NAT blocks direct connections (~20% of cases). Adds 20-50ms latency overhead.

SDP Negotiation and Signaling

Session Description Protocol (SDP) describes the multimedia session: codecs, formats, encryption parameters.

Offer/Answer flow:

  1. Caller creates SDP offer with supported codecs
  2. Offer sent via signaling channel (WebSocket, HTTP)
  3. Callee creates SDP answer, accepting/rejecting codecs
  4. Answer sent back via signaling
  5. Both sides set local/remote descriptions
  6. ICE gathering begins after local description is set

Common SDP negotiation failures:

| Error | Cause | Fix |
|---|---|---|
| InvalidStateError | Setting description in wrong state | Check signalingState before calling setLocalDescription/setRemoteDescription |
| Codec mismatch | Answerer doesn't support offered codecs | Ensure both sides support at least one common codec (Opus recommended for audio) |
| ICE gathering never starts | Local description not set | Call setLocalDescription() with the offer/answer |

Signaling state transitions:

stable → have-local-offer → stable (caller, after applying the remote answer)
stable → have-remote-offer → stable (callee, after setting the local answer)

Monitor signalingState changes—unexpected states indicate negotiation problems.
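
A small listener for this, assuming pc is your RTCPeerConnection:

pc.addEventListener('signalingstatechange', () => {
  console.log('signalingState:', pc.signalingState);
  // An unexpected state here means the next setLocalDescription/setRemoteDescription
  // call is likely to throw InvalidStateError
});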


Network and Connectivity Troubleshooting

ICE Connection Failure Diagnosis

Symptom: ICE connection state stuck on "checking" or transitions to "failed"

Diagnostic steps:

  1. Check ICE candidate gathering in webrtc-internals:

    • Look for iceGatheringState → should reach "complete"
    • Check iceCandidates array for all three types (host, srflx, relay)
    • Missing srflx candidates = STUN server unreachable
    • Missing relay candidates = TURN server unreachable or bad credentials
  2. Check ICE connection state transitions:

    new → checking → connected → completed   (success)
    new → checking → failed                  (failure)
  3. Examine candidate pairs:

    • Look for pairs with state "succeeded"
    • If all pairs show "failed", connectivity is blocked
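
A sketch that logs these transitions and counts candidate types as they arrive, assuming pc is your RTCPeerConnection:

const candidateCounts = { host: 0, srflx: 0, relay: 0 };

pc.addEventListener('icegatheringstatechange', () => {
  console.log('iceGatheringState:', pc.iceGatheringState); // should reach "complete"
});

pc.addEventListener('icecandidate', (e) => {
  if (e.candidate) {
    candidateCounts[e.candidate.type] = (candidateCounts[e.candidate.type] || 0) + 1;
  } else {
    // Null candidate = gathering finished; missing srflx/relay points at STUN/TURN
    console.log('Candidate counts:', candidateCounts);
  }
});

pc.addEventListener('iceconnectionstatechange', () => {
  console.log('iceConnectionState:', pc.iceConnectionState);
});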

ICE failure causes and fixes:

| ICE State | Likely Cause | Diagnostic | Fix |
|---|---|---|---|
| Stuck on "gathering" | STUN/TURN unreachable | Check network connectivity to STUN/TURN servers | Verify server URLs, check firewall |
| Stuck on "checking" | All candidates blocked | Check whether any candidate pairs are attempted | Open UDP ports 3478, 5349, 10000-60000 |
| Transitions to "failed" | No successful connectivity check | Check for STUN binding failures | Use TURN as fallback, check credentials |
| Reaches "connected" then "failed" | Connection dropped | Check network stability | Implement ICE restart on failure |
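
For the last row, a minimal ICE-restart sketch using the standard restartIce() API; re-run your offer/answer exchange when the resulting negotiationneeded event fires:

pc.addEventListener('iceconnectionstatechange', () => {
  if (pc.iceConnectionState === 'failed') {
    // Generates fresh ICE credentials; a new offer/answer round is triggered
    // via negotiationneeded, so repeat your signaling exchange from there
    pc.restartIce();
  }
});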

STUN/TURN Configuration Verification

Verify STUN server connectivity:

// Test STUN binding request
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});

pc.onicecandidate = (e) => {
  if (e.candidate) {
    console.log('Candidate type:', e.candidate.type);
    // 'srflx' = STUN working
    // 'host' only = STUN not working
  }
};

pc.createDataChannel('test');
pc.createOffer().then(offer => pc.setLocalDescription(offer));

Verify TURN server connectivity:

const pc = new RTCPeerConnection({
  iceServers: [{
    urls: 'turn:your-turn-server.com:3478',
    username: 'your-username',
    credential: 'your-credential'
  }],
  iceTransportPolicy: 'relay' // Force TURN only
});

pc.onicecandidate = (e) => {
  if (e.candidate && e.candidate.type === 'relay') {
    console.log('TURN working: relay candidate gathered');
  }
};

TURN allocation failure causes:

| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid credentials | Check username/credential, verify not expired |
| 403 Forbidden | Credential expired | Refresh TURN credentials (typically expire in 24h) |
| Connection timeout | Server unreachable | Check firewall, try TCP fallback (port 443) |
| No relay candidates | TURN disabled or misconfigured | Verify iceServers configuration |

Wireshark debugging:

# Capture STUN/TURN traffic
wireshark -f "udp port 3478 or udp port 5349 or tcp port 443"

Look for:

  • STUN Binding Request → Binding Response (success)
  • TURN Allocate Request → Allocate Response (success)
  • TURN CreatePermission → CreatePermission Response

Firewall and NAT Traversal Problems

Symptom: Connection works on some networks but fails on others

Corporate firewall diagnosis:

| Behavior | Likely Block | Workaround |
|---|---|---|
| No srflx candidates | UDP 3478 blocked | Use TURN over TCP/443 |
| No relay candidates | TURN ports blocked | Use TURN over TCP/443 (TLS) |
| ICE fails after gathering | Outbound UDP blocked | Configure TURN with transport=tcp |

NAT type diagnosis:

| NAT Type | Direct Connection | STUN Works | TURN Required |
|---|---|---|---|
| Full Cone | Yes | Yes | No |
| Restricted Cone | Sometimes | Yes | Sometimes |
| Port Restricted | Sometimes | Yes | Sometimes |
| Symmetric | No | No | Yes |

Symmetric NAT detection: If srflx candidates gather but ICE connectivity checks fail, suspect Symmetric NAT. Force TURN relay:

const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    { urls: 'turn:your-server.com:443?transport=tcp', username: '...', credential: '...' }
  ]
});

RTP Media Quality Analysis

Understanding RTP Statistics

Once ICE connection succeeds, audio flows via RTP. Use getStats() API to monitor quality:

// Collect RTP statistics every 2 seconds
setInterval(async () => {
  const stats = await peerConnection.getStats();
  stats.forEach(report => {
    if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      console.log('Packets received:', report.packetsReceived);
      console.log('Packets lost:', report.packetsLost);
      console.log('Jitter (ms):', report.jitter * 1000);
    }
    if (report.type === 'remote-inbound-rtp' && report.kind === 'audio') {
      console.log('Round trip time (ms):', report.roundTripTime * 1000);
    }
  });
}, 2000);

Key RTP metrics from webrtc-internals:

| Metric | Field / Formula in webrtc-internals | What It Means |
|---|---|---|
| Packet Loss | packetsLost / (packetsReceived + packetsLost) | Percentage of lost packets |
| Jitter | jitter field (in seconds) | Variation in packet arrival times |
| RTT | roundTripTime or currentRoundTripTime | Network latency (round trip) |
| Bitrate | ΔbytesSent / Δtime | Throughput in kbps |
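
A sketch of computing packet loss percentage and outbound bitrate from successive getStats() samples, assuming peerConnection is in scope:

let previousOutbound = null;

setInterval(async () => {
  const stats = await peerConnection.getStats();
  stats.forEach(report => {
    if (report.kind !== 'audio') return;

    if (report.type === 'inbound-rtp') {
      const total = report.packetsReceived + report.packetsLost;
      const lossPct = total > 0 ? (100 * report.packetsLost) / total : 0;
      console.log(`Inbound loss: ${lossPct.toFixed(2)}%  jitter: ${(report.jitter * 1000).toFixed(1)}ms`);
    }

    if (report.type === 'outbound-rtp') {
      if (previousOutbound) {
        // bytes → bits (×8), divided by elapsed milliseconds gives kbps
        const kbps = (8 * (report.bytesSent - previousOutbound.bytesSent)) /
                     (report.timestamp - previousOutbound.timestamp);
        console.log(`Outbound bitrate: ${kbps.toFixed(1)} kbps`);
      }
      previousOutbound = { bytesSent: report.bytesSent, timestamp: report.timestamp };
    }
  });
}, 2000);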

Voice Quality Thresholds

Production thresholds for voice agents (based on Hamming's analysis of 1M+ calls):

| Metric | Excellent | Good (P50) | Acceptable (P95) | Poor |
|---|---|---|---|---|
| End-to-end latency | <1s | ~1.5s | ~5s | >8s |
| Network RTT | <50ms | <100ms | <200ms | >300ms |
| Jitter | <10ms | <20ms | <50ms | >50ms |
| Packet loss | <0.5% | <1% | <3% | >5% |
| MOS Score | 4.3+ | 4.0+ | 3.5+ | <3.5 |

Impact of threshold violations:

| Violation | User Experience |
|---|---|
| Latency >300ms | Conversation feels delayed, users talk over each other |
| Latency >500ms | Communication becomes disjointed, unusable for real-time |
| Jitter >50ms | Audio becomes choppy, words cut off |
| Packet loss >3% | Robotic voice, missing syllables |
| Packet loss >10% | Unintelligible audio |

Jitter Buffer Analysis

Jitter buffers smooth out packet arrival variations but add latency:

| Buffer Type | Latency Added | Best For |
|---|---|---|
| Fixed (100ms) | 100ms constant | Stable networks, low latency priority |
| Fixed (200ms) | 200ms constant | Moderate jitter tolerance |
| Adaptive (100-500ms) | Variable | Variable network conditions |

Jitter buffer underrun symptoms:

  • Audio plays in bursts with gaps
  • Words cut off mid-syllable
  • Robotic or stuttering speech

Check jitter buffer health in webrtc-internals:

  • Look for jitterBufferDelay and jitterBufferEmittedCount
  • Calculate average delay: jitterBufferDelay / jitterBufferEmittedCount * 1000 (ms)
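
A small helper for that calculation:

async function averageJitterBufferDelay(pc) {
  const stats = await pc.getStats();
  stats.forEach(report => {
    if (report.type === 'inbound-rtp' && report.kind === 'audio' &&
        report.jitterBufferEmittedCount > 0) {
      const avgMs = (report.jitterBufferDelay / report.jitterBufferEmittedCount) * 1000;
      console.log(`Average jitter buffer delay: ${avgMs.toFixed(1)} ms`);
    }
  });
}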

Packet Loss Patterns

Bursty vs. random packet loss:

| Pattern | Appearance | Likely Cause | Fix |
|---|---|---|---|
| Bursty | Consecutive packets lost | Network congestion, buffer overflow | Reduce bitrate, enable FEC |
| Random | Scattered losses | Weak signal, interference | Improve network path, use wired connection |
| Periodic | Regular intervals | Network equipment issue | Check routers, switches |

Check packet loss direction:

  • inbound-rtp.packetsLost = packets lost coming TO you
  • remote-inbound-rtp.packetsLost = packets lost going FROM you

Asymmetric loss points to one-way network issues (different upload/download paths).
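
A quick way to read both counters in one pass:

async function packetLossByDirection(pc) {
  const stats = await pc.getStats();
  let lostInbound = 0;   // lost on the way TO you
  let lostOutbound = 0;  // lost on the way FROM you (reported back by the remote peer)
  stats.forEach(report => {
    if (report.kind !== 'audio') return;
    if (report.type === 'inbound-rtp') lostInbound = report.packetsLost;
    if (report.type === 'remote-inbound-rtp') lostOutbound = report.packetsLost;
  });
  console.log({ lostInbound, lostOutbound });
}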


Voice Agent Pipeline Debugging

STT/LLM/TTS Latency Breakdown

Voice agent response latency accumulates across the pipeline:

        Voice Agent Latency Breakdown (Production Reality)

  User speaks → audio capture (~70ms) → STT (~350ms) → LLM (600-1000ms)
              → TTS (~100ms) → audio playback (~10ms)

  Total: ~1.2-1.6s + network hops (~10ms each × 10 hops = ~100ms)
  Production metrics: P50 ~1.5s, P95 ~5s (Hamming data from 1M+ calls)

Component latency targets (based on Hamming production data from 1M+ calls):

| Component | P50 Reality | P95 Reality | Critical Threshold |
|---|---|---|---|
| Audio capture/buffering | 50-70ms | 100-150ms | >200ms |
| STT (TTFB) | 200-250ms | 400-500ms | >800ms |
| STT (final transcript) | 300-350ms | 600-700ms | >1000ms |
| LLM (first token) | 400-600ms | 1500-2000ms | >3000ms |
| LLM (complete) | 600-1000ms | 2000-3000ms | >5000ms |
| TTS (first byte) | 80-100ms | 150-200ms | >400ms |
| TTS (complete) | 100-150ms | 200-300ms | >500ms |
| End-to-end total | ~1.5s | ~5s | >8s |

Measuring Component-Level Latency

Track these milestones per turn:

const turnMetrics = {
  // Audio
  userSpeechStart: null,      // VAD detects speech
  userSpeechEnd: null,        // VAD detects silence (endpointing)

  // STT
  sttRequestStart: null,      // Audio sent to STT
  sttFirstPartial: null,      // First partial transcript received
  sttFinalTranscript: null,   // Final transcript received

  // LLM
  llmRequestStart: null,      // Prompt sent to LLM
  llmFirstToken: null,        // First token received
  llmComplete: null,          // Full response received

  // TTS
  ttsRequestStart: null,      // Text sent to TTS
  ttsFirstByte: null,         // First audio byte received
  ttsComplete: null,          // Full audio received

  // Playback
  audioPlaybackStart: null,   // Audio playback begins
};

// Calculate latencies
const sttLatency = turnMetrics.sttFinalTranscript - turnMetrics.userSpeechEnd;
const llmLatency = turnMetrics.llmComplete - turnMetrics.llmRequestStart;
const ttsLatency = turnMetrics.ttsFirstByte - turnMetrics.ttsRequestStart;
const turnAroundTime = turnMetrics.audioPlaybackStart - turnMetrics.userSpeechEnd;

Report P50/P95/P99 for each milestone. A single blended latency number hides variance—your P50 might be 600ms while P95 is 2000ms.
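
A minimal percentile helper, assuming you have already collected per-turn latencies into an array (turnAroundTimes here is one entry per turn, built from the turnMetrics above):

function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.max(0, Math.ceil((p / 100) * sorted.length) - 1));
  return sorted[index];
}

console.log({
  p50: percentile(turnAroundTimes, 50),
  p95: percentile(turnAroundTimes, 95),
  p99: percentile(turnAroundTimes, 99)
});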

Barge-In and Interruption Handling

Barge-in = user interrupts while agent is speaking. Critical for natural conversation.

Barge-in requirements:

| Metric | Production P50 | Production P95 | Critical |
|---|---|---|---|
| Detection latency | ~200ms | ~500ms | >800ms |
| Agent stop latency | ~300ms | ~700ms | >1000ms |
| Context retention | 95% | 85% | <80% |
| Recovery rate | >85% | >75% | <70% |

Common barge-in failures:

| Symptom | Cause | Fix |
|---|---|---|
| Agent keeps talking | VAD not detecting speech over TTS audio | Improve echo cancellation, lower VAD threshold |
| Agent stops for background noise | VAD false positives | Increase VAD threshold, add noise filtering |
| Agent stops for "mm-hmm" | Can't distinguish backchannel from interruption | Implement backchannel detection model |
| Agent loses context after interruption | State not preserved | Store partial response, resume gracefully |
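
A sketch of the stop path, with hypothetical vad, ttsPlayer, dialogState, and metrics objects standing in for your framework's equivalents:

// Hypothetical wiring: stop agent speech the moment VAD fires during TTS playback
vad.on('speechStart', () => {
  if (!ttsPlayer.isPlaying()) return;

  const detectedAt = Date.now();
  ttsPlayer.stop();             // halt audio output immediately
  ttsPlayer.clearQueue();       // drop buffered audio so playback can't resume
  dialogState.markInterrupted({ // tell the LLM its last turn was cut short
    spokenUpTo: ttsPlayer.playbackPositionMs
  });
  metrics.record('agent_stop_latency_ms', Date.now() - detectedAt);
});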

Debug barge-in:

// Log barge-in events
voiceAgent.on('bargeIn', (event) => {
  console.log({
    detectionLatency: event.detectedAt - event.userSpeechStart,
    agentStopLatency: event.agentStoppedAt - event.detectedAt,
    agentWasSpeaking: event.agentAudioPosition,
    userTranscript: event.interruptingUtterance
  });
});

Turn Detection and Endpointing Issues

Endpointing = determining when the user finished speaking.

Endpointing tradeoffs:

| Setting | Pros | Cons |
|---|---|---|
| Short silence threshold (300ms) | Fast response | Cuts off mid-thought |
| Long silence threshold (800ms) | Complete utterances | Sluggish feel |
| Phrase endpointing | Natural sentence boundaries | Complexity, model latency |

Common endpointing failures:

| Symptom | Cause | Fix |
|---|---|---|
| Agent responds too early | Silence threshold too short | Increase to 500-700ms |
| Agent cuts off user | Not detecting speech continuation | Use phrase endpointing, longer threshold |
| Long pause before response | Silence threshold too long | Decrease to 400-500ms |
| Inconsistent timing | Static threshold for all contexts | Implement adaptive endpointing |
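
One way to approximate adaptive endpointing, sketched with a hypothetical setEndpointingThreshold API; the heuristic and word list are illustrative only:

// Illustrative heuristic: hold the turn open longer when the partial
// transcript looks unfinished (trailing conjunctions or filler words)
function silenceThresholdMs(partialTranscript) {
  const baseMs = 500;
  const looksUnfinished = /\b(and|but|so|um|uh|to|the)\s*$/i.test(partialTranscript.trim());
  return looksUnfinished ? baseMs + 400 : baseMs;
}

speechRecognizer.setEndpointingThreshold(silenceThresholdMs(latestPartial)); // hypothetical API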

Debug endpointing:

// Log endpointing decisions
speechRecognizer.on('endOfUtterance', (event) => {
  console.log({
    silenceDuration: event.silenceDurationMs,
    threshold: event.configuredThresholdMs,
    transcriptLength: event.transcript.length,
    confidence: event.confidence,
    wasInterrupted: event.interrupted
  });
});

Audio Quality Failure Modes

One-Way Audio Diagnosis

Symptom: One participant hears audio, the other hears nothing.

Diagnostic checklist:

  • Check webrtc-internals for packet counts
    • outbound-rtp.packetsSent > 0? You're sending.
    • inbound-rtp.packetsReceived > 0? You're receiving.
  • Zero inbound + non-zero outbound = remote side not sending or packets blocked
  • Non-zero inbound + zero outbound = local side not sending or packets blocked
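
A getStats() sketch that automates the packet-count check above:

async function audioFlowCheck(pc) {
  const stats = await pc.getStats();
  let sent = 0, received = 0;
  stats.forEach(report => {
    if (report.kind !== 'audio') return;
    if (report.type === 'outbound-rtp') sent = report.packetsSent;
    if (report.type === 'inbound-rtp') received = report.packetsReceived;
  });
  if (sent > 0 && received === 0) console.log('Sending but not receiving: remote not sending or inbound blocked');
  if (sent === 0 && received > 0) console.log('Receiving but not sending: local capture or outbound blocked');
}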

One-way audio causes:

| Pattern | Cause | Fix |
|---|---|---|
| A hears B, B doesn't hear A | A's outbound blocked or B's inbound blocked | Check A's firewall, NAT |
| A doesn't hear B, B hears A | B's outbound blocked or A's inbound blocked | Check B's firewall, NAT |
| Both have packets, still one-way | Codec mismatch | Verify same codec negotiated |
| Intermittent one-way | Network instability | Check for packet loss spikes |

Echo and Feedback Issues

Echo cancellation (AEC3) is critical for voice agents—the agent's TTS audio must not trigger its own VAD.

Echo symptoms:

| Symptom | Cause | Fix |
|---|---|---|
| User hears themselves | AEC not working | Check browser AEC settings, use headphones |
| Agent triggered by its own voice | TTS audio feeding back to microphone | Improve echo cancellation, mute mic during playback |
| Degraded audio after Chrome update | AEC3 experiment changes | Check Chrome flags, try different browser |

Debug echo issues:

// Check audio processing settings
const stream = await navigator.mediaDevices.getUserMedia({
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true
  }
});

// Verify settings applied
const track = stream.getAudioTracks()[0];
console.log(track.getSettings());

Choppy or Robotic Voice

Symptom: Audio plays in bursts, sounds mechanical.

Diagnostic steps:

  1. Check packet loss in webrtc-internals
    • 5% packet loss = severe degradation
    • 10% packet loss = nearly unintelligible
  2. Check jitter buffer underruns
    • Look for gaps in audio level meter
    • Check jitterBufferDelay trending
  3. Check CPU usage
    • High CPU can cause frame drops
    • Check totalProcessingDelay stat

Choppy audio causes and fixes:

| Cause | How to Identify | Fix |
|---|---|---|
| High packet loss | packetsLost increasing | Improve network, enable FEC |
| High jitter | jitter > 50ms | Increase jitter buffer, improve network |
| CPU overload | High CPU in task manager | Reduce processing, disable video |
| Codec issues | Low bitrate, compression artifacts | Increase bitrate, use Opus |

Framework-Specific Debugging

LiveKit Voice Agent Debugging

LiveKit is a real-time framework for building production-grade multimodal voice agents, backed by a WebRTC media server.

LiveKit-specific debugging tools:

| Tool | Purpose | How to Access |
|---|---|---|
| LiveKit CLI | Room inspection, participant stats | livekit-cli room list, livekit-cli room inspect |
| Room Composite | Debug recordings | Enable egress for room recordings |
| Webhook events | Connection lifecycle | Configure webhook endpoint |
| Agent logs | Pipeline debugging | LIVEKIT_LOG_LEVEL=debug |

Debug LiveKit agent pipeline:

# Enable verbose logging in LiveKit agent
import logging
logging.basicConfig(level=logging.DEBUG)

from livekit.agents import JobContext, WorkerOptions, cli

async def entrypoint(ctx: JobContext):
    # Log connection state
    ctx.room.on("connection_state_changed", lambda state:
        print(f"Connection state: {state}"))

    # Log participant events
    ctx.room.on("participant_connected", lambda p:
        print(f"Participant connected: {p.identity}"))

    # Log track subscriptions
    ctx.room.on("track_subscribed", lambda track, publication, participant:
        print(f"Track subscribed: {track.kind} from {participant.identity}"))

LiveKit connection issues:

| Symptom | Cause | Fix |
|---|---|---|
| Agent doesn't connect | Room token invalid | Check token expiry, room name |
| Audio not received | Track not subscribed | Verify auto-subscribe or manual subscription |
| High latency | Server region | Deploy agent in same region as server |
| Connection drops | Network instability | Implement reconnection logic |

Hamming integration for LiveKit: Hamming offers LiveKit-to-LiveKit WebRTC testing: auto-provisioned rooms, scenario generation from prompts, 50+ quality metrics evaluated in <10 minutes.

Pipecat Pipeline Troubleshooting

Pipecat specializes in real-time voice agent infrastructure with STT/LLM/TTS orchestration.

Pipecat debugging tools:

| Tool | Purpose | Usage |
|---|---|---|
| Whisker | Real-time pipeline debugger | Visualizes frame flow through pipeline |
| Tail | Terminal metrics dashboard | Monitors latency, token usage in real-time |

Common Pipecat issues:

| Symptom | Cause | Fix |
|---|---|---|
| 2-5s response delay | STT endpointing timeout | Adjust vad_parameters.min_silence_duration_ms |
| Delayed response | LLM queuing | Check LLM rate limits, implement streaming |
| Audio cuts out | Frame processor error | Check pipeline error handlers |
| Memory growth | Frame accumulation | Implement proper frame lifecycle |

Debug Pipecat pipeline latency:

import time

class LatencyLogger:
    def __init__(self):
        self.timestamps = {}

    async def log_stt_output(self, frame):
        self.timestamps['stt_complete'] = time.time()
        print(f"STT latency: {self.timestamps['stt_complete'] - self.timestamps.get('speech_end', 0):.3f}s")

    async def log_llm_output(self, frame):
        self.timestamps['llm_first_token'] = time.time()
        print(f"LLM TTFT: {self.timestamps['llm_first_token'] - self.timestamps.get('stt_complete', 0):.3f}s")

    async def log_tts_output(self, frame):
        self.timestamps['tts_first_byte'] = time.time()
        print(f"TTS TTFB: {self.timestamps['tts_first_byte'] - self.timestamps.get('llm_complete', 0):.3f}s")

Pipecat VAD tuning for 2-5s delay issue:

from pipecat.vad.silero import SileroVADAnalyzer

vad = SileroVADAnalyzer(
    min_silence_duration_ms=400,  # Reduce from default 700ms
    speech_pad_ms=100,
    threshold=0.5  # Adjust sensitivity
)

Framework-Agnostic Diagnostic Patterns

Universal debugging approaches for any voice agent framework:

1. Component boundary logging:

// Log at every boundary
const logBoundary = (component, direction, data) => {
  console.log({
    timestamp: Date.now(),
    component,
    direction, // 'in' or 'out'
    dataSize: JSON.stringify(data).length,
    requestId: currentRequestId
  });
};

// Audio → STT
logBoundary('stt', 'in', { audioChunkSize, format });
// STT → LLM
logBoundary('stt', 'out', { transcript, confidence });
logBoundary('llm', 'in', { promptTokens });
// LLM → TTS
logBoundary('llm', 'out', { responseTokens, content });
logBoundary('tts', 'in', { textLength });
// TTS → Audio
logBoundary('tts', 'out', { audioBytes, duration });

2. Request ID correlation:

// Generate at call start, propagate everywhere
const requestId = `call_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`;

// Include in all API calls
sttClient.transcribe(audio, { metadata: { requestId } });
llmClient.complete(prompt, { metadata: { requestId } });
ttsClient.synthesize(text, { metadata: { requestId } });

3. Session replay capability: Store full session data for post-mortem analysis:

  • Audio recordings (both directions)
  • Full transcripts with timestamps
  • LLM prompts and responses
  • All component latencies

Structured Logging and Observability

Essential Voice Agent Metrics

Network layer:

| Metric | What It Measures | How to Collect |
|---|---|---|
| ICE connection state | Connection establishment | pc.iceConnectionState events |
| Selected candidate pair | Connection type (direct/relay) | pc.getStats() candidate-pair |
| RTT | Network latency | remote-inbound-rtp.roundTripTime |
| Packet loss % | Network reliability | (packetsLost / (packetsReceived + packetsLost)) * 100 |
| Jitter | Packet timing variance | inbound-rtp.jitter * 1000 (ms) |

Media layer:

| Metric | What It Measures | How to Collect |
|---|---|---|
| Audio level | Volume, silence detection | audio-level stat or Web Audio API |
| Bitrate | Audio quality | Δ bytesSent / Δ time |
| Codec | Negotiated codec | SDP or getStats() |
| Frame drops | Processing issues | Custom counter on frame processor |

Pipeline layer:

| Metric | What It Measures | How to Collect |
|---|---|---|
| STT TTFB | First partial latency | firstPartialTime - audioEndTime |
| STT RTF | Processing speed | processingTime / audioDuration |
| LLM TTFT | First token latency | firstTokenTime - requestTime |
| LLM tokens/sec | Generation speed | totalTokens / generationTime |
| TTS TTFB | First audio byte latency | firstByteTime - requestTime |
| Turn-around time | End-to-end response | agentAudioStart - userSpeechEnd |

Minimal Viable Logging Schema

Session-level:

{
  "session_id": "sess_abc123",
  "start_time": "2026-01-25T10:30:00Z",
  "end_time": "2026-01-25T10:35:00Z",
  "duration_seconds": 300,
  "participant_count": 2,
  "completion_status": "completed",
  "ice_connection_type": "relay",
  "average_rtt_ms": 45,
  "total_packet_loss_percent": 0.3
}

Turn-level:

{
  "session_id": "sess_abc123",
  "turn_index": 5,
  "user_speech_start": "2026-01-25T10:31:15.000Z",
  "user_speech_end": "2026-01-25T10:31:18.500Z",
  "user_transcript": "I need to reschedule my appointment",
  "asr_confidence": 0.94,
  "stt_latency_ms": 180,
  "llm_first_token_ms": 220,
  "llm_complete_ms": 450,
  "tts_first_byte_ms": 85,
  "turn_around_time_ms": 735,
  "agent_response": "I can help you reschedule...",
  "barge_in_occurred": false,
  "tool_calls": [
    {"name": "get_appointments", "success": true, "latency_ms": 45}
  ]
}

Connection-level (sample every 5-10 seconds):

{
  "session_id": "sess_abc123",
  "timestamp": "2026-01-25T10:31:20.000Z",
  "ice_connection_state": "connected",
  "selected_candidate_type": "relay",
  "rtt_ms": 42,
  "jitter_ms": 8,
  "packets_received": 15420,
  "packets_lost": 12,
  "packet_loss_percent": 0.08,
  "audio_level_db": -25.5
}

Trace Correlation Across Components

Implement distributed tracing with OpenTelemetry:

const { trace, context, propagation } = require('@opentelemetry/api');

// Start trace at call initiation
const tracer = trace.getTracer('voice-agent');
const span = tracer.startSpan('voice_agent_call');
const ctx = trace.setSpan(context.active(), span);

// Propagate context to all services
const headers = {};
propagation.inject(ctx, headers);

// STT call
await sttClient.transcribe(audio, { headers });

// LLM call
await llmClient.complete(prompt, { headers });

// TTS call
await ttsClient.synthesize(text, { headers });

span.end();

This enables: "Which component caused the 2s latency spike in session X?"


Symptom-to-Cause Diagnostic Tables

Network Issues Quick Reference

| Symptom | Likely Cause | Diagnostic Steps | Fix |
|---|---|---|---|
| ICE state stuck "checking" | Firewall blocking UDP | Check STUN Binding responses in Wireshark | Try TURN TCP fallback on port 443 |
| No audio either direction | Media connection failed | Verify ICE candidate exchange in webrtc-internals | Ensure TURN server configured |
| One-way audio | Asymmetric NAT/firewall | Check inbound/outbound packet counts | Open UDP ports, use TURN relay |
| TURN allocation failures | Invalid/expired credentials | Check for 401/403 errors | Refresh TURN credentials |
| High RTT (>300ms) | Network congestion or routing | Compare RTT across connection types | Use closer server, improve network path |
| Intermittent disconnects | Network instability | Check ICE restart events | Implement automatic ICE restart |

Audio Quality Issues Quick Reference

| Symptom | Likely Cause | Diagnostic Steps | Fix |
|---|---|---|---|
| Choppy/robotic voice | Packet loss >5% or jitter buffer underruns | Check packetsLost, jitter in webrtc-internals | Improve network, increase jitter buffer |
| Echo or feedback | AEC3 failure or device issue | Test different browser/device | Use headphones, check Chrome flags |
| Audio cuts out intermittently | Network instability or device overload | Monitor packet loss patterns, CPU usage | Reduce processing, improve network |
| Degraded audio quality | Codec bitrate too low | Check selected codec, bitrate stats | Increase bitrate, use Opus codec |
| Latency >500ms | Combined network + jitter buffer + processing | Break down RTT, jitter buffer, STT/LLM/TTS | Optimize each component |

Voice Pipeline Issues Quick Reference

| Symptom | Likely Cause | Diagnostic Steps | Fix |
|---|---|---|---|
| 2-5s delay before agent responds | STT endpointing timeout or LLM queuing | Measure STT final transcript latency, LLM TTFT | Reduce silence threshold, check LLM rate limits |
| Agent cuts off user mid-sentence | Aggressive endpointing or low VAD threshold | Check silence threshold config | Increase to 500-700ms, use phrase endpointing |
| Agent doesn't stop when interrupted | Barge-in detection disabled or latency >200ms | Check VAD processing time, AEC | Improve echo cancellation, tune VAD |
| Frequent false interruptions | Poor echo cancellation or VAD false positives | Check TTS audio levels, VAD triggers | Improve AEC, increase VAD threshold |
| High P95 latency spikes | Service queuing or cold starts | Monitor per-component P95 latencies | Implement service warm-up, scale capacity |

Tooling Ecosystem Overview

Open Source Debugging Tools

| Tool | Purpose | Best For |
|---|---|---|
| chrome://webrtc-internals | WebRTC session inspection | Development debugging, connection issues |
| about:webrtc (Firefox) | Firefox WebRTC debugging | Firefox-specific issues |
| Wireshark | Network packet capture | STUN/TURN/RTP protocol analysis |
| Whisker (Pipecat) | Pipeline frame visualization | Pipecat frame flow debugging |
| Tail (Pipecat) | Terminal metrics dashboard | Real-time Pipecat metrics |
| LiveKit CLI | Room and participant inspection | LiveKit deployment debugging |

Network Analysis Commands

Test STUN connectivity:

# Using turnutils
turnutils_stunclient stun.l.google.com

Test TURN connectivity:

# Using turnutils
turnutils_uclient -u username -w password turn.server.com

Capture WebRTC traffic with Wireshark:

# Filter for STUN protocol
wireshark -f "udp port 3478 or udp port 5349"

# Filter for RTP
wireshark -f "udp portrange 10000-60000"

Commercial Testing Platforms

| Platform | Capabilities |
|---|---|
| Hamming | LiveKit integration, auto-generated test scenarios, 50+ quality metrics, session replay, drift detection |

Testing and Validation Strategies

Automated Testing Checklist

Before deploying voice agent changes:

  • Unit tests: STT/LLM/TTS component mocks
  • Integration tests: Full pipeline with real services
  • Regression tests: Golden call set (50+ recordings)
  • Load tests: Concurrent call capacity
  • Network simulation: Packet loss, jitter injection

Synthetic Test Call Patterns

// Test scenarios to cover
const testScenarios = [
  {
    name: 'clean_audio',
    backgroundNoise: null,
    packetLoss: 0,
    jitter: 0
  },
  {
    name: 'noisy_environment',
    backgroundNoise: 'coffee_shop_-20db',
    packetLoss: 0,
    jitter: 0
  },
  {
    name: 'poor_network',
    backgroundNoise: null,
    packetLoss: 3,  // 3%
    jitter: 30      // 30ms
  },
  {
    name: 'barge_in_test',
    interruptAt: 1500,  // ms into agent response
    expectedStopLatency: 200  // ms
  }
];

Regression Detection Alerts

Set alerts for statistical deviations:

| Metric | Alert Threshold | Window |
|---|---|---|
| Turn-around time P95 | +20% from baseline | 1 hour |
| Barge-in accuracy | -5% from baseline | 1 hour |
| Task completion rate | -10% from baseline | 4 hours |
| Packet loss | >2% sustained | 15 minutes |
| STT WER | +10% from baseline | 1 hour |
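
A sketch of the baseline comparison for an upward-drifting latency metric, with a hypothetical alerting client:

// Flag a regression when the current value exceeds the allowed deviation from baseline
function checkRegression(metric, current, baseline, allowedPct) {
  const limit = baseline * (1 + allowedPct / 100);
  if (current > limit) {
    alerting.send(`${metric} regression: ${current} vs baseline ${baseline} (+${allowedPct}% allowed)`); // hypothetical alerting client
  }
}

checkRegression('turn_around_time_p95_ms', 2400, 1900, 20);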

Conclusion: Debugging Workflow

Prioritized Debugging Steps

When a voice agent has issues, follow this order:

  1. Network layer first

    • Open chrome://webrtc-internals
    • Verify ICE reaches "connected" state
    • Check selected candidate pair type
    • If ICE fails, stop here—fix network first
  2. Media layer second

    • Check RTP packet loss (<1% target)
    • Check jitter (<20ms target)
    • Check RTT (<150ms target)
    • If degraded, fix network or adjust jitter buffer
  3. Pipeline layer third

    • Measure STT latency (TTFB + final)
    • Measure LLM latency (TTFT + complete)
    • Measure TTS latency (TTFB)
    • Identify which component is slowest
  4. Conversation layer fourth

    • Check barge-in detection and response
    • Check endpointing configuration
    • Check turn-taking behavior

Building Observability from Day One

Design for debuggability:

  • Implement structured logging at component boundaries
  • Generate and propagate request IDs through entire pipeline
  • Store session recordings with full transcripts
  • Collect RTP statistics via getStats() API
  • Track latency percentiles (P50/P95/P99), not just averages
  • Implement health checks at each pipeline stage
  • Set up alerts for threshold violations


How Hamming Helps with WebRTC Voice Agent Debugging

Hamming provides specialized tooling for debugging and testing WebRTC voice agents:

  • LiveKit-to-LiveKit Testing: Auto-provisioned rooms, synthetic test calls, real WebRTC connections
  • 50+ Quality Metrics: Latency breakdown, barge-in accuracy, task completion, audio quality
  • Session Replay: Full audio playback with transcripts and component traces
  • Regression Detection: Automated alerts when metrics deviate from baseline
  • Scenario Generation: Auto-generate test cases from prompts, execute in <10 minutes

Instead of manually debugging with chrome://webrtc-internals, get automated visibility into every layer of your voice agent stack.

Debug your voice agents with Hamming →

Frequently Asked Questions

Why do voice agents fail in production when they work locally?

Production networks have NAT, firewalls, and restrictive UDP policies that don't exist in local development. The most common issue is ICE negotiation failing due to blocked STUN/TURN servers. Always test with restricted networks, verify TURN server connectivity, and check chrome://webrtc-internals for ICE state transitions. Based on Hamming's data, 70% of production failures are network-layer issues, not AI pipeline problems.

What's the fastest way to debug a failing WebRTC voice agent call?

Open chrome://webrtc-internals BEFORE starting your session, then check three things in order: (1) ICE connection state - must reach 'connected', (2) candidate pairs - verify srflx or relay candidates exist, (3) RTP stats - confirm packets are being sent/received. If ICE never connects, it's a network issue. If packets aren't flowing, it's a media configuration issue. Only after these pass should you debug the AI pipeline.

What end-to-end latency is realistic for production voice agents?

Based on Hamming's analysis of 1M+ production calls, P50 end-to-end latency is ~1.5 seconds and P95 is ~5 seconds. This is significantly higher than theoretical targets but still provides acceptable user experience. The breakdown is typically: audio capture (70ms) + STT (350ms) + LLM (600-1000ms) + TTS (100ms) + network hops (100ms). Focus on keeping P95 under 5 seconds rather than chasing sub-second P50.

What causes one-way audio and how do I fix it?

One-way audio means ICE connected but RTP packets flow in only one direction. Check webrtc-internals for packet counts: if outbound-rtp.packetsSent > 0 but inbound-rtp.packetsReceived = 0, the remote side isn't sending or packets are blocked by NAT/firewall. This is almost always a TURN server issue - verify both sides can reach the TURN server and credentials are valid.

How do I diagnose and fix choppy or robotic audio?

Monitor three key metrics in webrtc-internals: packetsLost (target <1%), jitter (target <20ms), and jitterBufferDelay. Packet loss >5% causes severe degradation. Common causes: network congestion, insufficient bandwidth, or aggressive firewall throttling. Enable FEC (Forward Error Correction) in Opus codec, increase jitter buffer size, and consider reducing audio bitrate if bandwidth is constrained.

Why is my ICE connection stuck in the 'gathering' state?

ICE stuck in 'gathering' means the browser can't reach STUN/TURN servers. This happens when: (1) STUN server URLs are incorrect or servers are down, (2) Corporate firewall blocks UDP port 3478 for STUN, (3) TURN credentials expired or are invalid. Test STUN connectivity with 'nc -u stun.l.google.com 19302' and always configure TURN as fallback with proper credentials.

How should a voice agent handle barge-in (user interruptions)?

Barge-in requires coordinating VAD, echo cancellation, and pipeline state. When user speech is detected during TTS playback: (1) immediately stop TTS audio, (2) clear audio buffers to prevent overlap, (3) mark context as 'interrupted' for the LLM, (4) increase VAD sensitivity temporarily. Production data shows barge-in detection latency P50 ~200ms and agent stop latency P50 ~300ms are acceptable.

How do I reduce latency in LiveKit or Pipecat voice agents?

Framework-specific latency often comes from suboptimal configuration. For LiveKit: ensure agent and server are in the same region, use auto-subscribe for tracks, and implement proper async handling. For Daily/Pipecat: minimize frame processing overhead, use streaming STT/TTS, and avoid blocking operations in the pipeline. Profile with framework-specific tools (Whisker for Pipecat, LiveKit's built-in metrics).

What's the difference between host, srflx, and relay ICE candidates?

Host candidates are local IP addresses (only work on same network). Srflx (server reflexive) candidates are your public IP from STUN, work for most direct connections. Relay candidates route through TURN servers, adding ~30ms latency but working through any NAT/firewall. In production, 40% of calls require relay candidates. Always configure TURN servers as fallback.

How do I test WebRTC voice agents at scale?

Use automated WebRTC testing platforms like Hamming that simulate real network conditions, generate test scenarios from prompts, and measure 50+ quality metrics. Key capabilities needed: LiveKit-to-LiveKit or Daily-to-Daily connections, network condition simulation (packet loss, jitter), automated speech generation, and latency breakdown by component. Manual chrome://webrtc-internals debugging doesn't scale beyond 10 test cases.

Sumanyu Sharma

Founder & CEO

Previously Head of Data at Citizen, where he helped quadruple the user base. As a Senior Staff Data Scientist at Tesla, he grew an AI-powered sales program to hundreds of millions of dollars in revenue per year.

Researched AI-powered medical image search at the University of Waterloo, where he graduated with Engineering honors on dean's list.

“At Hamming, we're taking all of our learnings from Tesla and Citizen to build the future of trustworthy, safe and reliable voice AI agents.”