TL;DR: Debug WebRTC voice agents using this 3-layer diagnostic approach:
| Layer | What to Check | Key Tools | Production Thresholds (Hamming data) |
|---|---|---|---|
| Network | ICE connection, STUN/TURN, firewall | chrome://webrtc-internals, Wireshark | ICE state: "connected", RTT <150ms |
| Media | RTP packet loss, jitter, audio quality | webrtc-internals stats, getStats API | Packet loss <1%, jitter <20ms |
| Pipeline | STT/LLM/TTS latency, barge-in, turn detection | Component traces, Whisker/Tail | End-to-end P50 ~1.5s, P95 ~5s, barge-in <500ms |
Start at the Network layer. If ICE never reaches "connected", nothing else matters. If RTP stats show packet loss >5%, audio quality degrades before it reaches your AI pipeline. Only debug STT/LLM/TTS after verifying network and media are healthy.
Related Guides:
- Voice Agent Troubleshooting Guide — Complete diagnostic checklist for ASR, LLM, TTS, and tool failures
- Voice Agent Incident Response Runbook — 4-Stack framework for production outages
- Voice Agent Observability & Tracing — OpenTelemetry tracing for voice pipelines
- How to Evaluate and Test Voice Agents — 4-Layer QA Framework
- Voice Agent Evaluation Metrics Guide — Metrics library with formulas and benchmarks
Quick Symptom Lookup
Jump to the section that matches your issue:
| Symptom | Likely Cause | Go To Section |
|---|---|---|
| ICE state stuck on "checking" | Firewall blocking UDP | ICE Connection Failures |
| No audio either direction | Media connection failed | STUN/TURN Configuration |
| One-way audio | Asymmetric NAT/firewall | One-Way Audio Diagnosis |
| Choppy/robotic voice | Packet loss or jitter | RTP Media Quality |
| 2-5 second response delay | STT endpointing or LLM queuing | Pipeline Latency Breakdown |
| Agent doesn't stop when interrupted | Barge-in detection issue | Barge-In Handling |
| Agent cuts off user mid-sentence | Aggressive endpointing | Turn Detection Issues |
WebRTC Debugging Fundamentals
Voice agents built on WebRTC require debugging across multiple layers that traditional application monitoring doesn't cover: network traversal (ICE/STUN/TURN), media transport (RTP/jitter/packet loss), and the AI pipeline (STT/LLM/TTS latency). Engineers face limited visibility compared to traditional systems—no standard observability for interruption patterns, ASR drift, or orchestration delays.
This guide provides symptom-to-cause diagnostics, structured logging patterns, and framework-specific debugging for LiveKit and Pipecat implementations.
Browser-Based Debugging with chrome://webrtc-internals
chrome://webrtc-internals is the most comprehensive tool for debugging WebRTC connections in real-time.
How to use it:
- Open `chrome://webrtc-internals` in Chrome (v87+) before starting your voice agent session
- Start the voice agent call—connection data will populate automatically
- Look for the peer connection entry and expand it to see:
  - ICE candidate pairs and connection state
  - Inbound/outbound RTP statistics (packet loss, jitter, RTT)
  - Audio/video track statistics
Critical: Open webrtc-internals before the call starts. Connection establishment data (ICE gathering, candidate exchange) is only captured if the tab is open before the session begins.
Key sections to check:
| Section | What It Shows | What to Look For |
|---|---|---|
| ICE Candidate Pairs | Connection attempts | Selected pair should show "succeeded" |
| Inbound RTP (audio) | Incoming audio stats | packetsLost, jitter, roundTripTime |
| Outbound RTP (audio) | Outgoing audio stats | packetsSent, bytesSent |
| Connection State | ICE state transitions | Should reach "connected" or "completed" |
Firefox alternative: Use about:webrtc for similar functionality, though with fewer features than Chrome.
Production monitoring: chrome://webrtc-internals is impractical outside development. For production, implement client-side event tracing using the getStats() API to collect RTP statistics programmatically.
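A minimal sketch of such a collector: `summarizeAudioStats` (a name of our choosing) reduces a `getStats()` report set to the fields worth shipping. The report fields (`packetsLost`, `jitter`, `roundTripTime`) are standard getStats() stats; the `/telemetry/webrtc` endpoint in the commented wiring is a placeholder, not a real API.

```javascript
// Reduce a getStats() report set to shippable audio-quality fields.
// Field names on the summary object are our own.
function summarizeAudioStats(reports) {
  const summary = {};
  for (const report of reports) {
    if (report.type === 'inbound-rtp' && report.kind === 'audio') {
      summary.packetsReceived = report.packetsReceived;
      summary.packetsLost = report.packetsLost;
      summary.jitterMs = report.jitter * 1000; // jitter is reported in seconds
    }
    if (report.type === 'remote-inbound-rtp' && report.kind === 'audio') {
      summary.rttMs = report.roundTripTime * 1000;
    }
  }
  return summary;
}

// Browser wiring (sample every 5s and ship to your own endpoint):
// setInterval(async () => {
//   const stats = await peerConnection.getStats();
//   const summary = summarizeAudioStats(stats.values());
//   navigator.sendBeacon('/telemetry/webrtc', JSON.stringify(summary));
// }, 5000);
```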
Understanding WebRTC Connection Architecture
WebRTC uses the Interactive Connectivity Establishment (ICE) framework to establish peer connections through firewalls:
┌─────────────────────────────────────────────────────────────────┐
│ ICE Connection Flow │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Client A Client B │
│ │ │ │
│ │ 1. Gather ICE candidates │ │
│ │ (host, srflx, relay) │ │
│ │ │ │
│ │ 2. Exchange candidates via signaling │ │
│ │ ─────────────────────────────────────────────>│ │
│ │<─────────────────────────────────────────────│ │
│ │ │ │
│ │ 3. Connectivity checks (STUN binding) │ │
│ │<─────────────────────────────────────────────>│ │
│ │ │ │
│ │ 4. Select best candidate pair │ │
│ │═══════════════════════════════════════════════│ │
│ │ Media flows directly or via TURN │ │
│ │
└─────────────────────────────────────────────────────────────────┘
ICE candidate types:
| Candidate Type | How It's Obtained | When It Works |
|---|---|---|
| host | Local network interface | Same network only |
| srflx (server reflexive) | STUN server response | ~80% of NAT configurations |
| relay | TURN server allocation | Always works (adds latency) |
STUN (Session Traversal Utilities for NAT): Discovers your public IP address and NAT type. Works for most home/office NATs but fails with Symmetric NAT.
TURN (Traversal Using Relays around NAT): Relays all media through the server. Required when Symmetric NAT blocks direct connections (~20% of cases). Adds 20-50ms latency overhead.
SDP Negotiation and Signaling
Session Description Protocol (SDP) describes the multimedia session: codecs, formats, encryption parameters.
Offer/Answer flow:
- Caller creates SDP offer with supported codecs
- Offer sent via signaling channel (WebSocket, HTTP)
- Callee creates SDP answer, accepting/rejecting codecs
- Answer sent back via signaling
- Both sides set local/remote descriptions
- ICE gathering begins after local description is set
Common SDP negotiation failures:
| Error | Cause | Fix |
|---|---|---|
| InvalidStateError | Setting description in wrong state | Check signalingState before calling setLocalDescription/setRemoteDescription |
| Codec mismatch | Answerer doesn't support offered codecs | Ensure both sides support at least one common codec (Opus recommended for audio) |
| ICE gathering never starts | Local description not set | Call setLocalDescription() with the offer/answer |
Signaling state transitions:
stable → have-local-offer → stable (caller: answer applied via setRemoteDescription)
stable → have-remote-offer → stable (callee: answer applied via setLocalDescription)
Monitor signalingState changes—unexpected states indicate negotiation problems.
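A lightweight monitor can flag unexpected transitions automatically. The transition table below is a sketch covering only the basic offer/answer flow (pranswer and rollback states omitted for brevity):

```javascript
// Legal signalingState transitions for the plain offer/answer flow.
const VALID_TRANSITIONS = {
  'stable': ['have-local-offer', 'have-remote-offer'],
  'have-local-offer': ['stable'],   // remote answer applied
  'have-remote-offer': ['stable'],  // local answer applied
};

function isExpectedTransition(from, to) {
  return (VALID_TRANSITIONS[from] || []).includes(to);
}

// Browser wiring:
// let lastState = pc.signalingState;
// pc.onsignalingstatechange = () => {
//   if (!isExpectedTransition(lastState, pc.signalingState)) {
//     console.warn(`Unexpected transition: ${lastState} -> ${pc.signalingState}`);
//   }
//   lastState = pc.signalingState;
// };
```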
Network and Connectivity Troubleshooting
ICE Connection Failure Diagnosis
Symptom: ICE connection state stuck on "checking" or transitions to "failed"
Diagnostic steps:
1. Check ICE candidate gathering in webrtc-internals:
   - `iceGatheringState` should reach "complete"
   - Check the gathered candidates for all three types (host, srflx, relay)
   - Missing srflx candidates = STUN server unreachable
   - Missing relay candidates = TURN server unreachable or bad credentials
2. Check ICE connection state transitions:
   - new → checking → connected → completed = success
   - new → checking → failed = failure
3. Examine candidate pairs:
   - Look for pairs with state "succeeded"
   - If all pairs show "failed", connectivity is blocked
ICE failure causes and fixes:
| ICE State | Likely Cause | Diagnostic | Fix |
|---|---|---|---|
| Stuck on "gathering" | STUN/TURN unreachable | Check network connectivity to STUN/TURN servers | Verify server URLs, check firewall |
| Stuck on "checking" | All candidates blocked | Check if any candidate pair attempts | Open UDP ports 3478, 5349, 10000-60000 |
| Transitions to "failed" | No successful connectivity check | Check for STUN binding failures | Use TURN as fallback, check credentials |
| Reaches "connected" then "failed" | Connection dropped | Check network stability | Implement ICE restart on failure |
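The "Implement ICE restart on failure" fix from the last row can be sketched as follows. `restartIce()` is the standard API (re-negotiation of the new offer is up to your signaling layer); the backoff helper and its defaults are our own illustration:

```javascript
// Restart ICE when the connection fails, with exponential backoff so a
// flapping network doesn't trigger a restart storm.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Browser wiring:
// let restartAttempts = 0;
// pc.oniceconnectionstatechange = () => {
//   if (pc.iceConnectionState === 'failed') {
//     setTimeout(() => {
//       restartAttempts += 1;
//       pc.restartIce(); // new candidates flow through onnegotiationneeded
//     }, backoffDelayMs(restartAttempts));
//   } else if (pc.iceConnectionState === 'connected') {
//     restartAttempts = 0; // healthy again, reset the backoff
//   }
// };
```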
STUN/TURN Configuration Verification
Verify STUN server connectivity:
// Test STUN binding request
const pc = new RTCPeerConnection({
iceServers: [{ urls: 'stun:stun.l.google.com:19302' }]
});
pc.onicecandidate = (e) => {
if (e.candidate) {
console.log('Candidate type:', e.candidate.type);
// 'srflx' = STUN working
// 'host' only = STUN not working
}
};
pc.createDataChannel('test');
pc.createOffer().then(offer => pc.setLocalDescription(offer));
Verify TURN server connectivity:
const pc = new RTCPeerConnection({
iceServers: [{
urls: 'turn:your-turn-server.com:3478',
username: 'your-username',
credential: 'your-credential'
}],
iceTransportPolicy: 'relay' // Force TURN only
});
pc.onicecandidate = (e) => {
if (e.candidate && e.candidate.type === 'relay') {
console.log('TURN working: relay candidate gathered');
}
};
TURN allocation failure causes:
| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid credentials | Check username/credential, verify not expired |
| 403 Forbidden | Credential expired | Refresh TURN credentials (typically expire in 24h) |
| Connection timeout | Server unreachable | Check firewall, try TCP fallback (port 443) |
| No relay candidates | TURN disabled or misconfigured | Verify iceServers configuration |
Wireshark debugging:
# Capture STUN/TURN traffic
wireshark -f "udp port 3478 or udp port 5349 or tcp port 443"
Look for:
- STUN Binding Request → Binding Response (success)
- TURN Allocate Request → Allocate Response (success)
- TURN CreatePermission → CreatePermission Response
Firewall and NAT Traversal Problems
Symptom: Connection works on some networks but fails on others
Corporate firewall diagnosis:
| Behavior | Likely Block | Workaround |
|---|---|---|
| No srflx candidates | UDP 3478 blocked | Use TURN over TCP/443 |
| No relay candidates | TURN ports blocked | Use TURN over TCP/443 (TLS) |
| ICE fails after gathering | Outbound UDP blocked | Configure TURN with transport=tcp |
NAT type diagnosis:
| NAT Type | Direct Connection | STUN Works | TURN Required |
|---|---|---|---|
| Full Cone | Yes | Yes | No |
| Restricted Cone | Sometimes | Yes | Sometimes |
| Port Restricted | Sometimes | Yes | Sometimes |
| Symmetric | No | No | Yes |
Symmetric NAT detection: If srflx candidates gather but ICE connectivity checks fail, suspect Symmetric NAT. Force TURN relay:
const pc = new RTCPeerConnection({
iceServers: [
{ urls: 'stun:stun.l.google.com:19302' },
{ urls: 'turn:your-server.com:443?transport=tcp', username: '...', credential: '...' }
]
});
RTP Media Quality Analysis
Understanding RTP Statistics
Once ICE connection succeeds, audio flows via RTP. Use getStats() API to monitor quality:
// Collect RTP statistics every 2 seconds
setInterval(async () => {
const stats = await peerConnection.getStats();
stats.forEach(report => {
if (report.type === 'inbound-rtp' && report.kind === 'audio') {
console.log('Packets received:', report.packetsReceived);
console.log('Packets lost:', report.packetsLost);
console.log('Jitter (ms):', report.jitter * 1000);
}
if (report.type === 'remote-inbound-rtp' && report.kind === 'audio') {
console.log('Round trip time (ms):', report.roundTripTime * 1000);
}
});
}, 2000);
Key RTP metrics from webrtc-internals:
| Metric | Where to Find / Formula | What It Means |
|---|---|---|
| Packet Loss | packetsLost / (packetsReceived + packetsLost) | Percentage of lost packets |
| Jitter | jitter field (in seconds) | Variation in packet arrival times |
| RTT | roundTripTime or currentRoundTripTime | Network latency (round trip) |
| Bitrate | ΔbytesSent / Δtime | Throughput in kbps |
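The Δ-based bitrate row can be computed from two successive `outbound-rtp` samples. `bitrateKbps` is an illustrative helper; it assumes `timestamp` is in milliseconds, as getStats() reports it:

```javascript
// Audio bitrate between two outbound-rtp samples (ΔbytesSent / Δtime).
function bitrateKbps(prev, curr) {
  const deltaBytes = curr.bytesSent - prev.bytesSent;
  const deltaSeconds = (curr.timestamp - prev.timestamp) / 1000;
  return (deltaBytes * 8) / deltaSeconds / 1000; // bits per second -> kbps
}

// bitrateKbps({ bytesSent: 0, timestamp: 0 },
//             { bytesSent: 8000, timestamp: 2000 }) // → 32 kbps
```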
Voice Quality Thresholds
Production thresholds for voice agents (based on Hamming's analysis of 1M+ calls):
| Metric | Excellent | Good (P50) | Acceptable (P95) | Poor |
|---|---|---|---|---|
| End-to-end latency | <1s | ~1.5s | ~5s | >8s |
| Network RTT | <50ms | <100ms | <200ms | >300ms |
| Jitter | <10ms | <20ms | <50ms | >50ms |
| Packet loss | <0.5% | <1% | <3% | >5% |
| MOS Score | 4.3+ | 4.0+ | 3.5+ | <3.5 |
Impact of threshold violations:
| Violation | User Experience |
|---|---|
| Latency >300ms | Conversation feels delayed, users talk over each other |
| Latency >500ms | Communication becomes disjointed, unusable for real-time |
| Jitter >50ms | Audio becomes choppy, words cut off |
| Packet loss >3% | Robotic voice, missing syllables |
| Packet loss >10% | Unintelligible audio |
Jitter Buffer Analysis
Jitter buffers smooth out packet arrival variations but add latency:
| Buffer Type | Latency Added | Best For |
|---|---|---|
| Fixed (100ms) | 100ms constant | Stable networks, low latency priority |
| Fixed (200ms) | 200ms constant | Moderate jitter tolerance |
| Adaptive (100-500ms) | Variable | Variable network conditions |
Jitter buffer underrun symptoms:
- Audio plays in bursts with gaps
- Words cut off mid-syllable
- Robotic or stuttering speech
Check jitter buffer health in webrtc-internals:
- Look for `jitterBufferDelay` and `jitterBufferEmittedCount`
- Calculate average delay: `jitterBufferDelay / jitterBufferEmittedCount * 1000` (ms)
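That average-delay calculation, as a small helper (both getStats() counters are cumulative since stream start, so the ratio gives a running mean):

```javascript
// Average jitter-buffer delay in ms from cumulative getStats() counters.
function avgJitterBufferDelayMs(report) {
  if (!report.jitterBufferEmittedCount) return 0; // avoid divide-by-zero early in the call
  return (report.jitterBufferDelay / report.jitterBufferEmittedCount) * 1000;
}
```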
Packet Loss Patterns
Bursty vs. random packet loss:
| Pattern | Appearance | Likely Cause | Fix |
|---|---|---|---|
| Bursty | Consecutive packets lost | Network congestion, buffer overflow | Reduce bitrate, enable FEC |
| Random | Scattered losses | Weak signal, interference | Improve network path, use wired connection |
| Periodic | Regular intervals | Network equipment issue | Check routers, switches |
Check packet loss direction:
- `inbound-rtp.packetsLost` = packets lost coming TO you
- `remote-inbound-rtp.packetsLost` = packets lost going FROM you
Asymmetric loss points to one-way network issues (different upload/download paths).
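A sketch of automating this asymmetry check. The 3× asymmetry factor is an arbitrary illustrative threshold, not a standard, and the returned labels are our own:

```javascript
// Loss percentage from cumulative counters, per the formula used elsewhere
// in this guide: lost / (lost + received) * 100.
function lossPercent(lost, received) {
  const total = lost + received;
  return total === 0 ? 0 : (lost / total) * 100;
}

// Compare loss in each direction to spot one-way network issues.
function classifyLoss(inboundLossPct, outboundLossPct, asymmetryFactor = 3) {
  if (inboundLossPct > outboundLossPct * asymmetryFactor) return 'download-path';
  if (outboundLossPct > inboundLossPct * asymmetryFactor) return 'upload-path';
  return 'symmetric';
}
```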
Voice Agent Pipeline Debugging
STT/LLM/TTS Latency Breakdown
Voice agent response latency accumulates across the pipeline:
┌─────────────────────────────────────────────────────────────────────────┐
│ Voice Agent Latency Breakdown (Production Reality) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ User speaks Audio STT LLM TTS Audio │
│ │ capture process inference synthesis playback │
│ │ │ │ │ │ │ │
│ │◄─ 70ms ──►│◄─350ms ─►│◄600-1000ms►│◄─100ms ──►│◄─ ~10ms ►│ │
│ │ │ │ │ │ │ │
│ └───────────┴──────────┴───────────┴───────────┴──────────┘ │
│ │
│ Total: ~1.2-1.6s + network hops (~10ms each × 10 hops = ~100ms) │
│ Production metrics: P50 ~1.5s, P95 ~5s (Hamming data from 1M+ calls)│
│ │
└─────────────────────────────────────────────────────────────────────────┘
Component latency targets (based on Hamming production data from 1M+ calls):
| Component | P50 Reality | P95 Reality | Critical Threshold |
|---|---|---|---|
| Audio capture/buffering | 50-70ms | 100-150ms | >200ms |
| STT (TTFB) | 200-250ms | 400-500ms | >800ms |
| STT (final transcript) | 300-350ms | 600-700ms | >1000ms |
| LLM (first token) | 400-600ms | 1500-2000ms | >3000ms |
| LLM (complete) | 600-1000ms | 2000-3000ms | >5000ms |
| TTS (first byte) | 80-100ms | 150-200ms | >400ms |
| TTS (complete) | 100-150ms | 200-300ms | >500ms |
| End-to-end total | ~1.5s | ~5s | >8s |
Measuring Component-Level Latency
Track these milestones per turn:
const turnMetrics = {
// Audio
userSpeechStart: null, // VAD detects speech
userSpeechEnd: null, // VAD detects silence (endpointing)
// STT
sttRequestStart: null, // Audio sent to STT
sttFirstPartial: null, // First partial transcript received
sttFinalTranscript: null, // Final transcript received
// LLM
llmRequestStart: null, // Prompt sent to LLM
llmFirstToken: null, // First token received
llmComplete: null, // Full response received
// TTS
ttsRequestStart: null, // Text sent to TTS
ttsFirstByte: null, // First audio byte received
ttsComplete: null, // Full audio received
// Playback
audioPlaybackStart: null, // Audio playback begins
};
// Calculate latencies
const sttLatency = turnMetrics.sttFinalTranscript - turnMetrics.userSpeechEnd;
const llmLatency = turnMetrics.llmComplete - turnMetrics.llmRequestStart;
const ttsLatency = turnMetrics.ttsFirstByte - turnMetrics.ttsRequestStart;
const turnAroundTime = turnMetrics.audioPlaybackStart - turnMetrics.userSpeechEnd;
Report P50/P95/P99 for each milestone. A single blended latency number hides variance—your P50 might be 600ms while P95 is 2000ms.
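A nearest-rank percentile helper is enough to get per-milestone P50/P95/P99 from collected turn metrics; in production, prefer your metrics library's implementation:

```javascript
// Nearest-rank percentile over a list of latency samples.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

// Hypothetical usage over collected turnMetrics objects:
// const sttLatencies = turns.map(t => t.sttFinalTranscript - t.userSpeechEnd);
// console.log({
//   p50: percentile(sttLatencies, 50),
//   p95: percentile(sttLatencies, 95),
//   p99: percentile(sttLatencies, 99),
// });
```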
Barge-In and Interruption Handling
Barge-in = user interrupts while agent is speaking. Critical for natural conversation.
Barge-in requirements:
| Metric | Production P50 | Production P95 | Critical |
|---|---|---|---|
| Detection latency | ~200ms | ~500ms | >800ms |
| Agent stop latency | ~300ms | ~700ms | >1000ms |
| Context retention | 95% | 85% | <80% |
| Recovery rate | >85% | >75% | <70% |
Common barge-in failures:
| Symptom | Cause | Fix |
|---|---|---|
| Agent keeps talking | VAD not detecting speech over TTS audio | Improve echo cancellation, lower VAD threshold |
| Agent stops for background noise | VAD false positives | Increase VAD threshold, add noise filtering |
| Agent stops for "mm-hmm" | Can't distinguish backchannel from interruption | Implement backchannel detection model |
| Agent loses context after interruption | State not preserved | Store partial response, resume gracefully |
Debug barge-in:
// Log barge-in events
voiceAgent.on('bargeIn', (event) => {
console.log({
detectionLatency: event.detectedAt - event.userSpeechStart,
agentStopLatency: event.agentStoppedAt - event.detectedAt,
agentWasSpeaking: event.agentAudioPosition,
userTranscript: event.interruptingUtterance
});
});
Turn Detection and Endpointing Issues
Endpointing = determining when the user finished speaking.
Endpointing tradeoffs:
| Setting | Pros | Cons |
|---|---|---|
| Short silence threshold (300ms) | Fast response | Cuts off mid-thought |
| Long silence threshold (800ms) | Complete utterances | Sluggish feel |
| Phrase endpointing | Natural sentence boundaries | Complexity, model latency |
Common endpointing failures:
| Symptom | Cause | Fix |
|---|---|---|
| Agent responds too early | Silence threshold too short | Increase to 500-700ms |
| Agent cuts off user | Not detecting speech continuation | Use phrase endpointing, longer threshold |
| Long pause before response | Silence threshold too long | Decrease to 400-500ms |
| Inconsistent timing | Static threshold for all contexts | Implement adaptive endpointing |
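Adaptive endpointing can be approximated by lengthening the silence threshold when the partial transcript looks unfinished. The heuristics below are purely illustrative; production systems typically use a trained turn-detection model:

```javascript
// Pick a silence threshold per utterance based on the STT partial.
// The regexes and timing values are illustrative assumptions.
function silenceThresholdMs(partialTranscript, baseMs = 500) {
  const text = partialTranscript.trim();
  // Trailing comma, filler, or conjunction suggests the user isn't done
  if (/[,\-]$|\b(and|but|so|because|um|uh)$/i.test(text)) return baseMs + 400;
  // Terminal punctuation from the STT suggests a complete utterance
  if (/[.?!]$/.test(text)) return Math.max(300, baseMs - 200);
  return baseMs;
}
```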
Debug endpointing:
// Log endpointing decisions
speechRecognizer.on('endOfUtterance', (event) => {
console.log({
silenceDuration: event.silenceDurationMs,
threshold: event.configuredThresholdMs,
transcriptLength: event.transcript.length,
confidence: event.confidence,
wasInterrupted: event.interrupted
});
});
Audio Quality Failure Modes
One-Way Audio Diagnosis
Symptom: One participant hears audio, the other hears nothing.
Diagnostic checklist:
- Check webrtc-internals for packet counts:
  - `outbound-rtp.packetsSent` > 0? You're sending.
  - `inbound-rtp.packetsReceived` > 0? You're receiving.
- Zero inbound + non-zero outbound = remote side not sending or packets blocked
- Non-zero inbound + zero outbound = local side not sending or packets blocked
One-way audio causes:
| Pattern | Cause | Fix |
|---|---|---|
| A hears B, B doesn't hear A | A's outbound blocked or B's inbound blocked | Check A's firewall, NAT |
| A doesn't hear B, B hears A | B's outbound blocked or A's inbound blocked | Check B's firewall, NAT |
| Both have packets, still one-way | Codec mismatch | Verify same codec negotiated |
| Intermittent one-way | Network instability | Check for packet loss spikes |
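The packet-count checks above can be folded into one classifier. Report shapes follow getStats(); the returned labels are our own:

```javascript
// Classify a stats snapshot into the one-way-audio patterns above.
function diagnoseAudioDirection(reports) {
  let sent = 0, received = 0;
  for (const r of reports) {
    if (r.type === 'outbound-rtp' && r.kind === 'audio') sent = r.packetsSent;
    if (r.type === 'inbound-rtp' && r.kind === 'audio') received = r.packetsReceived;
  }
  if (sent > 0 && received === 0) return 'not-receiving'; // remote not sending, or inbound blocked
  if (sent === 0 && received > 0) return 'not-sending';   // local track muted/missing, or outbound blocked
  if (sent === 0 && received === 0) return 'no-media';    // connection-level failure
  return 'bidirectional';
}
```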
Echo and Feedback Issues
Echo cancellation (AEC3) is critical for voice agents—the agent's TTS audio must not trigger its own VAD.
Echo symptoms:
| Symptom | Cause | Fix |
|---|---|---|
| User hears themselves | AEC not working | Check browser AEC settings, use headphones |
| Agent triggered by its own voice | TTS audio feeding back to microphone | Improve echo cancellation, mute mic during playback |
| Degraded audio after Chrome update | AEC3 experiment changes | Check Chrome flags, try different browser |
Debug echo issues:
// Check audio processing settings
const stream = await navigator.mediaDevices.getUserMedia({
audio: {
echoCancellation: true,
noiseSuppression: true,
autoGainControl: true
}
});
// Verify settings applied
const track = stream.getAudioTracks()[0];
console.log(track.getSettings());
Choppy or Robotic Voice
Symptom: Audio plays in bursts, sounds mechanical.
Diagnostic steps:
1. Check packet loss in webrtc-internals
   - 5% packet loss = severe degradation
   - 10% packet loss = nearly unintelligible
2. Check jitter buffer underruns
   - Look for gaps in the audio level meter
   - Check `jitterBufferDelay` trending
3. Check CPU usage
   - High CPU can cause frame drops
   - Check the `totalProcessingDelay` stat
Choppy audio causes and fixes:
| Cause | How to Identify | Fix |
|---|---|---|
| High packet loss | packetsLost increasing | Improve network, enable FEC |
| High jitter | jitter > 50ms | Increase jitter buffer, improve network |
| CPU overload | High CPU in task manager | Reduce processing, disable video |
| Codec issues | Low bitrate, compression artifacts | Increase bitrate, use Opus |
Framework-Specific Debugging
LiveKit Voice Agent Debugging
LiveKit provides a real-time framework for production-grade multimodal voice agents with WebRTC media server.
LiveKit-specific debugging tools:
| Tool | Purpose | How to Access |
|---|---|---|
| LiveKit CLI | Room inspection, participant stats | livekit-cli room list, livekit-cli room inspect |
| Room Composite | Debug recordings | Enable egress for room recordings |
| Webhook events | Connection lifecycle | Configure webhook endpoint |
| Agent logs | Pipeline debugging | LIVEKIT_LOG_LEVEL=debug |
Debug LiveKit agent pipeline:
# Enable verbose logging in LiveKit agent
import logging
logging.basicConfig(level=logging.DEBUG)
from livekit.agents import JobContext, WorkerOptions, cli
async def entrypoint(ctx: JobContext):
# Log connection state
ctx.room.on("connection_state_changed", lambda state:
print(f"Connection state: {state}"))
# Log participant events
ctx.room.on("participant_connected", lambda p:
print(f"Participant connected: {p.identity}"))
# Log track subscriptions
ctx.room.on("track_subscribed", lambda track, publication, participant:
print(f"Track subscribed: {track.kind} from {participant.identity}"))
LiveKit connection issues:
| Symptom | Cause | Fix |
|---|---|---|
| Agent doesn't connect | Room token invalid | Check token expiry, room name |
| Audio not received | Track not subscribed | Verify auto-subscribe or manual subscription |
| High latency | Server region | Deploy agent in same region as server |
| Connection drops | Network instability | Implement reconnection logic |
Hamming integration for LiveKit: Hamming offers LiveKit-to-LiveKit WebRTC testing: auto-provisioned rooms, scenario generation from prompts, 50+ quality metrics evaluated in <10 minutes.
Pipecat Pipeline Troubleshooting
Pipecat specializes in real-time voice agent infrastructure with STT/LLM/TTS orchestration.
Pipecat debugging tools:
| Tool | Purpose | Usage |
|---|---|---|
| Whisker | Real-time pipeline debugger | Visualizes frame flow through pipeline |
| Tail | Terminal metrics dashboard | Monitors latency, token usage in real-time |
Common Pipecat issues:
| Symptom | Cause | Fix |
|---|---|---|
| 2-5s response delay | STT endpointing timeout | Adjust vad_parameters.min_silence_duration_ms |
| Delayed response | LLM queuing | Check LLM rate limits, implement streaming |
| Audio cuts out | Frame processor error | Check pipeline error handlers |
| Memory growth | Frame accumulation | Implement proper frame lifecycle |
Debug Pipecat pipeline latency:
import time
class LatencyLogger:
def __init__(self):
self.timestamps = {}
async def log_stt_output(self, frame):
self.timestamps['stt_complete'] = time.time()
print(f"STT latency: {self.timestamps['stt_complete'] - self.timestamps.get('speech_end', 0):.3f}s")
async def log_llm_output(self, frame):
self.timestamps['llm_first_token'] = time.time()
print(f"LLM TTFT: {self.timestamps['llm_first_token'] - self.timestamps.get('stt_complete', 0):.3f}s")
async def log_tts_output(self, frame):
self.timestamps['tts_first_byte'] = time.time()
print(f"TTS TTFB: {self.timestamps['tts_first_byte'] - self.timestamps.get('llm_complete', 0):.3f}s")
Pipecat VAD tuning for 2-5s delay issue:
from pipecat.vad.silero import SileroVADAnalyzer
vad = SileroVADAnalyzer(
min_silence_duration_ms=400, # Reduce from default 700ms
speech_pad_ms=100,
threshold=0.5 # Adjust sensitivity
)
Framework-Agnostic Diagnostic Patterns
Universal debugging approaches for any voice agent framework:
1. Component boundary logging:
// Log at every boundary
const logBoundary = (component, direction, data) => {
console.log({
timestamp: Date.now(),
component,
direction, // 'in' or 'out'
dataSize: JSON.stringify(data).length,
requestId: currentRequestId
});
};
// Audio → STT
logBoundary('stt', 'in', { audioChunkSize, format });
// STT → LLM
logBoundary('stt', 'out', { transcript, confidence });
logBoundary('llm', 'in', { promptTokens });
// LLM → TTS
logBoundary('llm', 'out', { responseTokens, content });
logBoundary('tts', 'in', { textLength });
// TTS → Audio
logBoundary('tts', 'out', { audioBytes, duration });
2. Request ID correlation:
// Generate at call start, propagate everywhere
const requestId = `call_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;
// Include in all API calls
sttClient.transcribe(audio, { metadata: { requestId } });
llmClient.complete(prompt, { metadata: { requestId } });
ttsClient.synthesize(text, { metadata: { requestId } });
3. Session replay capability: Store full session data for post-mortem analysis:
- Audio recordings (both directions)
- Full transcripts with timestamps
- LLM prompts and responses
- All component latencies
Structured Logging and Observability
Essential Voice Agent Metrics
Network layer:
| Metric | What It Measures | How to Collect |
|---|---|---|
| ICE connection state | Connection establishment | pc.iceConnectionState events |
| Selected candidate pair | Connection type (direct/relay) | pc.getStats() candidate-pair |
| RTT | Network latency | remote-inbound-rtp.roundTripTime |
| Packet loss % | Network reliability | (packetsLost / (packetsReceived + packetsLost)) * 100 |
| Jitter | Packet timing variance | inbound-rtp.jitter * 1000 (ms) |
Media layer:
| Metric | What It Measures | How to Collect |
|---|---|---|
| Audio level | Volume, silence detection | audio-level stat or Web Audio API |
| Bitrate | Audio quality | Δ bytesSent / Δ time |
| Codec | Negotiated codec | SDP or getStats() |
| Frame drops | Processing issues | Custom counter on frame processor |
Pipeline layer:
| Metric | What It Measures | How to Collect |
|---|---|---|
| STT TTFB | First partial latency | firstPartialTime - audioEndTime |
| STT RTF | Processing speed | processingTime / audioDuration |
| LLM TTFT | First token latency | firstTokenTime - requestTime |
| LLM tokens/sec | Generation speed | totalTokens / generationTime |
| TTS TTFB | First audio byte latency | firstByteTime - requestTime |
| Turn-around time | End-to-end response | agentAudioStart - userSpeechEnd |
Minimal Viable Logging Schema
Session-level:
{
"session_id": "sess_abc123",
"start_time": "2026-01-25T10:30:00Z",
"end_time": "2026-01-25T10:35:00Z",
"duration_seconds": 300,
"participant_count": 2,
"completion_status": "completed",
"ice_connection_type": "relay",
"average_rtt_ms": 45,
"total_packet_loss_percent": 0.3
}
Turn-level:
{
"session_id": "sess_abc123",
"turn_index": 5,
"user_speech_start": "2026-01-25T10:31:15.000Z",
"user_speech_end": "2026-01-25T10:31:18.500Z",
"user_transcript": "I need to reschedule my appointment",
"asr_confidence": 0.94,
"stt_latency_ms": 180,
"llm_first_token_ms": 220,
"llm_complete_ms": 450,
"tts_first_byte_ms": 85,
"turn_around_time_ms": 735,
"agent_response": "I can help you reschedule...",
"barge_in_occurred": false,
"tool_calls": [
{"name": "get_appointments", "success": true, "latency_ms": 45}
]
}
Connection-level (sample every 5-10 seconds):
{
"session_id": "sess_abc123",
"timestamp": "2026-01-25T10:31:20.000Z",
"ice_connection_state": "connected",
"selected_candidate_type": "relay",
"rtt_ms": 42,
"jitter_ms": 8,
"packets_received": 15420,
"packets_lost": 12,
"packet_loss_percent": 0.08,
"audio_level_db": -25.5
}
Trace Correlation Across Components
Implement distributed tracing with OpenTelemetry:
const { trace, context, propagation } = require('@opentelemetry/api');
// Start trace at call initiation
const tracer = trace.getTracer('voice-agent');
const span = tracer.startSpan('voice_agent_call');
const ctx = trace.setSpan(context.active(), span);
// Propagate context to all services
const headers = {};
propagation.inject(ctx, headers);
// STT call
await sttClient.transcribe(audio, { headers });
// LLM call
await llmClient.complete(prompt, { headers });
// TTS call
await ttsClient.synthesize(text, { headers });
span.end();
This enables: "Which component caused the 2s latency spike in session X?"
Symptom-to-Cause Diagnostic Tables
Network Issues Quick Reference
| Symptom | Likely Cause | Diagnostic Steps | Fix |
|---|---|---|---|
| ICE state stuck "checking" | Firewall blocking UDP | Check STUN Binding responses in Wireshark | Try TURN TCP fallback on port 443 |
| No audio either direction | Media connection failed | Verify ICE candidate exchange in webrtc-internals | Ensure TURN server configured |
| One-way audio | Asymmetric NAT/firewall | Check inbound/outbound packet counts | Open UDP ports, use TURN relay |
| TURN Allocation failures | Invalid/expired credentials | Check for 401/403 errors | Refresh TURN credentials |
| High RTT (>300ms) | Network congestion or routing | Compare RTT across connection types | Use closer server, improve network path |
| Intermittent disconnects | Network instability | Check ICE restart events | Implement automatic ICE restart |
Audio Quality Issues Quick Reference
| Symptom | Likely Cause | Diagnostic Steps | Fix |
|---|---|---|---|
| Choppy/robotic voice | Packet loss >5% or jitter buffer underruns | Check packetsLost, jitter in webrtc-internals | Improve network, increase jitter buffer |
| Echo or feedback | AEC3 failure or device issue | Test different browser/device | Use headphones, check Chrome flags |
| Audio cuts out intermittently | Network instability or device overload | Monitor packet loss patterns, CPU usage | Reduce processing, improve network |
| Degraded audio quality | Codec bitrate too low | Check selected codec, bitrate stats | Increase bitrate, use Opus codec |
| Latency >500ms | Combined network + jitter buffer + processing | Break down RTT, jitter buffer, STT/LLM/TTS | Optimize each component |
Voice Pipeline Issues Quick Reference
| Symptom | Likely Cause | Diagnostic Steps | Fix |
|---|---|---|---|
| 2-5s delay before agent responds | STT endpointing timeout or LLM queuing | Measure STT final transcript latency, LLM TTFT | Reduce silence threshold, check LLM rate limits |
| Agent cuts off user mid-sentence | Aggressive endpointing or low VAD threshold | Check silence threshold config | Increase to 500-700ms, use phrase endpointing |
| Agent doesn't stop when interrupted | Barge-in detection disabled or latency >200ms | Check VAD processing time, AEC | Improve echo cancellation, tune VAD |
| Frequent false interruptions | Poor echo cancellation or VAD false positives | Check TTS audio levels, VAD triggers | Improve AEC, increase VAD threshold |
| High P95 latency spikes | Service queuing or cold starts | Monitor per-component P95 latencies | Implement service warm-up, scale capacity |
Tooling Ecosystem Overview
Open Source Debugging Tools
| Tool | Purpose | Best For |
|---|---|---|
| chrome://webrtc-internals | WebRTC session inspection | Development debugging, connection issues |
| about:webrtc (Firefox) | Firefox WebRTC debugging | Firefox-specific issues |
| Wireshark | Network packet capture | STUN/TURN/RTP protocol analysis |
| Whisker (Pipecat) | Pipeline frame visualization | Pipecat frame flow debugging |
| Tail (Pipecat) | Terminal metrics dashboard | Real-time Pipecat metrics |
| LiveKit CLI | Room and participant inspection | LiveKit deployment debugging |
Network Analysis Commands
Test STUN connectivity:
# Using coturn's turnutils (Google's public STUN listens on 19302, not the default 3478)
turnutils_stunclient -p 19302 stun.l.google.com
Test TURN connectivity:
# Using turnutils
turnutils_uclient -u username -w password turn.server.com
Capture WebRTC traffic with Wireshark:
# Filter for STUN protocol
wireshark -f "udp port 3478 or udp port 5349"
# Filter for RTP
wireshark -f "udp portrange 10000-60000"
Commercial Testing Platforms
| Platform | Capabilities |
|---|---|
| Hamming | LiveKit integration, auto-generated test scenarios, 50+ quality metrics, session replay, drift detection |
Testing and Validation Strategies
Automated Testing Checklist
Before deploying voice agent changes:
- Unit tests: STT/LLM/TTS component mocks
- Integration tests: Full pipeline with real services
- Regression tests: Golden call set (50+ recordings)
- Load tests: Concurrent call capacity
- Network simulation: Packet loss, jitter injection
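For the network-simulation item above, Linux `tc`/`netem` can inject loss and jitter at the interface level. A sketch, assuming a Linux test host with interface `eth0` (requires root); adjust the values to match each scenario:

```shell
# Inject 3% packet loss plus 30ms delay with +/-10ms jitter on eth0
tc qdisc add dev eth0 root netem loss 3% delay 30ms 10ms

# Verify the impairment is active
tc qdisc show dev eth0

# Remove the impairment after the test run
tc qdisc del dev eth0 root
```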
Synthetic Test Call Patterns
// Test scenarios to cover
const testScenarios = [
  {
    name: 'clean_audio',                  // baseline: ideal conditions
    backgroundNoise: null,
    packetLoss: 0,
    jitter: 0
  },
  {
    name: 'noisy_environment',            // ASR robustness under ambient noise
    backgroundNoise: 'coffee_shop_-20db',
    packetLoss: 0,
    jitter: 0
  },
  {
    name: 'poor_network',                 // media resilience under impairment
    backgroundNoise: null,
    packetLoss: 3,                        // 3%
    jitter: 30                            // 30ms
  },
  {
    name: 'barge_in_test',                // interruption handling
    interruptAt: 1500,                    // ms into agent response
    expectedStopLatency: 200              // ms
  }
];
Regression Detection Alerts
Set alerts for statistical deviations:
| Metric | Alert Threshold | Window |
|---|---|---|
| Turn-around time P95 | +20% from baseline | 1 hour |
| Barge-in accuracy | -5% from baseline | 1 hour |
| Task completion rate | -10% from baseline | 4 hours |
| Packet loss | >2% sustained | 15 minutes |
| STT WER | +10% from baseline | 1 hour |
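The alert rules above reduce to a baseline-deviation check. A sketch of that logic; the metric names are assumptions for illustration, and the thresholds come from the table (windowing and baseline computation are left to your metrics store):

```javascript
// Relative-deviation alert rules: positive = alert on increases (latency, WER),
// negative = alert on decreases (accuracy, completion).
const alertRules = {
  turnAroundP95Ms: { deviation: +0.20 }, // +20% from baseline
  bargeInAccuracy: { deviation: -0.05 }, // -5% from baseline
  taskCompletion:  { deviation: -0.10 }, // -10% from baseline
  sttWer:          { deviation: +0.10 }, // +10% from baseline
};

function checkRegressions(baseline, current) {
  const alerts = [];
  for (const [metric, rule] of Object.entries(alertRules)) {
    const change = (current[metric] - baseline[metric]) / baseline[metric];
    const breached = rule.deviation > 0
      ? change >= rule.deviation
      : change <= rule.deviation;
    if (breached) alerts.push({ metric, changePct: +(change * 100).toFixed(1) });
  }
  return alerts;
}

console.log(checkRegressions(
  { turnAroundP95Ms: 1500, bargeInAccuracy: 0.95, taskCompletion: 0.90, sttWer: 0.12 },
  { turnAroundP95Ms: 1950, bargeInAccuracy: 0.94, taskCompletion: 0.89, sttWer: 0.12 },
));
// Only turnAroundP95Ms alerts: it rose 30%, past the +20% threshold
```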
Conclusion: Debugging Workflow
Prioritized Debugging Steps
When a voice agent has issues, follow this order:
1. Network layer first
   - Open chrome://webrtc-internals
   - Verify ICE reaches "connected" state
   - Check selected candidate pair type
   - If ICE fails, stop here—fix network first
2. Media layer second
   - Check RTP packet loss (<1% target)
   - Check jitter (<20ms target)
   - Check RTT (<150ms target)
   - If degraded, fix network or adjust jitter buffer
3. Pipeline layer third
   - Measure STT latency (TTFB + final transcript)
   - Measure LLM latency (TTFT + completion)
   - Measure TTS latency (TTFB)
   - Identify which component is slowest
4. Conversation layer fourth
   - Check barge-in detection and response
   - Check endpointing configuration
   - Check turn-taking behavior
Building Observability from Day One
Design for debuggability:
- Implement structured logging at component boundaries
- Generate and propagate request IDs through entire pipeline
- Store session recordings with full transcripts
- Collect RTP statistics via getStats() API
- Track latency percentiles (P50/P95/P99), not just averages
- Implement health checks at each pipeline stage
- Set up alerts for threshold violations
Next Steps
- For LiveKit users: Monitor Pipecat Agents in Production
- For production incidents: Voice Agent Incident Response Runbook
- For comprehensive testing: How to Evaluate and Test Voice Agents
- For metrics deep-dive: Voice Agent Evaluation Metrics Guide
How Hamming Helps with WebRTC Voice Agent Debugging
Hamming provides specialized tooling for debugging and testing WebRTC voice agents:
- LiveKit-to-LiveKit Testing: Auto-provisioned rooms, synthetic test calls, real WebRTC connections
- 50+ Quality Metrics: Latency breakdown, barge-in accuracy, task completion, audio quality
- Session Replay: Full audio playback with transcripts and component traces
- Regression Detection: Automated alerts when metrics deviate from baseline
- Scenario Generation: Auto-generate test cases from prompts, execute in <10 minutes
Instead of manually debugging with chrome://webrtc-internals, get automated visibility into every layer of your voice agent stack.

