Voice Agent Incident Response Runbook: Debug and Fix Failures in Production

Sumanyu Sharma, Founder & CEO, Voice AI QA Pioneer
Has stress-tested 1M+ voice agent calls to find where they break.

January 20, 2026 · Updated January 25, 2026 · 17 min read

TL;DR: Respond to voice agent incidents using Hamming's 4-Stack Incident Response Framework:

| Stack | What Failed | First Check | Target Resolution |
|---|---|---|---|
| 1. Telephony | Calls not connecting | SIP registration, network | <5 min |
| 2. Audio | No sound, garbled, ASR failing | Codec, WebRTC, VAD | <10 min |
| 3. Intelligence | Wrong responses, timeouts | LLM endpoint, prompts | <15 min |
| 4. Output | No agent speech, TTS errors | TTS service, audio encoding | <10 min |

Start at Stack 1 and move up only when that stack is verified working. Most incidents (50-60%, per Hamming's incident data) are Stack 1 or 2—don't jump to LLM debugging first.

How to Debug Voice Agents in Production (Step-by-Step)

When a voice agent fails, use this symptom-based approach to quickly identify the root cause:

Symptom → Diagnosis Table

| Symptom | Likely Stack | Where to Look | Common Causes | Quick Fix |
|---|---|---|---|---|
| Agent is slow (>2s response) | Stack 3 (LLM) or Stack 2 (ASR) | LLM latency traces, ASR processing time | LLM rate limiting, cold starts, complex prompts | Implement streaming, reduce prompt length, check provider status |
| Users talk over agent | Stack 4 (TTS) or Stack 2 (VAD) | TTS latency, VAD threshold, turn detection | TTS too slow, endpointing threshold too high, barge-in not working | Reduce TTS latency, lower VAD silence threshold, verify interruption handling |
| Agent misunderstands intent | Stack 2 (ASR) or Stack 3 (NLU) | WER metrics, intent accuracy, transcript quality | High WER from noise/accents, NLU drift, prompt changes | Check ASR provider, review prompt changes, test with different accents |
| Agent repeats itself | Stack 3 (LLM) | Conversation history, context window | Context not being passed, infinite loop in dialog | Verify conversation history injection, check for circular prompt logic |
| No audio from agent | Stack 4 (TTS) or Stack 1 (Network) | TTS logs, audio encoding, network traces | TTS service down, codec mismatch, network blocking | Check TTS provider status, verify audio encoding matches telephony |
| Calls drop immediately | Stack 1 (Telephony) | SIP registration, call setup logs | SIP trunk down, credential expired, firewall blocking | Re-register SIP, rotate credentials, check firewall rules |
| Agent gives wrong information | Stack 3 (LLM) | LLM responses, knowledge base, tool calls | Hallucination, stale knowledge base, failed tool calls | Add validation against sources, update knowledge base, verify tool success |
| Call connects but no interaction | Stack 2 (Audio) | Audio frames, codec negotiation | One-way audio, codec mismatch, VAD not detecting speech | Check codec compatibility, verify bidirectional audio flow |

Minimum Logging Checklist for Voice Agents

To debug voice agents effectively, ensure you're capturing these data points for every call:

Turn-Level Data (per exchange):

  • Timestamps: User speech start/end, ASR complete, LLM start/end, TTS start/end
  • Transcripts: Raw ASR output with confidence scores
  • Intent: Classified intent with confidence and alternatives
  • Latency breakdown: STT ms, LLM ms, TTS ms, total ms

Call-Level Data:

  • Call metadata: Call ID, correlation ID, caller info, agent version
  • Session context: Conversation history passed to LLM
  • Tool calls: Function name, parameters, result, success/failure, latency
  • Outcomes: Task completion status, escalation reason, call duration

Audio Data:

  • Audio quality: MOS score or quality indicators
  • VAD events: Speech detection timestamps, silence durations
  • Barge-in events: Interruption timestamps, recovery success

Example Log Entry (JSON):

{
  "call_id": "call_abc123",
  "turn_index": 3,
  "timestamp": "2026-01-25T10:30:00Z",
  "user_transcript": "I need to reschedule my appointment",
  "asr_confidence": 0.94,
  "intent": {"name": "reschedule_appointment", "confidence": 0.91},
  "latency_ms": {"stt": 180, "llm": 420, "tts": 150, "total": 750},
  "tool_calls": [{"name": "get_appointments", "success": true, "latency_ms": 85}],
  "agent_response": "I can help you reschedule. I see you have an appointment on Tuesday. What date works better?"
}

Pro tip: Use distributed tracing (OpenTelemetry) to correlate logs across services. See Voice Agent Observability & Tracing for implementation guidance.
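For teams instrumenting from scratch, the sketch below shows what per-turn tracing can look like with the OpenTelemetry Python API (assuming the opentelemetry-api package is installed). The run_* stubs stand in for your real ASR/LLM/TTS clients, and the span and attribute names are illustrative, not a required schema.

# A sketch of per-turn spans with the OpenTelemetry Python API.
from opentelemetry import trace

tracer = trace.get_tracer("voice-agent")

def run_asr(audio: bytes):        # stub: replace with your ASR client
    return "I need to reschedule my appointment", 0.94

def run_llm(transcript: str):     # stub: replace with your LLM client
    return "I can help you reschedule."

def run_tts(text: str):           # stub: replace with your TTS client
    return b"\x00" * 320

def handle_turn(call_id: str, turn_index: int, user_audio: bytes) -> bytes:
    # One parent span per turn and one child span per stage, so a trace viewer
    # shows the same STT/LLM/TTS latency breakdown as the JSON log entry above.
    with tracer.start_as_current_span("voice_agent.turn") as turn_span:
        turn_span.set_attribute("call.id", call_id)
        turn_span.set_attribute("turn.index", turn_index)

        with tracer.start_as_current_span("stt") as stt_span:
            transcript, confidence = run_asr(user_audio)
            stt_span.set_attribute("asr.confidence", confidence)

        with tracer.start_as_current_span("llm"):
            reply = run_llm(transcript)

        with tracer.start_as_current_span("tts"):
            return run_tts(reply)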


Your voice agent is down. Calls are failing. Your on-call engineer just got paged at 2 AM.

What do they do first?

Most teams scramble—restarting services, checking logs randomly, hoping something works. Meanwhile, customers get dead air or disconnected calls. Every minute of downtime costs revenue and trust.

At Hamming, we've analyzed 1M+ voice agent calls and helped teams respond to hundreds of production incidents. Here's what separates a 15-minute resolution from a 3-hour firefight: a systematic incident response framework that diagnoses the right stack first.

This runbook gives your on-call team exactly that.

Quick filter: If you're restarting services before understanding which stack failed, you're wasting time.

Methodology Note: The frameworks, thresholds, and resolution times in this runbook are derived from Hamming's analysis of 1M+ voice agent interactions and incident response patterns across 50+ production deployments (2024-2026). Your specific thresholds should be calibrated to your baseline performance.

What Does This Runbook Cover?

This runbook applies to production voice agents using:

  • SIP or WebRTC telephony (Twilio, Vonage, Telnyx, custom)
  • Streaming ASR (Deepgram, AssemblyAI, Whisper, Google STT)
  • LLM orchestration (OpenAI, Anthropic, custom models)
  • TTS services (ElevenLabs, PlayHT, Cartesia, Azure)

Assumes:

  • Agent is deployed and was working previously
  • You have access to logs, metrics, or a monitoring dashboard
  • Basic familiarity with voice agent architecture

Definitions used:

  • Incident: Unplanned degradation affecting call quality or success rate
  • Latency: End-to-end turn latency (user silence → agent audio playback)
  • Failure: Call termination before task completion

Not an active incident? For general troubleshooting and diagnosis, see our Voice Agent Drift Detection Guide. For setting up proactive monitoring to prevent incidents, see Voice Agent Monitoring Platform Guide.

What Is the 4-Stack Voice Agent Architecture?

Voice agents consist of four interdependent stacks. Each stack has distinct failure modes and requires specific diagnostic approaches:

| Stack | Function | Components | Failure Mode |
|---|---|---|---|
| 1. Telephony | Call connectivity | SIP, WebRTC, network | Calls don't connect or drop immediately |
| 2. Audio | Sound capture & processing | Codec, VAD, ASR | No sound, garbled audio, empty transcripts |
| 3. Intelligence | Understanding & response | LLM, prompts, tools | Wrong responses, timeouts, hallucinations |
| 4. Output | Speech synthesis | TTS, audio encoding | No agent speech, robotic/garbled output |

Source: Stack architecture based on Hamming's analysis of 100+ production voice agent deployments (2025-2026). Categorization aligned with standard voice agent architecture patterns.

Key insight: Failures cascade upward. A Stack 1 (Telephony) issue makes everything else irrelevant. A Stack 2 (Audio) issue means the LLM never gets good input. Always start at Stack 1.

How Do You Classify Incident Severity?

Before diving into diagnosis, classify the incident severity to determine response urgency:

| Severity | Definition | User Impact | Response Time Target |
|---|---|---|---|
| SEV-1 (Critical) | Complete outage | No calls connecting, 100% failure | <15 min to mitigate |
| SEV-2 (Major) | Significant degradation | >25% calls affected, major feature broken | <30 min to mitigate |
| SEV-3 (Minor) | Partial degradation | <25% calls affected, edge cases broken | <2 hours to mitigate |
| SEV-4 (Low) | Cosmetic or rare issues | Minimal user impact | Next business day |

Source: Severity classification aligned with Google SRE practices and adapted for voice agent-specific signals.

SEV-1 and SEV-2 require immediate action. SEV-3 and SEV-4 can be scheduled for normal working hours.
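If you want classification to be mechanical during a page, a small helper can encode the table above. The sketch below is illustrative: the boundaries mirror the table, the SEV-3/SEV-4 cutoff of 1% is an assumption (the table only says "minimal user impact"), and the numbers should be calibrated to your own traffic.

# A minimal encoding of the severity table above.
def classify_severity(percent_calls_affected: float, complete_outage: bool = False) -> str:
    """Map incident impact to a SEV level per the table above."""
    if complete_outage or percent_calls_affected >= 100:
        return "SEV-1"   # complete outage: mitigate within 15 minutes
    if percent_calls_affected > 25:
        return "SEV-2"   # major degradation: mitigate within 30 minutes
    if percent_calls_affected > 1:
        return "SEV-3"   # partial degradation: mitigate within 2 hours
    return "SEV-4"       # cosmetic or rare: next business day

print(classify_severity(40))                          # -> "SEV-2"
print(classify_severity(0, complete_outage=True))     # -> "SEV-1"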

What Is the Incident Response Decision Tree?

Use this decision tree to identify which stack to investigate first:

INCIDENT DETECTED
        │
        ▼
┌──────────────────────────────────────────────┐
│ Can calls connect at all?                    │
│ (Check: SIP registration, call logs)         │
└──────────────────────────────────────────────┘
        │
   NO ──┴── YES
   │         │
   ▼         ▼
STACK 1     ┌──────────────────────────────────────────────┐
Telephony   │ Is audio flowing both directions?            │
            │ (Check: transcripts, audio recordings)       │
            └──────────────────────────────────────────────┘
                    │
               NO ──┴── YES
               │         │
               ▼         ▼
            STACK 2     ┌──────────────────────────────────────────────┐
            Audio       │ Is agent responding correctly to input?      │
                        │ (Check: LLM logs, response quality)          │
                        └──────────────────────────────────────────────┘
                                │
                           NO ──┴── YES
                           │         │
                           ▼         ▼
                        STACK 3     ┌──────────────────────────────────────────────┐
                        LLM         │ Is agent voice output working?               │
                                    │ (Check: TTS logs, audio playback)            │
                                    └──────────────────────────────────────────────┘
                                            │
                                       NO ──┴── YES
                                       │         │
                                       ▼         ▼
                                    STACK 4     CROSS-STACK or
                                    TTS         INTERMITTENT ISSUE

Pro tip: Run through this decision tree in order. Don't skip to Stack 3 (LLM) because it seems more likely—verify each stack first.
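One way to keep the ordering honest is to encode the tree as a script your on-call engineer can run. The sketch below is a skeleton under the assumption that you wire each check_* stub to your own health signals (SIP registration status, presence of transcripts, and so on); as written, every stub optimistically returns True.

# A skeleton of the decision tree above.
def check_calls_connect() -> bool:      # Stack 1: SIP registration, call logs
    return True                         # replace with a real check

def check_audio_flowing() -> bool:      # Stack 2: transcripts, recordings
    return True                         # replace with a real check

def check_llm_responses() -> bool:      # Stack 3: LLM logs, response quality
    return True                         # replace with a real check

def check_tts_output() -> bool:         # Stack 4: TTS logs, audio playback
    return True                         # replace with a real check

def triage() -> str:
    """Walk the stacks bottom-up and name the first one that fails its check."""
    checks = [
        ("Stack 1: Telephony", check_calls_connect),
        ("Stack 2: Audio", check_audio_flowing),
        ("Stack 3: Intelligence", check_llm_responses),
        ("Stack 4: Output", check_tts_output),
    ]
    for stack, check in checks:
        if not check():
            return stack
    return "Cross-stack or intermittent issue"

print(triage())  # -> "Cross-stack or intermittent issue" until the stubs are wired up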

How Do You Diagnose Stack 1: Telephony Failures?

Symptoms:

  • Calls don't connect at all
  • Immediate disconnect after dial
  • "Number not reachable" errors
  • SIP 4xx/5xx errors in logs
  • WebRTC ICE connection failures

What Causes Telephony Failures?

| Cause | Likelihood | How to Diagnose |
|---|---|---|
| SIP trunk down | High | Check SIP registration status in provider dashboard |
| Network/firewall blocking | High | Verify UDP/TCP ports (5060, 5061, 10000-20000) are open |
| Credential expiration | Medium | Check SIP auth errors in logs ("401 Unauthorized") |
| Provider outage | Medium | Check provider status page (Twilio, Telnyx, Vonage) |
| DNS resolution failure | Low | Test DNS resolution for SIP domain |
| Certificate expiration | Low | Check TLS cert validity for secure SIP |

Stack 1 Diagnostic Checklist

Run through these checks in order:

  • SIP Registration Active?

    • Check provider dashboard for registration status
    • Look for "REGISTER" success in SIP logs
  • Network Connectivity?

    • Ping SIP server: ping sip.provider.com
    • Check firewall rules for SIP ports (5060/5061 TCP/UDP)
    • Verify STUN/TURN servers reachable
  • Provider Status?

    • Check provider status page (Twilio, Telnyx, Vonage)
  • Recent Changes?

    • Credential rotation?
    • Firewall rule changes?
    • DNS updates?
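The DNS and connectivity checks in the list above lend themselves to automation. Below is a minimal Python sketch that verifies DNS resolution for the SIP domain and TCP reachability of the SIP-TLS port; sip.provider.com is a placeholder, and a UDP/5060 trunk still needs a real SIP OPTIONS ping that this TCP-only probe does not perform.

# A sketch of automating the DNS and connectivity checks above.
import socket

SIP_HOST = "sip.provider.com"   # placeholder for your SIP domain
SIP_TLS_PORT = 5061             # SIP over TLS; plain SIP is usually UDP/TCP 5060

def check_sip_reachability(host: str = SIP_HOST, port: int = SIP_TLS_PORT) -> bool:
    try:
        sockaddr = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)[0][4]
    except socket.gaierror as exc:
        print(f"DNS resolution failed for {host}: {exc}")        # Stack 1: DNS failure
        return False
    try:
        with socket.create_connection(sockaddr[:2], timeout=3):
            print(f"{host}:{port} is reachable")
            return True
    except OSError as exc:
        print(f"TCP connect to {host}:{port} failed: {exc}")     # firewall/NAT suspect
        return False

if __name__ == "__main__":
    check_sip_reachability()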

Stack 1 Resolution Steps

  1. If SIP registration failed: Re-register with provider, verify credentials
  2. If network blocked: Open required ports, check NAT traversal
  3. If provider outage: Failover to backup SIP trunk if available
  4. If credentials expired: Rotate credentials and update configuration

When to Escalate: If SIP registration is active but calls still fail, escalate to Stack 2 (Audio).

How Do You Diagnose Stack 2: Audio Failures?

Symptoms:

  • One-way audio (user hears agent, agent doesn't hear user)
  • "Empty transcript" errors
  • Agent not responding to user speech
  • Garbled or choppy audio
  • VAD not detecting speech

What Causes Audio Failures?

| Cause | Likelihood | How to Diagnose |
|---|---|---|
| Codec mismatch | High | Check negotiated codec vs. expected (PCMU, PCMA, Opus) |
| WebRTC ICE failure | High | Check ICE connection state in browser/client logs |
| VAD threshold too aggressive | Medium | Check silence detection cutting off speech |
| ASR service degraded | Medium | Check ASR provider status, test direct API |
| Sample rate mismatch | Low | Verify 16kHz throughout pipeline |
| Audio buffer overflow | Low | Check for dropped frames in audio processing |

How Do You Test ASR in a Voice Agent?

Test ASR independently to isolate audio issues:

  1. Check transcription output: Look for recent call transcripts in logs
  2. Verify audio reaching ASR: Look for audio events/frames being sent
  3. Test ASR endpoint directly:
    # Test Deepgram directly (example)
    curl -X POST "https://api.deepgram.com/v1/listen" \
      -H "Authorization: Token YOUR_API_KEY" \
      -H "Content-Type: audio/wav" \
      --data-binary @test-audio.wav
  4. Check Word Error Rate (WER): Target <5% for clean audio, <10% for noisy
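If your ASR provider doesn't report WER directly, here is a dependency-free sketch of the calculation (libraries such as jiwer offer production-grade implementations). It implements the standard formula, WER = (substitutions + deletions + insertions) / total reference words, via a word-level edit distance.

# A dependency-free WER sketch using word-level edit distance.
def word_error_rate(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                                   # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                                   # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + cost)     # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)   # fraction; multiply by 100 for %

# One substitution out of six reference words -> ~0.167 (about 17% WER)
print(word_error_rate("I need to reschedule my appointment",
                      "I need to reschedule an appointment"))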

Stack 2 Key Metrics

| Metric | Normal | Warning | Critical |
|---|---|---|---|
| ASR Latency | <300ms | 300-500ms | >500ms |
| Transcription Confidence | >0.85 | 0.7-0.85 | <0.7 |
| Audio Packet Loss | <1% | 1-3% | >3% |
| VAD False Negatives | <2% | 2-5% | >5% |

Stack 2 Diagnostic Checklist

  • Audio Reaching Server?

    • Check for audio frames in logs
    • Verify WebRTC connection established
  • Codec Negotiated Correctly?

    • Expected: Opus (WebRTC) or PCMU/PCMA (SIP)
    • Mismatch causes garbled audio
  • ASR Returning Transcripts?

    • Check ASR logs for transcription responses
    • Empty transcripts = no audio or VAD issue
  • VAD Configuration?

    • Is silence threshold too aggressive?
    • Is speech being cut off prematurely?
  • ASR Provider Status?

    • Check ASR provider status page (Deepgram, AssemblyAI, Google STT)

Stack 2 Resolution Steps

  1. If codec mismatch: Reconfigure to match expected codec
  2. If ICE failure: Check STUN/TURN servers, NAT traversal
  3. If VAD too aggressive: Increase speech detection threshold
  4. If ASR degraded: Failover to backup ASR provider if available
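As an illustration of tuning VAD sensitivity (step 3), the sketch below uses the py-webrtcvad package; this is an assumption about tooling, since your pipeline may expose VAD configuration differently, for example as a silence-duration threshold rather than an aggressiveness mode.

# A VAD tuning sketch using py-webrtcvad (assumed tooling). Mode 3 is the most
# aggressive about labeling frames as non-speech; relax it if speech is cut off.
import webrtcvad

SAMPLE_RATE = 16000                                  # 8/16/32/48 kHz, 16-bit mono PCM
FRAME_MS = 20                                        # frames must be 10, 20, or 30 ms
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2     # 640 bytes per 20 ms frame

vad = webrtcvad.Vad(1)                               # relaxed from 3 to 1

def speech_flags(pcm: bytes) -> list[bool]:
    """Per-frame speech/non-speech decisions for raw 16-bit mono PCM."""
    frames = [pcm[i:i + FRAME_BYTES]
              for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES)]
    return [vad.is_speech(frame, SAMPLE_RATE) for frame in frames]

# One second of digital silence should produce zero speech frames.
print(sum(speech_flags(b"\x00" * SAMPLE_RATE * 2)))  # -> 0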

When to Escalate: If transcripts look correct but agent responses are wrong, escalate to Stack 3 (Intelligence).

How Do You Diagnose Stack 3: Intelligence Failures?

Symptoms:

  • Agent gives wrong or nonsensical responses
  • Long pauses before agent speaks (>2 seconds)
  • Timeout errors in logs
  • Hallucinated information
  • Tool calls failing silently

What Causes LLM Failures?

| Cause | Likelihood | How to Diagnose |
|---|---|---|
| LLM rate limiting | High | Check for 429 errors in logs |
| Prompt corruption | High | Review recent prompt changes, check for injection |
| Context window overflow | Medium | Check token count per turn (approaching limit?) |
| Model endpoint down | Medium | Direct API health check to LLM provider |
| Tool calling failure | Medium | Check function call logs, tool timeout errors |
| Model regression | Low | Compare response quality to baseline |

Stack 3 Key Metrics

| Metric | Normal | Warning | Critical |
|---|---|---|---|
| LLM Response Time | <500ms | 500-1000ms | >1000ms |
| Time to First Token (TTFT) | <300ms | 300-500ms | >500ms |
| Tool Call Success Rate | >99% | 95-99% | <95% |
| Hallucination Rate | <5% | 5-10% | >10% |

Stack 3 Diagnostic Commands

Test LLM endpoint directly:

# Test OpenAI endpoint
curl -X POST https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Say hello"}],
    "max_tokens": 50
  }'

# Expected: Response in <1s, no errors
# Red flags: 429 (rate limited), 500 (server error), timeout

# Test Anthropic endpoint
curl -X POST https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "content-type: application/json" \
  -d '{
    "model": "claude-sonnet-4-20250514",
    "max_tokens": 50,
    "messages": [{"role": "user", "content": "Say hello"}]
  }'

Stack 3 Diagnostic Checklist

  • LLM Endpoint Responding?

    • Direct API test (see commands above)
    • Check provider status page
  • Rate Limiting?

    • Look for 429 errors
    • Check tokens per minute usage
  • Prompt Changes?

    • Review recent prompt deployments
    • Check for prompt injection in user input
  • Context Window?

    • Calculate tokens per conversation
    • Approaching 128K/200K limit?
  • Tool Calls Working?

    • Check function call logs
    • Are tools timing out?

Stack 3 Resolution Steps

  1. If rate limited: Reduce request rate, implement backoff, upgrade tier
  2. If prompt corrupted: Revert to last known good prompt
  3. If context overflow: Implement conversation summarization or truncation
  4. If endpoint down: Failover to backup LLM provider
  5. If tools failing: Check external API dependencies, increase timeouts
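Steps 1 and 4 often ship together as retry-with-backoff plus provider failover. The sketch below is illustrative only: call_primary and call_backup are placeholders for your real LLM clients, and RateLimitError stands in for whatever 429 exception your SDK actually raises.

# A sketch of retry-with-backoff plus failover to a backup provider.
import random
import time

class RateLimitError(Exception):
    """Stand-in for a provider 429 error."""

def call_primary(prompt: str) -> str:
    raise RateLimitError("429: tokens-per-minute limit hit")   # simulated outage

def call_backup(prompt: str) -> str:
    return "Hello from the backup provider."

def generate(prompt: str, max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            return call_primary(prompt)
        except RateLimitError:
            # Exponential backoff with jitter; keep delays short for a live call,
            # since the caller is waiting in real time.
            time.sleep(0.2 * (2 ** attempt) + random.uniform(0, 0.1))
    return call_backup(prompt)   # primary still rate limited: fail over

print(generate("Say hello"))     # -> "Hello from the backup provider."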

When to Escalate: If LLM responses look correct but users don't hear them, escalate to Stack 4 (Output).

How Do You Diagnose Stack 4: Output Failures?

Symptoms:

  • Agent silent (no audio output)
  • Garbled or robotic speech
  • Audio cuts off mid-sentence
  • TTS timeout errors
  • Unnatural prosody or pacing

What Causes TTS Failures?

| Cause | Likelihood | How to Diagnose |
|---|---|---|
| TTS service rate limited | High | Check for 429 errors |
| Audio encoding mismatch | Medium | Verify output format matches telephony (PCMU/Opus) |
| Voice ID invalid | Medium | Confirm voice ID exists and is accessible |
| TTS queue backed up | Low | Check queue depth in TTS service |
| Text too long | Low | Check character limits for TTS API |

Stack 4 Key Metrics

| Metric | Normal | Warning | Critical |
|---|---|---|---|
| TTS Latency (TTFB) | <200ms | 200-400ms | >400ms |
| TTS Error Rate | <0.1% | 0.1-1% | >1% |
| Audio Generation Success | >99.9% | 99-99.9% | <99% |

Stack 4 Diagnostic Checklist

  • TTS Service Responding?

    • Direct API test to TTS provider
    • Check provider status page
  • Voice ID Valid?

    • Confirm voice exists in provider dashboard
    • Check voice wasn't deleted or renamed
  • Audio Format Correct?

    • Output should match telephony expectations
    • Common formats: PCM 16-bit 16kHz, Opus
  • Rate Limiting?

    • Check for 429 errors in TTS logs
    • Review characters per minute usage

Stack 4 Resolution Steps

  1. If rate limited: Reduce request rate, implement caching for common phrases
  2. If encoding mismatch: Reconfigure output format to match telephony
  3. If voice ID invalid: Switch to backup voice ID
  4. If TTS down: Failover to backup TTS provider
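A phrase cache (step 1) is often the cheapest mitigation because greetings and confirmations repeat across calls. The sketch below is a minimal in-memory version; synthesize() is a placeholder for your TTS client, and a production cache might live in Redis or on disk with an eviction policy.

# A sketch of an in-memory TTS phrase cache keyed by (voice_id, text).
import hashlib

_tts_cache: dict[str, bytes] = {}

def synthesize(text: str, voice_id: str) -> bytes:            # placeholder TTS call
    return f"<audio {voice_id}:{text}>".encode()

def cached_tts(text: str, voice_id: str) -> bytes:
    key = hashlib.sha256(f"{voice_id}:{text}".encode()).hexdigest()
    if key not in _tts_cache:
        _tts_cache[key] = synthesize(text, voice_id)          # only hit TTS on a miss
    return _tts_cache[key]

cached_tts("Thanks for calling! How can I help you today?", "voice_abc")
cached_tts("Thanks for calling! How can I help you today?", "voice_abc")  # cache hit
print(len(_tts_cache))  # -> 1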

How Do You Handle Cross-Stack Failures?

Sometimes failures span multiple stacks or cascade across them. Signs of cross-stack issues:

  • Symptoms change during the call: Started with audio issues, now LLM is slow
  • Intermittent failures: Works sometimes, fails other times
  • Multiple error types in logs: SIP errors + ASR errors + LLM timeouts

Cross-Stack Diagnostic Approach

  1. Identify the timeline: When did each symptom start?
  2. Find the root cause: Which stack failed first?
  3. Trace the cascade: How did failure in Stack N affect Stack N+1?

Common cascade patterns:

| Initial Failure | Cascade Effect |
|---|---|
| Network latency (Stack 1) | ASR timeouts (Stack 2) → LLM timeouts (Stack 3) |
| ASR returning garbage (Stack 2) | LLM hallucinating (Stack 3) |
| LLM slow (Stack 3) | Turn-taking feels broken, user frustration |
| TTS slow (Stack 4) | User thinks agent died, hangs up |
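The timeline step can be as simple as pulling the first error timestamp per stack and sorting them. The sketch below uses hard-coded illustrative timestamps; in practice you would query them from your logs or traces.

# A sketch of the timeline step: earliest first error points at the root cause.
from datetime import datetime

first_errors = {
    "Stack 3: LLM timeouts":          datetime(2026, 1, 25, 10, 31, 5),
    "Stack 2: ASR timeouts":          datetime(2026, 1, 25, 10, 30, 42),
    "Stack 1: network latency spike": datetime(2026, 1, 25, 10, 30, 12),
}

for stack, ts in sorted(first_errors.items(), key=lambda item: item[1]):
    print(ts.isoformat(), stack)
# The Stack 1 spike comes first, matching the first cascade row in the table above:
# network latency (Stack 1) -> ASR timeouts (Stack 2) -> LLM timeouts (Stack 3).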

What Are the Key Thresholds for Incident Detection?

| Metric | Normal | Warning | Critical |
|---|---|---|---|
| Call Success Rate | >95% | 85-95% | <85% |
| P95 End-to-End Latency | <800ms | 800-1500ms | >1500ms |
| ASR Word Error Rate | <5% | 5-10% | >10% |
| Task Completion Rate | >85% | 70-85% | <70% |
| TTS Timeout Rate | <1% | 1-5% | >5% |
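These thresholds are easy to encode directly in your alerting layer. The sketch below is a minimal illustration using the table's values; the metric names and the evaluate() helper are hypothetical, and the numbers should be recalibrated to your baseline.

# A minimal encoding of the detection thresholds above.
THRESHOLDS = {
    # metric: (warning, critical, direction in which values get worse)
    "call_success_rate":    (0.95, 0.85, "below"),
    "p95_latency_ms":       (800,  1500, "above"),
    "asr_word_error_rate":  (0.05, 0.10, "above"),
    "task_completion_rate": (0.85, 0.70, "below"),
    "tts_timeout_rate":     (0.01, 0.05, "above"),
}

def evaluate(metric: str, value: float) -> str:
    warn, crit, direction = THRESHOLDS[metric]
    breached = (lambda t: value < t) if direction == "below" else (lambda t: value > t)
    if breached(crit):
        return "critical"
    if breached(warn):
        return "warning"
    return "ok"

print(evaluate("call_success_rate", 0.92))  # -> "warning"
print(evaluate("p95_latency_ms", 1700))     # -> "critical"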

What Is the Post-Incident Analysis Template?

After resolving an incident, document what happened to prevent recurrence:

Incident Summary Template

## Incident Summary: [TITLE]

**Date/Time:** YYYY-MM-DD HH:MM - HH:MM (duration)
**Severity:** SEV-1/2/3/4
**Impact:** X calls affected, Y% degradation

### Timeline
- HH:MM - Incident detected (how?)
- HH:MM - On-call paged
- HH:MM - Root cause identified (which stack?)
- HH:MM - Mitigation applied
- HH:MM - Full resolution confirmed

### Root Cause
[Which stack failed? What specifically broke?]

### Resolution
[What fixed it?]

### Action Items
- [ ] Preventive measure 1
- [ ] Monitoring improvement
- [ ] Documentation update

### Lessons Learned
[What would have caught this faster?]

Mean Time to Resolution (MTTR) Benchmarks

| Stack | Without Framework | With Framework | Improvement |
|---|---|---|---|
| Telephony | 45 min | 8 min | 5.6x faster |
| Audio (ASR/VAD) | 60 min | 12 min | 5x faster |
| Intelligence (LLM) | 90 min | 15 min | 6x faster |
| Output (TTS) | 30 min | 10 min | 3x faster |

Source: MTTR data from Hamming's incident response analysis across 50+ production voice agent deployments (2025-2026).

Voice Agent Incident Response Checklist

Use this checklist during any incident:

Immediate (First 5 Minutes):

  • Classify severity (SEV-1/2/3/4)
  • Page appropriate team if SEV-1/2
  • Open incident channel/war room
  • Start decision tree: Can calls connect?

Diagnosis (Next 10-15 Minutes):

  • Work through 4-Stack decision tree
  • Identify which stack is failing
  • Run stack-specific diagnostic checklist
  • Check provider status pages

Mitigation (Next 5-10 Minutes):

  • Apply stack-specific resolution steps
  • Verify fix with test calls
  • Confirm metrics returning to normal
  • Update incident channel

Post-Incident (Within 24 Hours):

  • Complete post-incident analysis template
  • Create action items for prevention
  • Update runbook if new failure mode discovered
  • Share learnings with team

Limitations of This Runbook

Not all incidents fit cleanly into one stack. Some failures cascade across multiple components. The framework helps narrow the search, but complex incidents may require investigating multiple stacks simultaneously.

Assumes basic observability is in place. If you don't have logging or metrics, your first step is instrumenting the system—not this runbook.

Generic by necessity. Your specific voice agent stack (Retell, VAPI, custom LiveKit, etc.) will have platform-specific failure modes. This framework provides the mental model; adapt the diagnostics to your stack.

How Does Hamming Help With Incident Response?

Hamming provides the observability layer that makes incident response faster:

  • 4-Stack Visibility: Unified dashboards showing health across Telephony, Audio, Intelligence, and Output
  • Instant Root Cause: One-click from alert to transcript, audio, and model logs
  • 24/7 Synthetic Calls: Catch outages before customers with continuous testing
  • Automated Alerting: Configurable thresholds with Slack, PagerDuty, and webhook integrations
  • Post-Incident Tracing: Full call traces for post-mortem analysis

Instead of scrambling through multiple dashboards during an incident, your team gets a single source of truth with the context needed to resolve issues fast.

Start monitoring your voice agents →

Frequently Asked Questions

How do you debug a voice agent failure in production?

Use Hamming's 4-Stack Incident Response Framework: check Telephony (calls connecting?), Audio (sound both ways?), Intelligence (correct responses?), Output (agent speaking?). Work stack-by-stack from bottom up. According to Hamming's incident data, 60% of failures are in Stacks 1-2 (Telephony/Audio), so don't jump to LLM debugging first. Target resolution: Stack 1 <5 min, Stack 2 <10 min, Stack 3 <15 min, Stack 4 <10 min.

What causes ASR failures in voice agents?

ASR failures have four primary causes: (1) codec mismatch between telephony and ASR service—check negotiated codec (PCMU, PCMA, Opus), (2) WebRTC ICE negotiation failure—verify STUN/TURN servers reachable, (3) VAD threshold too aggressive cutting off speech—increase detection threshold, (4) ASR provider degradation—check provider status page. Key metrics: ASR latency <300ms, transcription confidence >0.85, audio packet loss <1%.

Why is my voice agent not responding to callers?

Agent not responding typically indicates Stack 2 (Audio) or Stack 3 (Intelligence) failure. First check if audio is reaching the agent by looking for transcripts in logs. If transcripts exist but no response, check LLM logs for timeouts, 429 rate limit errors, or tool call failures. If no transcripts exist, focus on ASR/VAD configuration—VAD may be cutting off speech, or a codec mismatch may be preventing audio processing. According to Hamming data, 50% of 'not responding' issues are audio-layer problems.

What causes dead air on voice agent calls?

Dead air (silence >2 seconds) has four causes: (1) ASR processing delay—check STT latency, target <300ms, (2) LLM response time—check time-to-first-token, target <500ms, (3) TTS synthesis delay—check TTS latency, target <200ms, (4) turn detection issue—endpointing triggering too late. Check each component's latency independently. Total end-to-end latency should be <800ms P95. If one component is slow, it cascades through the entire pipeline.

How do you test whether ASR is working?

Test ASR by: (1) checking transcription output in logs for recent calls, (2) verifying audio is reaching the ASR service by looking for audio events/frames, (3) testing the ASR endpoint directly with a curl command using known audio samples, (4) checking Word Error Rate (WER) if available. Formula: WER = (Substitutions + Deletions + Insertions) / Total Words × 100. Target WER: <5% for clean audio, <10% for noisy conditions. Confidence score should be >0.85.

Why do calls drop mid-conversation?

Mid-call drops indicate: (1) WebRTC ICE failure—check TURN server availability and NAT traversal, (2) SIP session timeout—verify keepalive settings are configured, (3) resource exhaustion—check memory, CPU, connection pool limits, (4) rate limiting—check for 429 errors in ASR, LLM, or TTS services. Log connection state changes to identify the pattern. Track the exact timestamp of drops and correlate with component logs to find the failing stack.

How do you reduce mean time to resolution (MTTR) for voice agent incidents?

Reduce MTTR by: (1) using a systematic framework like Hamming's 4-Stack approach instead of random debugging, (2) setting up real-time alerting on key metrics (call success rate, P95 latency, ASR error rate), (3) creating stack-specific runbooks for common failure modes, (4) automating diagnostic checks with synthetic calls. Teams using structured incident response resolve issues 4-6x faster according to Hamming's data. Pre-populate your incident channel with quick diagnostic commands.

What is the difference between troubleshooting and incident response?

Troubleshooting is diagnostic (understanding why something failed in depth), while incident response is operational (restoring service as quickly as possible). During an active incident, prioritize mitigation over root cause analysis—restart services, failover to backup providers, scale resources. Do thorough root cause analysis after service is restored. Incident response targets: SEV-1 <15 min to mitigate, SEV-2 <30 min. Troubleshooting has no time pressure.

How do you debug slow LLM responses in a voice agent?

Measure LLM latency separately from other components. Check: (1) time from request to first token (TTFT)—target <500ms, (2) total response time—target <1000ms for short responses, (3) 429 rate limit errors in logs. If LLM latency is high but the endpoint is healthy, check prompt length (the context window may be filling up), or consider caching frequent responses. Test the LLM directly with curl to isolate the issue from other pipeline components.

What metrics should you monitor to catch voice agent incidents early?

Monitor these key metrics with alerting thresholds: (1) Call success rate—warning at <95%, critical at <85%, (2) P95 end-to-end latency—warning at >1000ms, critical at >1500ms, (3) ASR error rate/WER—warning at >5%, critical at >10%, (4) TTS timeout rate—warning at >2%, critical at >5%, (5) LLM error rate—warning at >1%, critical at >5%, (6) Task completion rate—warning at <85%, critical at <70%. Hamming provides real-time dashboards with automatic anomaly detection across all four stacks.

Sumanyu Sharma

Founder & CEO

Previously Head of Data at Citizen, where he helped quadruple the user base. As a Senior Staff Data Scientist at Tesla, he grew an AI-powered sales program to hundreds of millions of dollars in revenue per year.

Researched AI-powered medical image search at the University of Waterloo, where he graduated with Engineering honors on dean's list.

“At Hamming, we're taking all of our learnings from Tesla and Citizen to build the future of trustworthy, safe and reliable voice AI agents.”