
Our Research Methodology

Hamming pioneered voice AI QA. Every framework, benchmark, and recommendation we publish is grounded in production data from real voice agent deployments.

  • 1M+ voice calls analyzed
  • 50+ deployments
  • 10+ platforms tested
  • 6+ industries served

Hamming works with LiveKit, Vapi, Retell AI, Pipecat, OpenAI, Synthflow, Daily, and 11 Labs.

How We Conduct Research

Our research methodology combines automated analysis with expert validation.

Step 1: Data Collection

We collect data from production voice agent deployments and from synthetic test calls designed to stress-test edge cases; a sketch of one such scenario follows the list below.

  • Production call recordings from enterprise customers (anonymized)
  • Synthetic test calls simulating diverse user behaviors
  • A/B comparison data across voice platforms
  • Multi-language and multi-accent test scenarios
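
To make the synthetic side concrete, here is a minimal sketch of how a stress-test scenario might be declared. The schema and every field name are illustrative, not Hamming's internal format.

```python
from dataclasses import dataclass

# Hypothetical scenario schema; field names are illustrative only.
@dataclass
class SyntheticScenario:
    name: str
    persona: str                      # simulated caller behavior, e.g. "impatient"
    language: str = "en-US"
    accent: str = "neutral"
    interruptions: int = 0            # mid-sentence barge-ins to inject
    background_noise_db: float = 0.0  # ambient noise mixed into the caller audio
    expected_goal: str = ""           # what a successful call must accomplish

scenarios = [
    SyntheticScenario("refill-rushed", persona="impatient", interruptions=3,
                      expected_goal="prescription refill booked"),
    SyntheticScenario("refill-accented", persona="cooperative",
                      language="en-IN", accent="Indian English",
                      expected_goal="prescription refill booked"),
]
```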
Step 2: Analysis Approach

We combine automated LLM-as-judge scoring with manual expert review for nuanced failure analysis; a sketch of the automated pass follows the list below.

  • Automated scoring for consistency at scale
  • Manual expert review for edge cases and nuanced failures
  • Statistical validation across deployment segments
  • Regression detection between model versions
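
Below is a minimal sketch of the automated pass, assuming the OpenAI Python SDK as the judge backend; the prompt, model choice, and 0.8 triage threshold are illustrative rather than our production pipeline.

```python
import json
from openai import OpenAI  # assumes the official OpenAI Python SDK

client = OpenAI()

JUDGE_PROMPT = """You are grading a voice agent call transcript.
Score goal_completion and prompt_adherence from 0 to 1 and list any
hallucinated facts. Respond with JSON:
{"goal_completion": float, "prompt_adherence": float, "hallucinations": [str]}"""

def judge_call(transcript: str, agent_instructions: str) -> dict:
    """One automated LLM-as-judge scoring pass over a single call."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content":
                f"Agent instructions:\n{agent_instructions}\n\nTranscript:\n{transcript}"},
        ],
    )
    return json.loads(resp.choices[0].message.content)

def triage(scores: dict, threshold: float = 0.8) -> str:
    # Low scores or any flagged hallucination route to manual expert review.
    needs_human = (scores["goal_completion"] < threshold
                   or scores["prompt_adherence"] < threshold
                   or scores["hallucinations"])
    return "manual_review" if needs_human else "auto_pass"
```

Scoring every call automatically first keeps grading consistent at scale; routing only low-scoring calls to humans keeps expert time focused on the nuanced failures.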
Step 3: Quality Standards

All findings are validated across multiple deployments before publication (see the sketch after this list).

  • Findings validated across 3+ enterprise deployments
  • Data anonymized and aggregated for privacy
  • Regular methodology review and iteration
  • Transparent disclosure of limitations
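
As one illustration of the first standard: before publication, an effect must at least point the same way in every deployment where it was measured, with a minimum of three deployments. The gate below is a hypothetical sketch (a real gate would also test statistical significance), not our actual validation harness.

```python
MIN_DEPLOYMENTS = 3

def finding_is_publishable(effect_sizes: dict[str, float]) -> bool:
    """effect_sizes maps anonymized deployment IDs to a measured effect,
    e.g. the change in goal completion rate after a prompt revision."""
    if len(effect_sizes) < MIN_DEPLOYMENTS:
        return False
    directions = {effect > 0 for effect in effect_sizes.values()}
    return len(directions) == 1  # the effect direction must agree everywhere

print(finding_is_publishable({"dep-a": 0.04, "dep-b": 0.02, "dep-c": 0.05}))   # True
print(finding_is_publishable({"dep-a": 0.04, "dep-b": -0.01, "dep-c": 0.05}))  # False
```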

Hamming's Voice Agent Performance Benchmarks

These benchmarks are derived from our analysis of 1M+ production voice agent calls across 50+ deployments (2024-2025).

Latency Benchmarks

Metric                    | Excellent | Good    | Acceptable
--------------------------|-----------|---------|-----------
Time to First Word (TTFW) | <300ms    | <500ms  | <800ms
P50 Turn Latency          | <1600ms   | <1800ms | <2000ms
P90 Turn Latency          | <2200ms   | <2500ms | <3000ms
P99 Turn Latency          | <3000ms   | <3500ms | <4000ms
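
For reference, the percentile figures above can be computed from per-turn latency samples with Python's standard library alone; the sample values below are illustrative.

```python
import statistics

def turn_latency_percentiles(latencies_ms: list[float]) -> dict[str, float]:
    """P50/P90/P99 turn latency from per-turn samples."""
    qs = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    return {"P50": qs[49], "P90": qs[89], "P99": qs[98]}

samples = [1450.0, 1530.0, 1610.0, 1720.0, 1790.0, 1880.0,
           1960.0, 2100.0, 2240.0, 2610.0, 2890.0, 3400.0]
print(turn_latency_percentiles(samples))
```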

Accuracy Benchmarks

Metric               | Excellent | Good | Acceptable
---------------------|-----------|------|-----------
ASR Word Error Rate  | <5%       | <8%  | <12%
Goal Completion Rate | >90%      | >80% | >70%
Prompt Adherence     | >98%      | >95% | >90%
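
ASR Word Error Rate is (substitutions + deletions + insertions) divided by the number of reference words, i.e. word-level Levenshtein distance normalized by reference length. A minimal self-contained sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance over the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution over six reference words -> ~16.7% WER
print(word_error_rate("refill my blood pressure medication please",
                      "refill my blood pressure medications please"))
```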

Reliability Benchmarks

Metric             | Excellent | Good | Acceptable
-------------------|-----------|------|-----------
Call Success Rate  | >95%      | >90% | >85%
Escalation Rate    | <5%       | <10% | <15%
Hallucination Rate | <5%       | <10% | <15%

Source: Hamming's analysis of 1M+ voice agent calls across 50+ deployments (2024-2025).
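
To apply these tables to a live deployment, measured metrics can be graded against the bands above. The sketch below encodes the reliability bands; the direction flag distinguishes metrics where higher is better (call success) from those where lower is better (escalation, hallucination). The metric keys are illustrative.

```python
# (excellent, good, acceptable, higher_is_better) per the reliability table
BANDS = {
    "call_success_rate":  (0.95, 0.90, 0.85, True),
    "escalation_rate":    (0.05, 0.10, 0.15, False),
    "hallucination_rate": (0.05, 0.10, 0.15, False),
}

def grade(metric: str, value: float) -> str:
    excellent, good, acceptable, higher_is_better = BANDS[metric]
    beats = (lambda v, t: v > t) if higher_is_better else (lambda v, t: v < t)
    if beats(value, excellent):
        return "Excellent"
    if beats(value, good):
        return "Good"
    if beats(value, acceptable):
        return "Acceptable"
    return "Below benchmark"

print(grade("escalation_rate", 0.07))    # -> "Good"
print(grade("call_success_rate", 0.96))  # -> "Excellent"
```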

Expert-Verified Research

All research is conducted and reviewed by voice AI QA experts with hands-on experience breaking voice agents across healthcare, financial services, e-commerce, and more.

Questions about our methodology? Contact our research team.

See Our Research in Action

Explore our guides, frameworks, and benchmarks built on this methodology.