Hamming AI Launches Advanced Call Analytics for Voice Agent Testing

Sumanyu Sharma
Founder & CEO, Voice AI QA Pioneer

Has stress-tested 1M+ voice agent calls to find where they break.

December 4, 2024 · 3 min read

Call Analytics for AI Voice Agent Testing

We built analytics because teams kept asking us the same question: "Something changed after our last deploy, but we can't figure out what."

They'd send screenshots of transcripts, hoping we could spot the problem. Sometimes latency had crept up. Sometimes completion rates had dropped. They knew something was wrong—they just couldn't see it in the data they had.

Now you can see it directly. Our new analytics module visualizes performance metrics during automated voice agent testing, so you can answer "what changed?" without digging through logs.

Quick filter: If you can’t answer “what changed after the last release?” from your dashboard, you don’t have call analytics yet.

Comprehensive Performance Metrics during Voice Agent Testing

Track critical metrics like:

  • p50 latency (median response time: half of all responses are faster than this)
  • p90 latency (90th-percentile response time: 90% of responses are faster than this)
  • Call duration (average, min, max)
  • Scoring metrics (accuracy, completeness, etc.)
| Metric          | Question it answers                  | Why it matters                      |
| --------------- | ------------------------------------ | ----------------------------------- |
| Latency p50     | What does a typical user experience? | Reveals baseline responsiveness     |
| Latency p90     | How bad is the worst 10%?            | Exposes slow calls that hurt CSAT   |
| Call duration   | Are calls too long or too short?     | Detects stalls, loops, or drop-offs |
| Scoring metrics | Did the agent complete the task?     | Connects testing to outcomes        |
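
For intuition, here is a minimal sketch of how these numbers are typically computed from raw per-turn latencies and call durations. The sample values and layout are illustrative only, not Hamming's export format:

```python
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: the value that pct% of samples fall at or below."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Illustrative per-turn response latencies (seconds) from one test run.
turn_latencies = [0.42, 0.47, 0.48, 0.49, 0.51, 0.52, 0.55, 0.60, 0.95, 1.80]

p50 = percentile(turn_latencies, 50)   # typical user experience
p90 = percentile(turn_latencies, 90)   # the slow tail that hurts CSAT

# Illustrative call durations (seconds) for the average/min/max view.
call_durations = [63, 71, 58, 240, 66]
print(f"latency p50={p50:.2f}s  p90={p90:.2f}s")
print(f"duration avg={statistics.mean(call_durations):.0f}s  "
      f"min={min(call_durations)}s  max={max(call_durations)}s")
```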

Why this matters

Understanding response times and performance metrics is crucial for delivering exceptional customer experiences with AI voice agents. Slow or inconsistent responses can frustrate users and damage brand reputation. Our analytics module enables you to:

  • Optimize User Experience: Monitor and improve response times to ensure natural, fluid conversations
  • Reduce Operational Costs: Identify and fix inefficiencies before they impact your bottom line
  • Ensure Reliability: Track system stability and catch potential issues early
  • Drive Continuous Improvement: Make data-driven decisions to enhance voice agent performance

With comprehensive analytics, you can confidently scale your voice AI operations while maintaining high quality standards. Teams can quickly identify areas for optimization and measure the impact of improvements over time.


Getting Started

To begin using the new analytics module:

  1. Go to Voice Agents and select a voice agent
  2. Navigate to the 'Trends' section
  3. Start tracking performance

Looking Forward

This analytics module represents our commitment to providing comprehensive tools for voice AI testing and optimization. We're grateful to Jordan Farnworth from Podium for helping us improve our analytics capabilities.

More to come!

Frequently Asked Questions

What do call analytics add beyond reading transcripts?

Call analytics turn test calls (and optionally production calls) into actionable signals: where conversations slow down, where users interrupt, where flows break, and which changes introduced regressions. Instead of reading transcripts one by one, you can spot patterns across thousands of calls.

Which metrics matter most for voice agents?

Turn-level latency percentiles (especially time-to-first-word), interruption rate, silence gaps, fallback/clarification rate, transfer rate, and flow drop-off points. Pair these with per-turn evaluations and error tagging so you can quickly tell whether a failure was ASR-related, reasoning-related, or caused by a downstream tool/API.
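
As a rough sketch of how a few of these turn-level signals could be derived from a call trace (the Turn schema and thresholds below are hypothetical, not Hamming's trace format):

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str          # "user" or "agent"
    start: float          # seconds from call start
    end: float
    first_word_at: float  # when the first word of this turn was spoken

def turn_metrics(turns, interruption_overlap=0.3, silence_gap=2.0):
    """Compute time-to-first-word p90, interruption rate, and long-silence count
    from an ordered list of turns. Thresholds are illustrative defaults."""
    ttfw = []          # agent time-to-first-word per response
    interruptions = 0  # user starts talking before the agent turn ends
    silences = 0       # stretches of dead air between consecutive turns

    for prev, cur in zip(turns, turns[1:]):
        if cur.start - prev.end > silence_gap:
            silences += 1
        if cur.speaker == "agent" and prev.speaker == "user":
            ttfw.append(cur.first_word_at - prev.end)
        if (cur.speaker == "user" and prev.speaker == "agent"
                and prev.end - cur.start > interruption_overlap):
            interruptions += 1

    agent_turns = sum(1 for t in turns if t.speaker == "agent")
    ttfw.sort()
    p90 = ttfw[max(0, round(0.9 * len(ttfw)) - 1)] if ttfw else None
    return {
        "ttfw_p90_s": p90,
        "interruption_rate": interruptions / agent_turns if agent_turns else 0.0,
        "long_silences": silences,
    }
```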

How does Hamming surface these signals?

Hamming aggregates call results into dashboards that highlight outliers and recurring failure modes, then links them back to replayable traces. That makes it easy to compare prompt/model versions, verify a fix, and prevent the same issue from resurfacing in later releases.

How should teams use analytics to catch regressions?

Use analytics to run a tight loop: canary a change, compare metrics by version (completion, transfer, latency p90/p99), and investigate the few calls that explain the shift. Once you fix a bug, convert the failing calls into regression tests so the suite gets stronger over time. If you can’t answer “what changed after the last release?”, your analytics aren’t wired right.
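
Here is a minimal sketch of that compare-by-version step, assuming per-call results have already been exported with completion, transfer, and turn-latency fields; the field names and tolerances are illustrative:

```python
def p90(samples):
    """Nearest-rank 90th percentile."""
    ordered = sorted(samples)
    return ordered[max(0, round(0.9 * len(ordered)) - 1)]

def summarize(calls):
    """Roll per-call results up into the metrics worth comparing across versions."""
    all_latencies = [lat for c in calls for lat in c["turn_latencies_s"]]
    return {
        "completion_rate": sum(c["completed"] for c in calls) / len(calls),
        "transfer_rate": sum(c["transferred"] for c in calls) / len(calls),
        "latency_p90_s": p90(all_latencies),
    }

def flag_regressions(baseline, canary, max_completion_drop=0.02, max_latency_rise=0.15):
    """Compare a canary version against the baseline and list the metrics that regressed."""
    base, new = summarize(baseline), summarize(canary)
    regressions = []
    if new["completion_rate"] < base["completion_rate"] - max_completion_drop:
        regressions.append("completion_rate")
    if new["latency_p90_s"] > base["latency_p90_s"] * (1 + max_latency_rise):
        regressions.append("latency_p90_s")
    return regressions
```

The calls behind any flagged metric are the ones to replay, fix, and fold back into the suite as regression tests.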

Sumanyu Sharma

Founder & CEO

Previously Head of Data at Citizen, where he helped quadruple the user base. As a Senior Staff Data Scientist at Tesla, he grew an AI-powered sales program to hundreds of millions of dollars in revenue per year.

Researched AI-powered medical image search at the University of Waterloo, where he graduated with honors in Engineering on the dean's list.

“At Hamming, we're taking all of our learnings from Tesla and Citizen to build the future of trustworthy, safe and reliable voice AI agents.”