Voice AI Glossary

Word Error Rate (WER)

A metric measuring how accurately a voice agent's speech recognition system transcribes spoken words, calculated as errors divided by total words.

2 min read
Updated September 24, 2025

Overview

A metric measuring how accurately a voice agent's speech recognition system transcribes spoken words, calculated as errors divided by total words. WER is expressed as a ratio or percentage, not a unit of time: a WER of 10% means roughly one in ten reference words was transcribed incorrectly. Lower WER correlates with higher user satisfaction, and industry benchmarks commonly target a WER below roughly 5–10% for a production-quality caller experience.

Use Case: If your voice agent consistently misunderstands callers or transcribes names and numbers incorrectly, check WER metrics.

Why It Matters

A voice agent that consistently misunderstands callers or transcribes names and numbers incorrectly frustrates users and forces repeated prompts. Because the transcript feeds every downstream step, optimizing Word Error Rate (WER) directly improves caller experience, intent-recognition accuracy, and operational costs. Even small improvements can significantly enhance user satisfaction.

How It Works

Word Error Rate (WER) is calculated by aligning the system's transcript against a human-verified reference transcript and counting three kinds of errors: substitutions (S), deletions (D), and insertions (I). WER = (S + D + I) / N, where N is the number of words in the reference; because insertions are counted, WER can exceed 100%. Platforms like Deepgram, AssemblyAI, and Twilio each report accuracy with different text-normalization choices (casing, punctuation, number formatting), so compare them against the same reference transcripts.
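The formula above can be sketched in a few lines of Python. This is a minimal illustration using a word-level edit distance; production toolkits also normalize text (casing, punctuation, numerals) before scoring, which this sketch omits.

```python
# Minimal WER sketch: word-level Levenshtein distance over a
# dynamic-programming table. Normalization is intentionally omitted.

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between first i ref words and first j hyp words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("five" -> "nine") in a 6-word reference: WER ≈ 0.167
print(word_error_rate("my number is five five five",
                      "my number is nine five five"))
```

Note that a single digit substitution in a phone number yields a modest WER but a completely unusable result, which is why per-field accuracy checks often accompany aggregate WER.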

Common Issues & Challenges

Organizations implementing Word Error Rate (WER) frequently encounter challenges with measurement accuracy, inconsistent performance across different network and audio conditions, and difficulty achieving target benchmarks. High WER often results from narrowband telephony audio, background noise, diverse accents, domain-specific vocabulary absent from the model, or poor network connectivity. Automated testing and monitoring can help identify these issues before they impact production callers.

Implementation Guide

Hamming AI recommends testing WER with 50+ recorded utterances covering diverse accents and background noise conditions. Their platform automatically calculates WER during testing to ensure transcription accuracy meets production standards. Regular WER monitoring helps identify degradation in STT performance before it impacts users.
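When aggregating WER over a test suite like the one described above, a common pitfall is averaging per-utterance WERs, which over-weights short utterances. The sketch below shows the usual micro-averaged alternative; it assumes per-utterance error and word counts have already been produced by a scoring tool, and the names are illustrative.

```python
# Corpus-level WER over a batch of test utterances.
# Input: (error_count, reference_word_count) per utterance,
# e.g. produced by a scoring tool over a 50-utterance suite.

def corpus_wer(results: list[tuple[int, int]]) -> float:
    """Micro-average: total errors / total reference words.

    Preferred over a mean of per-utterance WERs, which lets
    a single short utterance dominate the score.
    """
    total_errors = sum(errors for errors, _ in results)
    total_words = sum(words for _, words in results)
    return total_errors / total_words

results = [(1, 6), (0, 12), (3, 9)]   # 4 errors over 27 reference words
print(corpus_wer(results))
```

Tracking this number per release (and per slice: accent, noise condition, call channel) is what makes degradation in STT performance visible before it reaches users.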

Frequently Asked Questions