Voice Agent Transcript Search Schema for QA Teams

Sumanyu Sharma
Founder & CEO, Voice AI QA Pioneer

Has stress-tested 4M+ voice agent calls to find where they break.

June 6, 2026 · Updated June 6, 2026 · 13 min read

Voice agent transcript search is the ability to find the specific calls and turns that matter across transcripts, audio replay pointers, prompt versions, traces, QA labels, redaction state, and evaluation results.

Most teams do not fail because they forgot to store transcripts. They fail because the transcript is a blob. It cannot answer "show me Spanish calls where the billing agent missed identity verification after prompt v42," or "find every failed tool call where the user said cancel and the audio segment is reviewable."

If your team reviews a dozen calls manually each week, a provider dashboard may be enough. This schema is for teams operating multiple agents, queues, languages, prompt releases, or compliance workflows where transcript search becomes the front door for QA and incident review.

We found the recurring failure was review handoff: teams could find the call, but not the turn, replay offset, redaction state, or evaluation that made the call actionable.

TL;DR: Build transcript search as five linked records: call, turn, artifact, label, and evaluation.

Make redacted turn text searchable. Keep raw audio, unredacted transcript, tool payloads, and full traces as controlled pointers. Every query result should tell a reviewer what matched, where to replay it, whether it was redacted, which prompt version ran, and what action to take next.

Transcript search schema: a voice-agent data model that lets teams query call turns by text, time, speaker, language, intent, sentiment, prompt version, redaction state, trace ID, audio offset, QA label, and evaluation result without granting broad access to raw recordings.

Methodology Note: This schema is based on Hamming's analysis of 4M+ production voice agent calls and QA review workflows across 10K+ voice agents (2025-2026).

Treat it as an implementation contract, not legal advice. Regulated archives, recording consent, and deletion windows still need counsel-approved policy.

Last Updated: June 2026

A transcript blob is easy to store and hard to operate.

| Blob Shortcut | What Breaks Later |
| --- | --- |
| One JSON document per call with the full transcript | Search can find a call, but not the specific turn, timestamp, or replay segment. |
| Raw transcript text in the analytics table | Broad search becomes a privacy problem. |
| Audio URL stored beside the transcript only | Reviewers cannot jump from a matching sentence to the right audio offset. |
| Prompt version stored in a deployment table only | Teams cannot compare failures across prompt releases. |
| QA labels stored as comments | Labels cannot drive dashboards, regression tests, or routing. |
| Provider call ID as the only key | Transfers, LiveKit rooms, recordings, traces, and evaluations drift apart. |

Amazon Connect documentation recommends using contact ID to find the correct recording because recording filenames may not reliably match the contact ID. That is the right instinct: search should depend on stable identity, not file names or dashboard URLs.

The same pattern applies to voice agents. Use one canonical call ID, store provider aliases under it, and make each transcript turn addressable.
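The canonical-ID idea can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: `CallIdentity`, `AliasIndex`, and the alias type names are hypothetical, and a real system would back the lookup with a database table rather than an in-memory dict.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CallIdentity:
    """One canonical call plus every provider-side ID that refers to it."""
    canonical_call_id: str
    provider_aliases: dict[str, str] = field(default_factory=dict)


class AliasIndex:
    """Resolves (alias type, alias value) pairs to a canonical call ID."""

    def __init__(self) -> None:
        self._by_alias: dict[tuple[str, str], str] = {}

    def register(self, call: CallIdentity) -> None:
        for alias_type, alias_value in call.provider_aliases.items():
            key = (alias_type, alias_value)
            existing = self._by_alias.get(key)
            # The same provider ID pointing at two calls is an identity conflict.
            if existing is not None and existing != call.canonical_call_id:
                raise ValueError(f"alias {key} already maps to {existing}")
            self._by_alias[key] = call.canonical_call_id

    def resolve(self, alias_type: str, alias_value: str) -> Optional[str]:
        """Return the canonical call ID for an alias, or None if unknown."""
        return self._by_alias.get((alias_type, alias_value))


index = AliasIndex()
index.register(CallIdentity(
    canonical_call_id="call_2026_06_06_1842",
    provider_aliases={
        "livekitRoom": "billing-prod-1842",
        "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
    },
))
```

Rejecting a conflicting alias at registration time surfaces split or merged calls early, before they reach the search index.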

The Five Records to Model

Do not start with the search engine. Start with the records reviewers need.

| Record | Grain | Searchable by Default? | Purpose |
| --- | --- | --- | --- |
| call | one voice interaction | Yes, metadata only | Owns canonical identity, agent, queue, language, outcome, and retention class |
| turn | one speaker segment | Yes, redacted text | Owns transcript text, speaker, timestamps, language, sentiment, confidence, and replay offset |
| artifact | one external pointer | No raw content | Points to audio, transcript JSON, trace, tool evidence, redaction report, or provider record |
| label | one human or model tag | Yes | Stores intent, issue category, compliance tag, root cause, reviewer decision, and confidence |
| evaluation | one score or assertion result | Yes, summary only | Stores rubric, evaluator version, pass/fail, score, failure reason, and suggested next action |

This split keeps the common path fast and safe. QA can search redacted turns and labels. Engineering can jump from a result to a trace or tool summary. Compliance can restrict raw artifacts without breaking every dashboard.

Search rule: broad transcript search should return redacted text, call identity, turn offsets, labels, scores, and replay pointers. It should not return raw audio, unredacted transcript text, secrets, or full backend payloads by default.
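One way to enforce the search rule is an allow-list projection applied to every hit before it leaves the search layer. The sketch below is an assumption about how that could look: `SAFE_RESULT_FIELDS`, `to_search_result`, and the `rawText` and `toolPayload` field names are hypothetical.

```python
# Hypothetical allow-list of fields a broad search result may expose.
SAFE_RESULT_FIELDS = {
    "canonicalCallId", "turnId", "speaker", "turnStartMs", "turnEndMs",
    "redactedText", "redactionState", "agentVersion", "labels", "score",
    "audioArtifactId",
}


def to_search_result(turn_doc: dict) -> dict:
    """Project an indexed turn onto the allow-list; refuse unredacted turns."""
    if turn_doc.get("redactionState") != "redacted":
        raise PermissionError("turn is not redacted; not eligible for broad search")
    # Anything not explicitly allowed (raw text, payloads, secrets) is dropped.
    return {key: value for key, value in turn_doc.items() if key in SAFE_RESULT_FIELDS}


hit = to_search_result({
    "canonicalCallId": "call_2026_06_06_1842",
    "turnId": "turn_0007",
    "redactedText": "I already verified my account. Why are you asking again?",
    "redactionState": "redacted",
    "audioArtifactId": "audio_1842_redacted",
    "rawText": "unredacted text that must never reach broad search",
    "toolPayload": {"accountNumber": "example"},
})
```

An allow-list fails closed: a new raw field added upstream stays hidden until someone deliberately exposes it.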

Required Fields

Use this as the minimum warehouse or index contract.

| Field | Record | Type | Why It Matters |
| --- | --- | --- | --- |
| canonicalCallId | call, turn, artifact, label, evaluation | string | Joins every record to one call story |
| providerAliases | call | object | Stores CallSid, contact ID, room name, provider call ID, recording ID, and trace ID |
| agentId | call | string | Filters by deployed voice agent |
| agentVersion | call, turn | string | Compares behavior across prompt, model, workflow, or tool changes |
| environment | call | enum | Separates production, staging, sandbox, and test calls |
| startedAt | call | timestamp | Drives retention, cohorting, and incident windows |
| routeOrQueue | call | string | Groups calls by business flow |
| language | call, turn | BCP 47 string | Enables multilingual transcript search |
| speaker | turn | enum | Separates caller, agent, IVR, human agent, and system output |
| turnStartMs / turnEndMs | turn | integer | Lets reviewers jump to the matching audio segment |
| redactedText | turn | text | Default searchable transcript field |
| redactionState | call, turn, artifact | enum | Prevents accidental raw-data exposure |
| intentLabel | label | string | Finds missed intents and flow gaps |
| sentimentLabel | label | enum | Supports QA triage and trend queries |
| traceId / spanId | call, turn, artifact | string | Links search results to observability data |
| audioArtifactId | turn, artifact | string | Points to controlled audio replay |
| rubricId / score | evaluation | string / number | Makes QA and model-judge results queryable |
| reviewStatus | evaluation | enum | Tracks whether a call needs action |
| retentionClass | call, artifact | string | Connects search to deletion and legal-hold policy |

Google's call session metadata docs show why call-level state, recording URLs, permissions, queues, and virtual-agent fields belong in structured metadata. Azure Communication Services diagnostics logs use correlation IDs, participant IDs, endpoint IDs, media type, stream IDs, jitter, packet loss, and related media fields. Voice-agent search needs those same join and quality concepts, plus transcript-turn, prompt-version, and QA fields.

A Copyable JSON Shape

Here is a compact version. In production, you can store these as relational tables, search documents, lakehouse tables, or a hybrid index. The shape matters more than the storage product.

{
  "call": {
    "canonicalCallId": "call_2026_06_06_1842",
    "startedAt": "2026-06-06T18:42:19.000Z",
    "environment": "production",
    "agentId": "billing-agent",
    "agentVersion": "billing-agent@2026-06-06.3",
    "routeOrQueue": "billing-support",
    "language": "en-US",
    "providerAliases": {
      "telephonyCallId": "CA...",
      "contactCenterContactId": "contact-...",
      "livekitRoom": "billing-prod-1842",
      "traceId": "4bf92f3577b34da6a3ce929d0e0e4736"
    },
    "retentionClass": "support-investigation",
    "redactionState": "redacted"
  },
  "turn": {
    "turnId": "turn_0007",
    "canonicalCallId": "call_2026_06_06_1842",
    "speaker": "caller",
    "turnStartMs": 42840,
    "turnEndMs": 49120,
    "language": "en-US",
    "redactedText": "I already verified my account. Why are you asking again?",
    "asrConfidence": 0.86,
    "redactionState": "redacted",
    "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
    "spanId": "span_asr_turn_0007",
    "audioArtifactId": "audio_1842_redacted"
  },
  "labels": [
    {
      "labelType": "intent",
      "labelValue": "identity_verification_confusion",
      "source": "reviewer",
      "confidence": 1.0
    },
    {
      "labelType": "sentiment",
      "labelValue": "negative",
      "source": "model",
      "confidence": 0.81
    }
  ],
  "evaluation": {
    "rubricId": "billing_identity_v4",
    "evaluatorVersion": "2026-06-01",
    "score": 0.42,
    "passed": false,
    "failureReason": "agent_repeated_identity_check",
    "reviewStatus": "promote_to_regression"
  }
}

The field names can change. The invariant should not: search results must join text, replay, trace, labels, scores, redaction, and version context without a human guessing which dashboard owns the truth.

Query Cookbook

Design the search product around real review questions.

| Question | Required Filters | Result Should Show |
| --- | --- | --- |
| Which calls mentioned cancellation and ended with negative sentiment? | transcript text, sentiment label, outcome | matching turns, call outcome, replay offset, reviewer status |
| Did prompt v42 increase identity-verification confusion? | agent version, label, date range | count by version, sample calls, failing turns |
| Which Spanish calls had low ASR confidence and a failed task? | language, ASR confidence, evaluation result | turns, audio pointers, failure reasons |
| Which raw artifacts are not redacted yet? | artifact type, redaction state | artifact owner, retention class, blocked downstream views |
| Which tool failures are visible in the transcript? | tool failure label, transcript phrase, trace ID | call, turn, trace, tool summary |
| Which calls should become regression tests? | review status, failure label, evaluation score | evidence packet, owner, suggested test name |
| Which searches require restricted access? | raw artifact request, unredacted field, user role | denial reason or approval workflow |

Amazon Connect Contact Lens search supports search across analyzed conversations by transcript words, sentiment, non-talk time, and categories. That is a useful reference shape, but voice-agent teams usually need additional dimensions: prompt version, tool behavior, trace ID, evaluator version, and regression-promotion status.
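As an illustration of the "did prompt v42 increase identity-verification confusion?" question, here is a small in-memory sketch. It assumes each turn document carries `agentVersion` and a flattened `labelValues` list (a hypothetical denormalization of the label records), and it omits the date-range filter for brevity. A production version would run as an aggregation in the warehouse or search engine.

```python
from collections import defaultdict


def label_rate_by_version(turn_docs, label_value):
    """Return {agentVersion: (matching_turns, total_turns)} for one label value."""
    matched = defaultdict(int)
    total = defaultdict(int)
    for doc in turn_docs:
        version = doc["agentVersion"]
        total[version] += 1
        if label_value in doc.get("labelValues", []):
            matched[version] += 1
    return {version: (matched[version], total[version]) for version in total}


# Illustrative data only; not taken from real calls.
turns = [
    {"agentVersion": "v41", "labelValues": []},
    {"agentVersion": "v41", "labelValues": ["identity_verification_confusion"]},
    {"agentVersion": "v42", "labelValues": ["identity_verification_confusion"]},
    {"agentVersion": "v42", "labelValues": ["identity_verification_confusion", "negative"]},
    {"agentVersion": "v42", "labelValues": []},
]
rates = label_rate_by_version(turns, "identity_verification_confusion")
```

Returning the total alongside the match count lets a reviewer compare rates rather than raw counts, which matters when one version handled far more traffic than another.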

Search Index Architecture

There are two common approaches.

| Approach | Use When | Watch Out For |
| --- | --- | --- |
| One document per call with nested turns | Call volume is moderate and each call has bounded turns | Nested arrays can become expensive, and highlighting the matching turn may require special query handling |
| One document per turn plus call-level joins | Call volume is high or reviewers need precise turn search | You need reliable joins back to call metadata, artifacts, labels, and evaluations |

For most production voice-agent QA workflows, one searchable record per turn is the cleaner default. Keep call metadata denormalized into the turn record where it is safe and stable: agent, version, language, route, date, and redaction state. Keep raw artifacts as IDs or URLs behind access checks.
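A rough sketch of that denormalization step follows. The `build_turn_documents` helper, the `docId` format, and the choice of which call fields count as stable are assumptions for illustration, not part of any particular indexing library.

```python
from __future__ import annotations

# Call-level fields assumed stable enough to copy onto every turn document.
STABLE_CALL_FIELDS = (
    "agentId", "agentVersion", "routeOrQueue", "startedAt",
    "language", "redactionState",
)


def build_turn_documents(call: dict, turns: list[dict]) -> list[dict]:
    """Produce one searchable document per turn with call metadata copied in."""
    docs = []
    for turn in turns:
        if turn["canonicalCallId"] != call["canonicalCallId"]:
            raise ValueError(f"turn {turn.get('turnId')} belongs to another call")
        doc = {name: call[name] for name in STABLE_CALL_FIELDS if name in call}
        # Turn-level values (for example a per-turn language) override call defaults.
        doc.update(turn)
        doc["docId"] = f"{call['canonicalCallId']}:{turn['turnId']}"
        docs.append(doc)
    return docs


call = {
    "canonicalCallId": "call_2026_06_06_1842",
    "agentId": "billing-agent",
    "agentVersion": "billing-agent@2026-06-06.3",
    "routeOrQueue": "billing-support",
    "startedAt": "2026-06-06T18:42:19.000Z",
    "language": "en-US",
    "redactionState": "redacted",
}
turns = [{
    "turnId": "turn_0007",
    "canonicalCallId": "call_2026_06_06_1842",
    "speaker": "caller",
    "turnStartMs": 42840,
    "turnEndMs": 49120,
    "redactedText": "I already verified my account. Why are you asking again?",
    "audioArtifactId": "audio_1842_redacted",
}]
docs = build_turn_documents(call, turns)
```

Note that only an artifact ID is copied into the document; the audio itself stays behind whatever access check guards the artifact store.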

OpenTelemetry's logs data model includes timestamp, trace ID, span ID, severity, body, resource, attributes, and event name. Use that as a mental model for events and logs, not as a reason to stuff full transcripts into ordinary application logs.

OpenSearch mappings recommend explicit mappings when consistency and performance matter. For transcript search, map structured filters such as agentId, agentVersion, language, speaker, redactionState, routeOrQueue, and reviewStatus as keyword-like fields, while redactedText should be a full-text field. OpenSearch's text field docs describe how analyzed text supports full-text search and highlighting. Its text chunking docs are useful when you add semantic search over longer passages, summaries, or call-level narratives.
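To make the mapping advice concrete, the sketch below builds an explicit index body and a filtered full-text query as plain Python dicts. The field list mirrors the schema in this article, and `build_turn_query` is a hypothetical helper. The bodies use standard OpenSearch mapping types and query DSL (`keyword`, `text`, `bool`, `term`, `match`, `highlight`), but treat the exact settings as a starting point to verify against your cluster version.

```python
# Explicit mapping: structured filters as keyword fields, transcript as full text.
TURN_INDEX_BODY = {
    "mappings": {
        "dynamic": "strict",  # reject unmapped fields instead of guessing types
        "properties": {
            "canonicalCallId": {"type": "keyword"},
            "turnId": {"type": "keyword"},
            "agentId": {"type": "keyword"},
            "agentVersion": {"type": "keyword"},
            "language": {"type": "keyword"},
            "speaker": {"type": "keyword"},
            "redactionState": {"type": "keyword"},
            "routeOrQueue": {"type": "keyword"},
            "reviewStatus": {"type": "keyword"},
            "startedAt": {"type": "date"},
            "turnStartMs": {"type": "integer"},
            "turnEndMs": {"type": "integer"},
            "asrConfidence": {"type": "float"},
            "redactedText": {"type": "text"},
        },
    }
}


def build_turn_query(phrase: str, language: str) -> dict:
    """Full-text match on redacted text, restricted to redacted turns in one language."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"language": language}},
                    {"term": {"redactionState": "redacted"}},
                ],
                "must": [{"match": {"redactedText": phrase}}],
            }
        },
        # Ask the engine to return the matching fragment for the reviewer.
        "highlight": {"fields": {"redactedText": {}}},
    }


query = build_turn_query("cancel", "es-ES")
```

Putting exact-match conditions in `filter` rather than `must` keeps them out of relevance scoring, so ranking reflects only how well the transcript text matched.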

Redaction and Access Boundaries

Transcript search is useful only if teams trust it.

| Data | Default Search Access | Restricted Access |
| --- | --- | --- |
| Redacted turn text | QA, support, product, engineering | Usually not needed |
| Unredacted turn text | No broad search | compliance-approved users |
| Raw audio | Pointer only | playback by approved roles |
| Redacted audio | Pointer or clipped replay | QA and engineering, depending on policy |
| Tool payloads | summary only | engineering or incident owner |
| Prompt/system instructions | version pointer only | owner-approved debugging |
| Trace data | trace ID and key spans | engineering detail view |
| Aggregate labels | broad dashboards | row-level drilldown by role |

Amazon Connect recording docs describe where recordings and transcripts are stored and how contact IDs help locate the right recording. Amazon Transcribe post-call analytics can produce redacted and unredacted transcript/audio outputs depending on settings. The product lesson is simple: keep raw and redacted artifacts separate, and make the default search surface the safer copy.

Tie this to your log retention checklist. A search index is not a legal archive. It is a review surface that should respect retention class, redaction state, deletion status, and legal hold.
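The access boundaries and retention rules above could be expressed as a default-deny policy check. Everything named here is illustrative: the role names, the data class keys, the `deleted` and `legallyRestricted` flags, and the decision to let only a `compliance` role reach raw data are assumptions that your own policy should replace.

```python
# Hypothetical policy: data class -> roles allowed to open the full content.
ACCESS_POLICY = {
    "redacted_turn_text": {"qa", "support", "product", "engineering"},
    "redacted_audio": {"qa", "engineering"},
    "unredacted_turn_text": {"compliance"},
    "raw_audio": {"compliance"},
    "tool_payload": {"engineering", "incident_owner"},
    "trace_detail": {"engineering"},
}


def can_access(role: str, data_class: str) -> bool:
    """Default deny: unknown roles or data classes get no access."""
    return role in ACCESS_POLICY.get(data_class, set())


def is_searchable(call: dict) -> bool:
    """A call stays on the search surface only if it is redacted and not removed."""
    if call.get("deleted") or call.get("legallyRestricted"):
        return False
    return call.get("redactionState") == "redacted"
```

For example, `can_access("qa", "raw_audio")` is false while `can_access("qa", "redacted_audio")` is true, which matches the table's split between the safer copy and the restricted one.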

What to Validate Before Launch

Run these checks before you rely on transcript search for QA or incidents.

| Gate | Pass Condition | Block When |
| --- | --- | --- |
| Identity | Every turn joins to one canonical call ID | provider IDs conflict or calls split across records |
| Replay | Matching turns have audio offsets and controlled artifact pointers | reviewer cannot jump from text to audio |
| Redaction | Broad search indexes only approved redacted text | unredacted text appears in general results |
| Versioning | Calls and turns include agent or prompt version | failures cannot be compared across releases |
| Labels | Intent, sentiment, issue category, and reviewer labels have source and confidence | labels are comments or untyped strings only |
| Evaluation | Scores include rubric ID, evaluator version, and failure reason | score cannot be interpreted |
| Access | User role controls raw playback, export, and unredacted fields | search bypasses storage permissions |
| Deletion | Removed calls disappear from search and derived indexes | deleted source remains searchable |
| Observability | Search result links to trace ID or evidence packet | engineering cannot debug the matched call |

This is where transcript search connects to the call evidence export runbook. Search finds the calls and turns. Evidence packets preserve the bounded artifacts a reviewer needs. The failed production call regression runbook turns the right failures into durable tests.
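Several of the gates can be automated as a check that runs over indexed turns before launch. The sketch below covers only the identity, replay, redaction, versioning, and observability gates; `failed_gates` and the `rawText` field name are hypothetical, and the label, evaluation, access, and deletion gates would need their own checks against other records.

```python
from __future__ import annotations


def failed_gates(turn: dict, known_call_ids: set[str]) -> list[str]:
    """Return the names of launch gates this turn document fails."""
    failures = []
    # Identity: the turn must join to a known canonical call.
    if turn.get("canonicalCallId") not in known_call_ids:
        failures.append("identity")
    # Replay: valid offsets plus a controlled audio pointer.
    start, end = turn.get("turnStartMs"), turn.get("turnEndMs")
    has_offsets = isinstance(start, int) and isinstance(end, int) and 0 <= start < end
    if not (has_offsets and turn.get("audioArtifactId")):
        failures.append("replay")
    # Redaction: only redacted text may sit in the broad index.
    if turn.get("redactionState") != "redacted" or "rawText" in turn:
        failures.append("redaction")
    # Versioning: failures must be comparable across releases.
    if not turn.get("agentVersion"):
        failures.append("versioning")
    # Observability: the result must link to a trace.
    if not turn.get("traceId"):
        failures.append("observability")
    return failures


good_turn = {
    "canonicalCallId": "call_2026_06_06_1842",
    "turnStartMs": 42840,
    "turnEndMs": 49120,
    "redactionState": "redacted",
    "agentVersion": "billing-agent@2026-06-06.3",
    "traceId": "4bf92f3577b34da6a3ce929d0e0e4736",
    "audioArtifactId": "audio_1842_redacted",
}
orphan_turn = {"canonicalCallId": "unknown_call", "redactionState": "pending"}
known = {"call_2026_06_06_1842"}
```

Running `failed_gates(orphan_turn, known)` reports every gate the orphaned turn misses, which gives a launch review a concrete list instead of a single pass or fail.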

How Hamming Fits

Hamming helps teams search, score, review, and act on production voice-agent calls without treating transcripts as isolated text files. The important thing is not just finding a phrase. It is knowing what happened around that phrase: which prompt version ran, what the caller heard, what the model decided, which tool call fired, how the evaluator scored it, and whether the result should become a regression test.

Use Hamming when you need to:

  • Search production calls by transcript, label, score, prompt version, language, and failure reason.
  • Replay the relevant audio segment next to the redacted transcript and evaluation result.
  • Connect call search to traces, tool evidence, QA findings, and incident workflows.
  • Promote recurring failures into response coverage improvements and regression tests.
  • Keep analytics dashboards pointed at evidence instead of turning dashboards into raw transcript archives.

The operating loop is straightforward: search the turns, replay the evidence, label the issue, fix the behavior, and preserve the pattern if it should never happen again.

Launch Checklist

Before shipping transcript search, make sure:

  • Calls, turns, artifacts, labels, and evaluations are separate records or separate logical views.
  • Every record joins through one canonical call ID.
  • Broad search indexes redacted turn text, not raw transcript text.
  • Audio replay uses controlled pointers with turn-level offsets.
  • Prompt, model, workflow, or agent version is stored on every call.
  • Intent, sentiment, issue category, and reviewer labels are typed and source-attributed.
  • Evaluation rows include rubric ID, evaluator version, score, pass/fail, and failure reason.
  • Search results show redaction state, retention class, and access limitations.
  • Deleted or legally restricted calls are removed from searchable surfaces.
  • Search results link to trace IDs, evidence packets, or incident runbooks.

If you cannot check those boxes, do not scale the search surface yet. You may have a transcript archive, but you do not have a QA-ready transcript search system.

Frequently Asked Questions

A voice agent transcript search schema is the data model that makes production calls searchable by call, turn, speaker, timestamp, language, prompt version, redaction state, QA label, and evaluation result. Hamming recommends splitting calls, turns, artifacts, labels, and evaluations so search results can point to the specific transcript segment and replay evidence.

One JSON blob per call is easy to store but weak for QA search because it hides the matching turn, replay offset, redaction state, and prompt version. Hamming's schema keeps turn-level records searchable while raw transcript and audio artifacts stay behind stricter access controls.

Every searchable turn should include canonical call ID, turn ID, speaker, start and end offsets, language, redacted text, redaction state, trace ID, audio artifact pointer, and agent version. Hamming also recommends attaching intent, sentiment, issue category, reviewer status, and evaluation result through typed label and evaluation records.

Index redacted transcript text for broad search and keep unredacted text, raw audio, tool payloads, and full traces as restricted artifacts. Hamming recommends storing redaction state and retention class on each call, turn, and artifact so the search layer can enforce access boundaries.

Search results should include the matching turn's audio artifact ID and start/end offsets so reviewers can jump directly from transcript text to the relevant recording segment. According to Hamming's call review pattern, this is what turns search from a text archive into useful QA evidence.

Store agent, prompt, workflow, model, or tool version on every call and denormalize the stable version fields into searchable turn records. Hamming recommends querying labels and evaluation failures by version so teams can see whether a release improved or worsened specific failure modes.

Transcript search finds the call turns and failure patterns that need review. A call evidence packet packages the selected transcript, audio, trace, tool evidence, evaluation result, and manifest so QA, engineering, or compliance reviewers can inspect one bounded call safely.

Validate that every searchable turn joins to a canonical call ID, points to replay evidence, uses redacted text by default, includes agent version, carries typed labels, and respects deletion and access controls. Hamming recommends blocking launch if broad search can expose raw audio, unredacted transcript text, or orphaned records.

Sumanyu Sharma

Founder & CEO

Previously Head of Data at Citizen, where he helped quadruple the user base. As Senior Staff Data Scientist at Tesla, he grew an AI-powered sales program to hundreds of millions in revenue per year.

Researched AI-powered medical image search at the University of Waterloo, where he graduated with Engineering honors on the dean's list.

“At Hamming, we're taking all of our learnings from Tesla and Citizen to build the future of trustworthy, safe and reliable voice AI agents.”