What security questions should I ask a voice agent testing vendor?

Ask where audio, transcripts, metadata, tool traces, and QA notes are stored; who can access them; how long they are retained; and whether customer data is used for model training. Hamming recommends asking for at least 5 evidence types before a POC: SOC 2 scope, subprocessor list, retention policy, access-control model, and incident-response process.

Is SOC 2 enough for a voice agent QA vendor?

SOC 2 is important, but it is not enough by itself because voice agent QA also touches AI behavior, transcript privacy, tool-call actions, and production monitoring. Hamming recommends treating SOC 2 as one control family and adding voice-specific questions about redaction, prompt injection, tool permissions, retention, and deployment model.

What evidence should a voice agent monitoring vendor provide before a POC?

A vendor should provide a security overview, SOC 2 or equivalent control evidence when available, a data-flow diagram, a subprocessor list, a retention and deletion policy, and a role-based access model. Hamming recommends reviewing those artifacts before any POC uses real production calls or sensitive transcripts.

How do I evaluate transcript access controls for a voice agent vendor?

Ask whether raw audio, unredacted transcripts, redacted transcripts, QA notes, exports, and admin actions have separate permissions and audit logs. Hamming recommends failing the security review if every reviewer, operator, or support user receives broad access to the same sensitive call evidence.

What voice-specific security risks do generic SaaS questionnaires miss?

Generic SaaS questionnaires often miss caller ID spoofing, recording access, transcript redaction, prompt injection over speech, tool abuse, DTMF/payment handling, and handoff context leakage. Hamming recommends adding a voice-specific section with at least 10 questions that cover telephony, AI behavior, and post-call evidence.

When should I require a private tenant or self-hosted voice agent monitoring deployment?

Consider private tenant or self-hosted deployment when calls contain regulated data, customer contracts restrict shared infrastructure, regional residency is mandatory, or internal policy requires customer-managed storage and keys. Hamming recommends documenting the reason before procurement so the deployment model matches risk rather than preference.

How should HIPAA change a voice agent vendor security review?

For HIPAA workloads, ask whether the vendor will sign a BAA, how electronic PHI is safeguarded, how access is audited, and how deletion or retention policies apply to recordings and transcripts. Hamming recommends checking administrative, physical, and technical safeguards before testing with real patient calls.

How do I test whether a vendor can limit access to sensitive voice agent calls?

Run a non-production POC with seeded sensitive fields, then verify redaction, role-based access, export restrictions, audit logs, and deletion behavior. Hamming recommends making this a go/no-go gate because sensitive transcript access is easier to prevent before launch than to unwind after production data spreads.

Voice Agent Security Review Questions for Testing and Monitoring Vendors

Most voice agent vendor reviews ask the usual SaaS questions: SOC 2, SSO, encryption, subprocessors, uptime, and incident response.

Those questions matter. They just do not cover enough.

A voice agent testing or monitoring vendor may handle raw call audio, transcripts, redacted transcripts, PII, PHI, prompt versions, tool traces, QA annotations, escalation notes, and production failure samples. The security review has to prove how that evidence moves, who can see it, how long it lives, and whether the AI layer can be abused.

Voice agent security review questions are the vendor due-diligence questions that cover both normal SaaS controls and voice-specific risks: recordings, transcripts, redaction, telephony metadata, prompt injection, tool actions, data residency, retention, deletion, and production monitoring evidence.

Quick filter: This checklist is for security, procurement, engineering, and QA teams evaluating a voice agent testing, QA, or monitoring vendor. If the POC will only use synthetic calls with no customer data, the eight TL;DR questions below are usually enough. If the POC will touch production recordings, transcripts, or regulated workflows, use the full checklist before launch.

TL;DR: The full checklist has 30 questions. Before a voice agent testing POC, start with these eight:

What data enters the vendor: audio, transcript, metadata, tool traces, QA notes, or exports?

Is customer data used for model training, fine-tuning, evaluation, or support debugging?

Who can access raw audio and unredacted transcript text?

Can the vendor separate raw, redacted, aggregate, and exported evidence?

How are retention, deletion, legal hold, and audit logs enforced?

Which model, telephony, storage, and analytics subprocessors touch the data?

How does the platform test prompt injection, sensitive-data leakage, and unsafe tool actions?

What evidence can the vendor show before a POC uses sensitive calls?

Methodology Note: This checklist is based on Hamming's analysis of production voice agent testing, monitoring, and security-review workflows across 10K+ voice agents (2025-2026). We've tested agents built on LiveKit, Pipecat, ElevenLabs, Retell, Vapi, and custom-built solutions.
It also uses public guidance from AICPA, HHS, OWASP, and NIST so procurement questions stay tied to recognized control and risk-management frameworks.

Last Updated: May 2026

Related Guides:

SOC 2 Voice Agent Testing - control evidence and audit readiness for voice-agent QA
HIPAA PHI Clinical Workflow Testing Checklist - healthcare workflow and PHI testing controls
PII Redaction for Voice Agents - redaction design for transcripts and audio evidence
PII Redaction Compliance Architecture - architecture patterns for sensitive voice data
Voice Agent Log Retention Compliance Checklist - retention, deletion, and audit archive decisions
Call Logging for AI Voice Agents - taxonomy for call logs, transcripts, and metadata
Voice Agent Workflow Testing - tool-call and side-effect test evidence
Voice Agent Production Readiness Checklist - launch gates before real callers use the agent
Voice Agent Monitoring Platform Guide - production monitoring requirements
Voice Agent Incident Response Runbook - evidence needed during customer-impacting incidents

I used to treat this as a security-packet problem: collect the SOC 2 report, confirm SSO, check the subprocessor list, and move on. That misses the part unique to voice agents. The most uncomfortable review questions are usually about the evidence trail: who can play the recording, who can read the raw transcript, what the evaluator stores, and whether a spoken attack can trigger the wrong tool action.

What a Voice Agent Security Review Has to Cover

Start with a map. Do not start by forwarding a generic vendor security questionnaire.

Review Area	What It Covers	Why Voice Agents Are Different
SaaS controls	SOC 2 scope, SSO, RBAC, encryption, vulnerability management, incident response	Necessary baseline, but not enough for call evidence and AI behavior
Data flow	Audio, transcript, metadata, tool traces, QA notes, exports, support access	Spoken conversations can contain account numbers, medical details, consent statements, and payment context
AI behavior	Prompt injection, sensitive information disclosure, hallucination, tool abuse, model-provider data use	The caller can attack the system through natural speech, not just a web form
Voice stack	Telephony provider, SIP/WebRTC path, recording storage, DTMF handling, transfer/handoff metadata	Call routing and recording systems may sit outside the core application boundary
Production monitoring	Failure samples, sampled calls, reviewer access, alerting, incident evidence	Monitoring often expands data access after launch if it is not designed carefully
Deployment model	Multi-tenant SaaS, private tenant, customer-owned storage, self-hosted, hybrid	Regulated buyers may need stronger isolation, regional residency, or customer-managed keys

The key mistake is treating the vendor as a normal SaaS tool while ignoring the evidence it will process. A voice agent testing platform may become the place where the most sensitive failures are collected: angry callers, failed identity checks, escalations, medical questions, payment attempts, and agent mistakes.

Working rule: if a vendor will store or inspect real call evidence, the security review must cover every evidence class separately: raw audio, unredacted transcript, redacted transcript, metadata, QA annotation, tool trace, export, and aggregate metric.
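A minimal sketch of that working rule as an access matrix, assuming a hypothetical role set (`qa_reviewer`, `security_admin`, `support`) and the evidence classes listed above. The role names and grants are illustrative, not any vendor's actual model. Access is deny-by-default, so a class a role was never granted stays hidden.

```python
from enum import Enum


class EvidenceClass(Enum):
    RAW_AUDIO = "raw_audio"
    UNREDACTED_TRANSCRIPT = "unredacted_transcript"
    REDACTED_TRANSCRIPT = "redacted_transcript"
    METADATA = "metadata"
    QA_ANNOTATION = "qa_annotation"
    TOOL_TRACE = "tool_trace"
    EXPORT = "export"
    AGGREGATE_METRIC = "aggregate_metric"


# Hypothetical grants: each role lists the evidence classes it may read.
ACCESS_MATRIX: dict[str, frozenset[EvidenceClass]] = {
    "qa_reviewer": frozenset({
        EvidenceClass.REDACTED_TRANSCRIPT,
        EvidenceClass.METADATA,
        EvidenceClass.QA_ANNOTATION,
        EvidenceClass.AGGREGATE_METRIC,
    }),
    "security_admin": frozenset(EvidenceClass),  # every class; access should still be audited
    "support": frozenset({EvidenceClass.METADATA}),
}


def can_read(role: str, evidence: EvidenceClass) -> bool:
    """Deny by default: unknown roles and ungranted classes return False."""
    return evidence in ACCESS_MATRIX.get(role, frozenset())


def overexposed_roles(sensitive: set[EvidenceClass]) -> list[str]:
    """List every role that can read at least one of the given sensitive classes."""
    return sorted(role for role, grants in ACCESS_MATRIX.items() if grants & sensitive)
```

Calling `overexposed_roles({EvidenceClass.RAW_AUDIO, EvidenceClass.UNREDACTED_TRANSCRIPT})` on this matrix returns only `security_admin`. If a vendor's real matrix puts reviewer or support roles in that list, that is the overexposure the review is looking for.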

AICPA's Trust Services Criteria cover security, availability, processing integrity, confidentiality, and privacy controls for service-organization systems. That is a good baseline. It does not remove the need to ask how the vendor handles voice-specific data and AI-specific failure modes.

The 30-Question Vendor Security Checklist

Use this as the first pass. It is intentionally direct. A mature vendor should be able to answer without inventing policy during the call.

#	Question	Good Evidence
1	Which specific data types enter your system during testing or monitoring?	Data-flow diagram with audio, transcript, metadata, tool traces, QA notes, and exports labeled
2	Is any customer audio or transcript text used for model training, fine-tuning, evaluation, or support debugging?	Written data-use policy and opt-in/opt-out controls
3	Do raw audio and unredacted transcripts have separate permissions?	RBAC matrix and audit-log samples
4	Can redacted transcripts be used for search and analytics without exposing raw text?	Redaction workflow, status field, and separate storage or access boundary
5	Can customers configure retention by data type?	Retention policy by audio, transcript, metadata, QA note, and aggregate metric
6	How do deletion requests propagate across transcripts, recordings, exports, analytics, and backups?	Deletion runbook and completion evidence
7	Which subprocessors touch customer call data?	Current subprocessor list and data categories per subprocessor
8	Which regions store or process data?	Residency options and cross-border transfer explanation
9	What is covered by your SOC 2 report?	Report scope, control families, period, and excluded systems
10	Can you sign a BAA for HIPAA workloads?	BAA terms and ePHI handling boundaries
11	Is SSO/SAML supported and can MFA be enforced?	Identity-provider setup guide and access logs
12	Are admin actions, exports, playback, and transcript views audited?	Audit-log sample with actor, timestamp, action, and object
13	Can support staff access customer calls?	Support-access policy, approval workflow, time limits, and logging
14	How are secrets, API keys, webhook URLs, and telephony credentials stored?	Key-management policy and rotation process
15	How do you secure webhook ingestion and outbound callbacks?	Signature verification, replay protection, and retry policy
16	How do you prevent prompt injection through caller speech or transcript text?	Test suite, policy, and failure samples
17	How do you prevent the agent or evaluator from leaking sensitive information?	Red-team scenarios and sensitive-output checks
18	How do you prevent unauthorized tool actions during test calls?	Tool permissions, sandboxing, idempotency, and side-effect controls
19	Can test calls run without writing to production CRMs, calendars, EHRs, payment systems, or ticketing tools?	Sandbox or mock integration evidence
20	How do you isolate tenant data?	Architecture note, access model, and incident boundary
21	Can we use customer-owned storage or customer-managed keys?	Supported architecture and operational tradeoffs
22	Do you support private tenant or self-hosted deployment?	Deployment model table and shared-responsibility model
23	How are production monitoring samples selected?	Sampling policy and opt-out controls
24	Can customers prevent sensitive queues from being monitored?	Queue-level policy controls
25	Can call evidence be exported securely for auditors?	Scoped export workflow and audit trail
26	What happens to data after contract termination?	Termination deletion or export policy
27	What security incidents would trigger customer notification?	Incident notification policy and SLA
28	How do you validate upstream model-provider changes?	Model-change review and rollback process
29	Can customers pin, restrict, or approve model providers?	Provider configuration and governance controls
30	What are the known unresolved risks?	Honest risk register, roadmap, or compensating controls

The last question matters. Vendors that can name unresolved risks are usually safer than vendors that claim there are none.
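Question 15 asks about signature verification and replay protection. A common pattern, though not necessarily what any given vendor does, is an HMAC-SHA256 over a timestamp plus the raw body, compared in constant time and rejected outside a freshness window. The `timestamp.body` signing format, the 300-second window, and the in-memory replay cache below are assumptions for illustration. Ask each vendor for its actual scheme.

```python
import hashlib
import hmac
import time
from typing import Optional

MAX_SKEW_SECONDS = 300  # assumed freshness window


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Hex HMAC-SHA256 over "<timestamp>.<body>" (assumed signing format)."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


class WebhookVerifier:
    def __init__(self, secret: bytes, max_skew: int = MAX_SKEW_SECONDS):
        self.secret = secret
        self.max_skew = max_skew
        self._seen: dict[str, int] = {}  # signature -> timestamp, for replay detection

    def verify(self, timestamp: int, body: bytes, signature: str,
               now: Optional[int] = None) -> bool:
        now = int(time.time()) if now is None else now
        if abs(now - timestamp) > self.max_skew:
            return False  # stale or future-dated: possible replay or clock skew
        expected = sign(self.secret, timestamp, body)
        if not hmac.compare_digest(expected, signature):
            return False  # body or timestamp altered, or wrong secret
        if signature in self._seen:
            return False  # exact replay inside the window
        # Drop signatures older than the window so the cache stays bounded.
        self._seen = {s: t for s, t in self._seen.items() if now - t <= self.max_skew}
        self._seen[signature] = timestamp
        return True
```

The in-memory cache only protects a single process. A multi-instance receiver would need a shared store for seen signatures, which is worth asking about alongside the retry policy.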

Evidence to Request Before a POC

Do not wait until procurement to ask for evidence. The POC is when sensitive data boundaries are most likely to get blurred.

Evidence	Ask For It When	Review Owner
Security overview	Before any account setup	Security
SOC 2 report or bridge letter	Before enterprise contract review	Security/procurement
Data-flow diagram	Before production calls enter the platform	Engineering/security
Subprocessor list	Before any real customer data is processed	Legal/security
Data-use policy	Before transcripts, recordings, or QA notes are uploaded	Legal/security
Retention and deletion policy	Before persistent storage is enabled	Security/compliance
RBAC and audit-log sample	Before reviewers or support users are invited	Security/operations
BAA or regulated-workload terms	Before PHI or healthcare workflows are tested	Legal/compliance
Incident-response process	Before production monitoring is enabled	Security/on-call
AI behavior test summary	Before tool-calling or sensitive flows are tested	Engineering/security

This is not paperwork for its own sake. It prevents the awkward moment where the POC works technically but fails security because the vendor cannot explain where call evidence went.

Voice-Specific Risks Generic SaaS Reviews Miss

A generic questionnaire may ask whether data is encrypted. It usually does not ask whether a caller can trick the agent into reading back private context, whether DTMF digits are isolated from the model, or whether a support user can play raw recordings.

Risk	Why It Matters	Security Review Question
Prompt injection over speech	Attackers can speak instructions that alter model behavior.	How do you test spoken prompt injection and indirect transcript injection?
Sensitive information disclosure	The model may reveal secrets, account context, policy text, or prior-call details.	What checks prevent sensitive output before it reaches the caller or reviewer?
Excessive tool agency	A model can trigger actions beyond what the caller is authorized to request.	What tools can the system call, under what policy, and in which environment?
Transcript overexposure	Reviewers may need QA access without raw PII access.	Can raw and redacted transcript access be separated?
Recording playback	Audio can reveal biometric voice characteristics and sensitive spoken content.	Who can play recordings, export them, or share links?
DTMF and payment leakage	Digits can pass through systems that should never see payment data.	Can payment or DTMF collection be isolated from transcription and LLM context?
Handoff context leakage	Summaries can expose more data than the receiving queue needs.	Can handoff payloads be minimized and audited?
Production monitoring creep	Monitoring can expand from failure review into broad surveillance.	Can customers scope monitoring by queue, policy, and data class?

OWASP's LLM application guidance calls out risks such as prompt injection, sensitive information disclosure, insecure output handling, and excessive agency. For voice agents, those risks show up through speech, transcripts, tool calls, and handoffs.
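For the DTMF row, one way to check isolation during a POC is to inspect what actually reaches the model context. A minimal sketch, assuming a hypothetical event format where each call event has a `kind` (`speech`, `dtmf`, `agent`) and `text`. DTMF payloads are replaced with a fixed marker before the context is built. Isolation is better enforced upstream, for example in a separate payment-capture path, so treat this as a test oracle rather than the control itself.

```python
from dataclasses import dataclass

DTMF_MARKER = "[DTMF input withheld]"


@dataclass(frozen=True)
class CallEvent:
    kind: str  # hypothetical kinds: "speech", "dtmf", "agent"
    text: str


def build_llm_context(events: list[CallEvent]) -> list[str]:
    """Return context lines with every DTMF payload replaced by a fixed marker."""
    lines = []
    for event in events:
        if event.kind == "dtmf":
            lines.append(DTMF_MARKER)  # keep the turn, drop the digits
        else:
            lines.append(f"{event.kind}: {event.text}")
    return lines


def leaked_digits(context: list[str], events: list[CallEvent]) -> list[str]:
    """Return DTMF payloads that still appear verbatim anywhere in the built context."""
    joined = "\n".join(context)
    return [e.text for e in events if e.kind == "dtmf" and e.text and e.text in joined]
```

A non-empty result from `leaked_digits` means the digits reached the context through some other path. One example is the caller speaking the same number aloud, which masking DTMF events alone cannot prevent.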

Data Flow, Transcript Access, and Retention Questions

The data-flow review should be concrete enough that an engineer can draw it and a security reviewer can challenge it.

caller audio
  -> telephony / WebRTC / SIP layer
  -> recording and streaming transcription
  -> voice agent runtime
  -> LLM provider and tool calls
  -> testing or monitoring platform
  -> QA review, alerts, exports, analytics, and archive

Then ask these questions for each data class that moves through those hops:

Data Class	Storage Question	Access Question	Retention Question
Raw audio	Where is the recording stored and encrypted?	Who can play or export it?	Can it expire faster than redacted evidence?
Unredacted transcript	Is it stored after redaction completes?	Who can view raw text?	Can it be short-lived by default?
Redacted transcript	Is it the default analytics copy?	Can reviewers search without raw PII access?	Can it retain longer than raw content when policy allows?
Metadata	Which IDs, timestamps, queues, versions, and outcomes are stored?	Can low-PII metadata be broadly available?	Can it support debugging after raw content expires?
Tool trace	Are arguments, results, and errors filtered for secrets?	Who can inspect tool payloads?	Can traces expire separately from QA results?
QA annotation	Does it include reviewer notes or customer-sensitive text?	Can customer admins see reviewer activity?	Does it follow contract or audit requirements?
Export	Where do downloads, webhooks, and API exports go?	Can exports be disabled or approved?	Are exported copies included in deletion evidence?

The voice agent log retention checklist goes deeper on retention classes. The call logging taxonomy helps standardize the fields before they spread across vendors.
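The retention column can also be expressed as a checkable policy. A minimal sketch, assuming hypothetical retention periods in days per data class. The numbers are placeholders, not recommendations. It computes whether an item has expired and flags policies where raw content outlives its redacted copy, which is the ordering the table asks about.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Placeholder retention periods in days; set these from your own policy.
RETENTION_DAYS = {
    "raw_audio": 30,
    "unredacted_transcript": 7,
    "redacted_transcript": 365,
    "metadata": 730,
    "tool_trace": 90,
    "qa_annotation": 365,
}


def is_expired(data_class: str, created_at: datetime,
               now: Optional[datetime] = None) -> bool:
    """True once an item of this class is older than its retention period."""
    now = now if now is not None else datetime.now(timezone.utc)
    return now - created_at > timedelta(days=RETENTION_DAYS[data_class])


def ordering_violations(policy: dict[str, int]) -> list[str]:
    """Flag raw classes that are retained longer than the redacted transcript."""
    redacted = policy["redacted_transcript"]
    return sorted(c for c in ("raw_audio", "unredacted_transcript") if policy[c] > redacted)
```

A real policy also needs a legal-hold override and deletion evidence for exports and backups. This sketch leaves both out.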

Deployment Model Decision Table

Do not ask for private tenant or self-hosting because it sounds more secure. Ask because a specific risk requires it.

Deployment Model	Use When	Tradeoff
Multi-tenant SaaS	Low-to-moderate sensitivity, standard enterprise controls, fastest rollout	Depends on vendor tenant isolation and shared infrastructure controls
Private tenant	Regulated workflows, strict customer isolation, enterprise residency requirements	More operational overhead and longer setup
Customer-owned storage	Customer wants lifecycle, retention, keys, or archive controls in its own environment	Vendor may not control every retrieval or deletion workflow
Self-hosted	Contract, regulator, or internal policy requires customer-controlled runtime	Customer owns more upgrades, monitoring, and incident response
Hybrid	Sensitive evidence stays customer-side while aggregate results sync to vendor	More integration work and more shared-responsibility ambiguity

NIST's AI Risk Management Framework emphasizes governance, mapping, measurement, and management of AI risks over the system lifecycle. Deployment model is one of those risk decisions. It should connect to data class, jurisdiction, customer contract, and operational owner.
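To keep the deployment decision tied to a documented reason, the triggers from the FAQ (regulated data, contracts restricting shared infrastructure, mandatory residency, customer-managed storage or keys, internal policy) can be recorded as data. A minimal sketch, assuming hypothetical field names and a simple precedence order. The output is a starting point for review, not a decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    regulated_data: bool = False
    contract_bans_shared_infra: bool = False
    residency_mandatory: bool = False
    customer_managed_storage_or_keys: bool = False
    policy_requires_customer_runtime: bool = False


def recommend_deployment(req: Requirements) -> tuple[str, list[str]]:
    """Return a starting deployment model plus the recorded reasons behind it."""
    reasons = [name for name, value in vars(req).items() if value]
    if req.policy_requires_customer_runtime:
        return "self-hosted", reasons
    if req.customer_managed_storage_or_keys:
        return "customer-owned storage or hybrid", reasons
    if req.regulated_data or req.contract_bans_shared_infra or req.residency_mandatory:
        return "private tenant", reasons
    return "multi-tenant SaaS", reasons
```

An empty reasons list paired with a request for a private tenant is the preference-over-risk case this section warns about.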

AI Behavior, Prompt Injection, and Tool-Action Questions

Voice agent security is not only about storing data safely. It is also about preventing the system from doing the wrong thing with a caller, a transcript, or a tool.

Ask vendors to show how they test:

AI Behavior Risk	Test Question	Passing Evidence
Spoken prompt injection	Can a caller override system instructions by speaking them?	Test cases for direct and indirect prompt injection
Sensitive output	Can the agent reveal system prompts, secrets, account details, or previous-call content?	Output checks and red-team samples
Unauthorized action	Can the agent call a tool the caller is not allowed to use?	Permission model and tool-policy guardrails
Tool argument leakage	Can prompts, credentials, or PII leak into tool payloads?	Payload filtering and secret-scanning evidence
Duplicate side effect	Can retries create duplicate bookings, refunds, tickets, or messages?	Idempotency and sandbox test results
Handoff oversharing	Can the agent send too much context to a human queue or downstream system?	Handoff payload minimization and audit logs
Model drift	Can an upstream model change weaken safety behavior?	Model-change evaluation and rollback process

This connects directly to voice agent workflow testing. Tool calls are not just quality events. They are permissioned actions that need preconditions, traceability, and side-effect evidence.
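The duplicate-side-effect and unauthorized-action rows can both be exercised against a mock. A minimal sketch, assuming a hypothetical sandboxed booking tool that takes an idempotency key per logical request and a hypothetical allowlist keyed by caller verification state. Retries with the same key return the original result instead of creating a second booking, and disallowed callers are refused before any side effect.

```python
import uuid

# Hypothetical allowlist: which tools a caller may trigger in each verification state.
TOOL_ALLOWLIST = {
    "unverified": {"lookup_hours"},
    "verified": {"lookup_hours", "book_appointment"},
}


class SandboxBookingTool:
    """Mock side-effect store: records bookings in memory and never calls a real system."""

    def __init__(self) -> None:
        self.bookings: dict[str, dict] = {}  # booking_id -> booking
        self._by_key: dict[str, str] = {}    # idempotency key -> booking_id

    def book(self, caller_state: str, idempotency_key: str, slot: str) -> dict:
        if "book_appointment" not in TOOL_ALLOWLIST.get(caller_state, set()):
            raise PermissionError(f"book_appointment not allowed for {caller_state!r} caller")
        if idempotency_key in self._by_key:
            # Retry of the same logical request: return it, create nothing new.
            return self.bookings[self._by_key[idempotency_key]]
        booking_id = str(uuid.uuid4())
        self.bookings[booking_id] = {"id": booking_id, "slot": slot}
        self._by_key[idempotency_key] = booking_id
        return self.bookings[booking_id]
```

In a POC, the useful evidence is the count of side effects after replaying the same tool call several times, plus a refused call that left no trace in the sandbox store.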

Regulated Deployment Add-Ons

For healthcare, financial services, insurance, BPO, and international contact-center deployments, add these questions on top of the baseline checklist rather than relying on it alone.

Environment	Add These Questions
Healthcare	Will the vendor sign a BAA? Which systems handle ePHI? How are administrative, physical, and technical safeguards implemented? Can PHI be redacted before broad QA review?
Financial services	How are regulatory script adherence, call recordings, dispute evidence, access logs, and retention schedules handled? Can sensitive payment or account details be isolated?
Insurance	How are claims details, policy numbers, medical context, and adjuster notes protected? Can claim workflows be tested without writing to production systems?
BPO / outsourced operations	Can data, dashboards, reviewers, exports, and alerts be segmented by client? Are cross-client analytics de-identified and access-controlled?
International contact centers	Which regions process data? Can residency differ by customer, queue, or workspace? How are cross-border transfers documented?

HHS describes the HIPAA Security Rule as requiring appropriate administrative, physical, and technical safeguards for electronic protected health information held by covered entities and business associates. For voice agent vendors, that means the review needs to include call recordings, transcripts, QA workflows, exports, and support access when they may contain PHI.

For more healthcare-specific test design, use the HIPAA PHI clinical workflow testing checklist. For redaction architecture, use the PII redaction compliance guide.

Red Flags and Green Flags

Use this section during vendor calls. It is often faster than a 200-row questionnaire.

Red Flag	Why It Matters
"SOC 2 covers it" is the answer to every AI-specific question	SOC 2 scope may not cover prompt injection, tool abuse, or model-provider behavior
Raw audio and transcript permissions are the same	Reviewers may receive more sensitive data than they need
Support access is broad or informal	Debugging can become a privacy incident
Retention is one global number	Raw audio, redacted transcripts, metadata, and QA notes need different policies
The vendor cannot list subprocessors	Buyers cannot evaluate where sensitive call evidence goes
Production monitoring requires all calls by default	Sampling and queue-level policy should be configurable
The vendor cannot test deletion or export in staging	Compliance behavior is being assumed, not proven
The vendor cannot explain model-provider data use	Customer data may flow into systems procurement has not approved
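For the monitoring red flag, a configurable sampling policy can be small. A minimal sketch, assuming a hypothetical sample rate and queue names. Sampling is deterministic per call ID through a SHA-256 hash bucket, so the same call is consistently in or out, and excluded queues are never sampled regardless of rate.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SamplingPolicy:
    rate: float  # fraction of eligible calls to sample, 0.0 to 1.0
    excluded_queues: set[str] = field(default_factory=set)

    def should_sample(self, call_id: str, queue: str) -> bool:
        if queue in self.excluded_queues:
            return False  # queue-level opt-out wins over the sample rate
        digest = hashlib.sha256(call_id.encode()).digest()
        # Top 53 bits of the hash, so the float bucket is exact and always < 1.0.
        bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
        return bucket < self.rate
```

Deterministic buckets make the policy auditable: a reviewer can recompute whether any given call should have been sampled.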

Green Flag	What It Shows
Separate controls for raw audio, unredacted transcript, redacted transcript, metadata, exports, and QA notes	The vendor understands data-class risk
Time-boxed support access with approval and audit logs	Human access is controlled
Customer-configurable retention and deletion evidence	The platform can match policy, not just store data
Sandbox tool calls and mocked side effects	The vendor can test workflows without touching production systems
AI-specific red-team or regression tests	The vendor tests behavior, not only infrastructure
Clear shared-responsibility model for private tenant or self-hosted deployment	Security ownership will not be ambiguous
Honest unresolved-risk list	The vendor has operational maturity
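The time-boxed support access green flag can be checked against a concrete model. A minimal sketch, assuming a hypothetical grant record with an approver, a ticket reference, and an expiry. The grant and every access attempt, allowed or denied, write an audit entry with actor, timestamp, action, and object, the fields question 12 asks to see in a sample.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SupportGrant:
    engineer: str
    approver: str
    call_id: str
    ticket: str
    expires_at: datetime


@dataclass(frozen=True)
class AuditEntry:
    actor: str
    timestamp: datetime
    action: str
    object_id: str
    allowed: bool


class SupportAccess:
    def __init__(self) -> None:
        self.grants: list[SupportGrant] = []
        self.audit_log: list[AuditEntry] = []

    def grant(self, engineer: str, approver: str, call_id: str, ticket: str,
              ttl: timedelta, now: datetime) -> SupportGrant:
        if approver == engineer:
            raise ValueError("support access needs approval from a second person")
        g = SupportGrant(engineer, approver, call_id, ticket, now + ttl)
        self.grants.append(g)
        self.audit_log.append(AuditEntry(approver, now, f"grant_access:{engineer}", call_id, True))
        return g

    def access(self, engineer: str, call_id: str, action: str, now: datetime) -> bool:
        allowed = any(
            g.engineer == engineer and g.call_id == call_id and now < g.expires_at
            for g in self.grants
        )
        # Log denials too: failed attempts are part of the evidence trail.
        self.audit_log.append(AuditEntry(engineer, now, action, call_id, allowed))
        return allowed
```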

What This Checklist Cannot Prove

This checklist narrows the review. It does not replace a legal, security, or compliance decision.

Limitation	What to Do Instead
A questionnaire cannot prove runtime behavior.	Run a non-production POC with seeded sensitive fields, mocked tool calls, and access-log review.
SOC 2 scope can lag the product surface.	Ask which systems are covered, which are excluded, and whether the POC path matches the audited path.
Redaction demos can hide edge cases.	Test 7-10 realistic samples from your own call patterns, including noisy audio and partial identifiers.
Private deployment does not remove shared responsibility.	Write down who owns upgrades, monitoring, incident response, key rotation, and deletion evidence.
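Seeded fields make the redaction check mechanical. A minimal sketch, assuming you seed hypothetical fake values (a test card number, a member ID) into non-production calls and then pull the vendor's redacted transcripts or exports. It normalizes spacing and punctuation so "4111 1111 ..." and "4111-1111-..." both match. It also flags any surviving six-character fragment to catch partial identifiers. The fragment length is an assumption, and short fragments can produce false positives.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and keep only ASCII letters and digits, so spacing and dashes can't hide a match."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def find_leaks(seeded: dict[str, str], redacted_docs: list[str],
               fragment_len: int = 6) -> dict[str, str]:
    """Map each seeded field name to "full" or "fragment" if it survives redaction."""
    docs = [normalize(d) for d in redacted_docs]  # per document, so no cross-document matches

    def present(s: str) -> bool:
        return any(s in d for d in docs)

    leaks: dict[str, str] = {}
    for name, value in seeded.items():
        target = normalize(value)
        if not target:
            continue
        if present(target):
            leaks[name] = "full"
        elif len(target) > fragment_len and any(
            present(target[i:i + fragment_len])
            for i in range(len(target) - fragment_len + 1)
        ):
            leaks[name] = "fragment"
    return leaks
```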

POC Gating Checklist

Before the POC handles sensitive production calls, make these gates explicit.

Gate	Pass Criteria	Owner
Data scope	POC data classes are listed and approved	Engineering + security
Data-use policy	No customer data is used for training or support debugging without approval	Legal + security
Access model	Raw audio, transcript, export, admin, and support permissions are separated	Security
Redaction	Sensitive fields are redacted before broad search or reporting	Security + QA
Retention	Audio, transcripts, metadata, and QA notes have default expiration rules	Compliance
Deletion	A sample delete can be executed and evidenced	Security + vendor
Subprocessors	Data categories and regions are reviewed	Legal + security
Tool safety	Tool calls run against mocks, sandboxes, or approved dry-run endpoints	Engineering
Incident process	Notification path and severity rules are known	Security + on-call
Exit plan	Data export and deletion after POC are written down	Procurement + legal

The POC can still be fast. The difference is that it starts with a smaller, approved data boundary instead of discovering the boundary after transcripts have already moved.
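To keep the gates an explicit go/no-go decision rather than a discussion, they can be tracked as data. A minimal sketch, assuming a hypothetical record per gate with an owner, a pass flag, and an evidence reference. A gate counts as passed only when it is marked passed and points at evidence, and the POC stays blocked while any gate fails.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gate:
    name: str
    owner: str
    passed: bool = False
    evidence: Optional[str] = None  # link or document ID backing the pass


def blocking_gates(gates: list[Gate]) -> list[str]:
    """Names of gates that are not passed, or were marked passed without evidence."""
    return [g.name for g in gates if not (g.passed and g.evidence)]


def poc_decision(gates: list[Gate]) -> str:
    blocked = blocking_gates(gates)
    return "go" if not blocked else "no-go: " + ", ".join(blocked)
```

Requiring an evidence reference rather than a checkbox mirrors the Deletion gate's "executed and evidenced" criterion.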

How Hamming Fits This Review

Hamming is built for teams that need to test and monitor voice agents before production failures become customer problems. In security review, the practical question is not "do you have a dashboard?" It is whether the platform can help teams evaluate real voice-agent behavior while respecting access, retention, redaction, and deployment constraints.

Use this checklist against Hamming too. Ask for the same evidence: data flow, retention, deletion, access controls, support access, subprocessors, deployment model, and AI-specific testing approach. Strong security review should make the buying process clearer, not more theatrical.

For broader vendor selection, pair this with questions to ask voice testing vendors, Hamming vs. Coval, and the voice agent production readiness checklist.