Last month, a customer called us after discovering their voice agent had been logging complete credit card numbers in production for three weeks. Not in the transcript display—that was redacted. In the debug logs. Specifically, in the OpenTelemetry spans their engineering team used for latency debugging.
This is the pattern we see repeatedly: teams redact PII in the obvious places and miss it everywhere else. The voice agent says "I won't repeat your card number for security reasons," while the underlying infrastructure stores it six different ways.
Here's what most teams get wrong: PII redaction for voice agents isn't a logging configuration. It's not a database filter. It's a pipeline—a dedicated layer that intercepts sensitive data before central storage across every component that touches voice data.
TL;DR: Implement PII redaction for voice agents using Hamming's Voice Data Protection Framework:
- Pipeline Architecture: No mainstream general-purpose logging framework handles voice PII out of the box—you need a dedicated redaction layer before central storage
- Transcription-First: Transcribe first, then redact immediately—real-time processing prevents PII from ever reaching your database
- Multi-Channel Coverage: Voice agents require coordinated redaction across transcripts, audio recordings, debug logs, and observability traces
The difference between compliant and non-compliant voice agents isn't intent—it's architecture.
Related Guides:
- PII Redaction Compliance & Architecture Guide — HIPAA, PCI-DSS, GDPR requirements and encryption standards
- AI Voice Agent Compliance & Security — HIPAA, PCI DSS, and SOC 2 testing
- Voice Agent Observability & Tracing Guide — Distributed tracing for voice
- Logging and Analytics Architecture for Voice Agents — Complete logging infrastructure
Methodology Note: The benchmarks and patterns in this guide are derived from Hamming's analysis of 4M+ voice agent calls across 10K+ production voice agents (2025-2026). The focus is compliance testing in healthcare and financial services.
Who Doesn't Need This Guide
If you're building a demo voice agent with synthetic data and no production traffic, standard logging practices are fine. Skip this guide.
If your voice agent never handles PII—no names, no account numbers, no payment data, no health information—you can get by with general observability tools.
This guide is for teams deploying voice agents that process real customer data: payment flows, healthcare scheduling, account management, customer service with authentication. If callers say their social security number, credit card, or medical information out loud, you need a redaction pipeline.
Why Voice Agents Create Unique PII Exposure Risks
Voice agents aren't web forms. When a customer types a credit card number into a form, you control exactly where that data goes. When a customer speaks a credit card number, that data potentially flows through:
- Real-time transcription streams (STT)
- Final transcript storage
- Audio recordings (both channels)
- Call replay functionality
- Debug logs
- OpenTelemetry traces and spans
- Analytics pipelines
- QA review queues
- LLM context windows
One customer speaks one card number, and it appears in nine different systems. Miss any one of them, and you've created a compliance violation.
The Data Breach Cost Reality
The average US data breach cost reached $9.36 million in 2024, with 53% involving customer PII. For voice agents, the exposure surface is larger because the same sensitive data replicates across multiple storage layers.
| Regulation | Maximum Penalty | Voice-Specific Risk |
|---|---|---|
| GDPR | 4% of global revenue or €20M | Transcripts with EU customer names/addresses |
| HIPAA | $50K per violation, $1.5M annual cap | Spoken PHI in healthcare scheduling |
| PCI DSS | $100K/month + card brand fines | Credit card numbers in call recordings |
The Gap Between "Redacted" and Actually Redacted
Most teams think they've solved PII redaction when their transcript viewer shows asterisks. The real question: where else did that data go before the asterisks appeared?
We call this the "redaction theater" problem. The UI shows redacted content, but the underlying logs, traces, and recordings contain the original data. It looks compliant. It isn't.
How PII Redaction Actually Works for Voice Agents
Critical understanding: PII redaction for voice agents is not a feature of your logging framework. No mainstream general-purpose logging framework handles this out of the box.
Instead, you need to add a pipeline layer—a dedicated processing stage that intercepts voice data before central storage and scrubs sensitive content. This applies whether you're building custom infrastructure or using platforms like LiveKit or Vapi.
Hamming's Voice Data Protection Framework
Based on our analysis of 4M+ production voice calls, effective PII redaction requires coordinated protection at five layers:
| Layer | What to Protect | Implementation Approach |
|---|---|---|
| 1. Transcription | STT output streams | Transcribe first, then redact before storage |
| 2. Audio | Recording files (both channels) | Audio segment replacement or deletion |
| 3. Logging | Application and debug logs | Custom log filters/appenders |
| 4. Tracing | OpenTelemetry spans | Span processors with attribute scrubbing |
| 5. Analytics | Downstream pipelines | Pre-aggregation redaction or anonymization |
The Transcribe-First-Then-Redact Pattern
This is the pattern OpenAI and other LLM providers recommend, and it applies directly to voice agents:
Audio → STT Transcription → PII Detection → Redaction → Storage
                            (the original, unredacted transcript is never stored)
The key insight: you transcribe first, then redact immediately. The original unredacted transcript should never reach your central storage. This prevents the compliance gap where sensitive data exists—even briefly—in your database.
Wrong approach:
Audio → Transcription → Storage → Background Redaction Job
                          (PII exists in the database for hours or days)
Correct approach:
Audio → Transcription → Inline Redaction → Storage
                          (only redacted content reaches the database)
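As a concrete sketch of the correct approach, the function below transcribes in memory, redacts inline, and writes only the redacted transcript. Here transcribeAudio, detectPII, redactEntities, and storeTranscript are hypothetical stand-ins for your STT client, detector, redactor, and storage layer:

```typescript
// Hypothetical helper signatures; substitute your own STT, detection,
// redaction, and storage implementations.
declare function transcribeAudio(audio: Buffer): Promise<string>;
declare function detectPII(text: string): Promise<{ type: string; start: number; end: number }[]>;
declare function redactEntities(text: string, entities: { type: string; start: number; end: number }[]): string;
declare function storeTranscript(text: string): Promise<void>;

// Transcribe first, redact inline, store only the redacted text.
// The raw transcript exists only in memory inside this function.
async function processCallAudio(audio: Buffer): Promise<void> {
  const rawTranscript = await transcribeAudio(audio);
  const entities = await detectPII(rawTranscript);
  const redacted = redactEntities(rawTranscript, entities);
  await storeTranscript(redacted); // unredacted text never reaches storage
}
```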
PII Detection Methods: Choosing Your Approach
Named Entity Recognition (NER) with Machine Learning
Modern NER models achieve 94-96% F1 scores for PII detection. They handle context better than pattern matching—distinguishing a social security number from an account number based on surrounding conversation.
| Approach | Accuracy | Speed | Best For |
|---|---|---|---|
| ML-based NER | 94-96% F1 | 10-50ms | Production systems needing high recall |
| Pattern matching (Regex) | 60-80% F1 | <5ms | Known-format data (card numbers) |
| Hybrid | 92-95% F1 | 15-60ms | Balanced accuracy/performance |
When NER excels: Detecting names, addresses, medical conditions, and context-dependent PII where the same string might or might not be sensitive.
When regex works: Credit card numbers, SSNs, phone numbers—structured data with predictable formats.
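To illustrate the pattern-matching side, here is a minimal regex-based detector for structured entities (card numbers and SSNs), plus a label-based redactor. The entity shape and patterns are illustrative assumptions; names, addresses, and other context-dependent PII would still go through an NER model:

```typescript
interface DetectedEntity {
  type: 'CREDIT_CARD' | 'SSN';
  start: number;
  end: number;
  value: string;
}

// Structured PII with predictable formats: a good fit for regex.
const PATTERNS: Array<{ type: DetectedEntity['type']; regex: RegExp }> = [
  // 13-16 digits, optionally separated by spaces or dashes
  { type: 'CREDIT_CARD', regex: /\b(?:\d[ -]?){13,16}\b/g },
  // 123-45-6789 style SSNs
  { type: 'SSN', regex: /\b\d{3}-\d{2}-\d{4}\b/g },
];

function detectStructuredPII(text: string): DetectedEntity[] {
  const entities: DetectedEntity[] = [];
  for (const { type, regex } of PATTERNS) {
    for (const match of text.matchAll(regex)) {
      entities.push({
        type,
        start: match.index ?? 0,
        end: (match.index ?? 0) + match[0].length,
        value: match[0],
      });
    }
  }
  return entities;
}

// Replace each detected entity with a category label, e.g. [CREDIT_CARD].
function redactWithLabels(text: string, entities: DetectedEntity[]): string {
  return entities
    .sort((a, b) => b.start - a.start) // replace right-to-left so offsets stay valid
    .reduce((out, e) => out.slice(0, e.start) + `[${e.type}]` + out.slice(e.end), text);
}
```

A hybrid detector would run this pass first, then send the text through an NER model for names and other contextual entities.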
Context-Aware Detection vs. Pattern Matching
Pattern matching fails when the same sequence of numbers means different things:
- "123-45-6789" could be an SSN or a phone extension
- "4111-1111-1111-1111" is clearly a test card number
- "John" might be a customer name or might be referring to "the john" (bathroom)
Context-aware detection uses surrounding conversation to determine whether a pattern represents PII. This matters for voice agents because callers speak naturally, without the structured input fields that web forms provide.
Real-Time vs. Batch Processing
Real-time redaction scrubs PII before database writes. This is the approach we recommend:
Advantages:
- PII never reaches your storage
- Compliance is architectural, not operational
- No "redaction debt" accumulating
Disadvantages:
- Adds latency to transcript processing (typically 10-50ms)
- Requires inline processing infrastructure
Batch redaction processes stored data on a schedule:
Advantages:
- Can use higher-accuracy models
- Doesn't impact real-time latency
Disadvantages:
- PII exists in your database until processed
- Creates compliance gaps during the window
- Requires tracking what's been processed
For voice agents handling sensitive data, we recommend real-time redaction. The latency cost is minimal compared to the compliance risk of storing unredacted transcripts.
Implementation Patterns
1. Middleware-Based Redaction
Add a redaction layer to your pipeline before logging and storage. This works for REST APIs, tRPC routes, and other server-side handlers.
// Middleware pattern for transcript redaction
// detectPII, redactEntities, and storeTranscript are your detection,
// redaction, and storage helpers.
const piiRedactionMiddleware = async (transcript: string): Promise<string> => {
  const entities = await detectPII(transcript);
  return redactEntities(transcript, entities);
};

// Applied before storage
app.post('/transcript', async (req, res) => {
  const redacted = await piiRedactionMiddleware(req.body.transcript);
  await storeTranscript(redacted); // Only the redacted version is stored
  res.json({ status: 'stored' });
});
The middleware approach ensures every code path that stores transcripts goes through redaction. It's harder to accidentally bypass than point-of-use redaction.
2. OpenTelemetry Span Processors
For teams using distributed tracing, OpenTelemetry span processors can scrub PII from trace attributes before export:
// Custom span processor for PII redaction
// containsPII and redactPII are your detection and redaction helpers.
import { Context } from '@opentelemetry/api';
import { Span, SpanProcessor, ReadableSpan } from '@opentelemetry/sdk-trace-base';

class PIIRedactingSpanProcessor implements SpanProcessor {
  onStart(_span: Span, _parentContext: Context): void {}

  onEnd(span: ReadableSpan): void {
    // The span's live attributes object is what the exporter serializes,
    // so scrubbing string values here removes PII before export.
    for (const [key, value] of Object.entries(span.attributes)) {
      if (typeof value === 'string' && containsPII(value)) {
        (span.attributes as Record<string, unknown>)[key] = redactPII(value);
      }
    }
  }

  shutdown(): Promise<void> { return Promise.resolve(); }
  forceFlush(): Promise<void> { return Promise.resolve(); }
}

// Register with your tracer provider
tracerProvider.addSpanProcessor(new PIIRedactingSpanProcessor());
This catches PII that engineers add to traces for debugging—the most common source of "hidden" PII exposure we see in production deployments.
3. Custom Log Filters and Appenders
Standard logging frameworks (Winston, Pino, Bunyan) support custom formatters. Add PII redaction as a log transform:
// Pino custom formatter: scrub PII from every log object before it is written.
// redactObjectPII is your helper that walks the object and redacts string values.
import pino from 'pino';

const logger = pino({
  formatters: {
    log: (object) => {
      return redactObjectPII(object);
    },
  },
});
This protects debug logs, error messages, and any other log output from containing raw PII.
4. Standalone Redaction Services
For larger deployments, a dedicated redaction service provides centralized PII detection:
| Service | Capabilities | Integration |
|---|---|---|
| Microsoft Presidio | 50+ entity types, 49 languages | Self-hosted, open source |
| AWS Comprehend | Entity detection, PHI-specific | AWS API |
| Google Cloud DLP | Pattern + ML detection | GCP API |
| Custom NER | Domain-specific entities | Self-hosted |
Standalone services work well when multiple applications need redaction, or when you need language coverage beyond English.
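As a sketch of the standalone-service pattern, the calls below assume a self-hosted Microsoft Presidio deployment with the analyzer and anonymizer reachable at the URLs shown (for example, the ports used by the sample docker-compose setup), and rely on Presidio's default behavior of replacing each entity with its type label:

```typescript
// Route transcripts through a self-hosted Presidio analyzer + anonymizer.
// Assumed endpoints; adjust to your deployment.
const ANALYZER_URL = process.env.PRESIDIO_ANALYZER_URL ?? 'http://localhost:5002';
const ANONYMIZER_URL = process.env.PRESIDIO_ANONYMIZER_URL ?? 'http://localhost:5001';

async function redactWithPresidio(text: string): Promise<string> {
  // 1. Detect entities
  const analyzerResults = await fetch(`${ANALYZER_URL}/analyze`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, language: 'en' }),
  }).then((r) => r.json());

  // 2. Anonymize: by default each entity is replaced with its type label,
  //    e.g. <CREDIT_CARD>, which preserves conversation structure.
  const anonymized = await fetch(`${ANONYMIZER_URL}/anonymize`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text, analyzer_results: analyzerResults }),
  }).then((r) => r.json());

  return anonymized.text;
}
```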
5. Audio-Level Redaction
Text redaction doesn't protect audio recordings. For full compliance, you need audio-level redaction:
Approaches:
- Silence replacement: Replace PII audio segments with silence (simple but noticeable)
- Tone replacement: Replace with a beep or tone (standard in legacy call centers)
- Audio masking: Apply noise or distortion to PII segments
Audio redaction adds processing latency and complexity. Consider whether you need to store audio at all, or whether redacted transcripts are sufficient for your use case.
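If you do store audio, one lightweight approach is to mute PII segments using word-level timestamps from your STT provider. The sketch below shells out to ffmpeg's volume filter; the segment list is assumed to come from wherever your redaction pipeline recorded entity timings, and the padding window is an assumption:

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const run = promisify(execFile);

interface PIISegment {
  startSec: number; // taken from STT word-level timestamps
  endSec: number;
}

// Mute each PII segment with ffmpeg's volume filter, using
// enable='between(t,start,end)' to zero the volume only inside the segment.
async function muteSegments(inputPath: string, outputPath: string, segments: PIISegment[]): Promise<void> {
  if (segments.length === 0) {
    await run('ffmpeg', ['-y', '-i', inputPath, '-c', 'copy', outputPath]);
    return;
  }

  const pad = 0.2; // assumed padding (seconds) around each segment
  const filters = segments
    .map((s) => {
      const start = Math.max(0, s.startSec - pad).toFixed(2);
      const end = (s.endSec + pad).toFixed(2);
      return `volume=enable='between(t,${start},${end})':volume=0`;
    })
    .join(',');

  await run('ffmpeg', ['-y', '-i', inputPath, '-af', filters, outputPath]);
}
```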
Related: See our guide on voice agent testing for healthcare for HIPAA-specific audio handling requirements.
Voice Agent-Specific Challenges
Dual-Channel Audio Processing
Voice agents have two audio streams: the agent and the caller. Both must be redacted before merging or storage.
Caller Audio ──→ STT ──→ Redact ──┐
                                  ├──→ Merge ──→ Combined Recording / Storage
Agent Audio  ──→ STT ──→ Redact ──┘
Miss the caller channel, and you've stored their spoken PII. Miss the agent channel, and you've potentially stored PII the agent repeated back (a design flaw, but one we see regularly).
Real-Time Transcription Streams
STT providers like Deepgram, AssemblyAI, and Google Speech-to-Text deliver transcripts in chunks. PII might span chunk boundaries:
Chunk 1: "My social security number is 123"
Chunk 2: "-45-6789"
Your redaction pipeline needs to handle partial entities across chunks. Most solutions buffer 2-3 chunks to ensure complete entity detection.
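A minimal sketch of that buffering, assuming your STT client emits plain text chunks and your redactor works on strings (a production version would also retain a small tail across flushes, or use entity offsets, so a flush boundary can't split an entity either):

```typescript
// Buffer streaming STT chunks so an entity split across chunk boundaries
// ("...is 123" + "-45-6789") is redacted as one string.
class ChunkBufferingRedactor {
  private buffer: string[] = [];

  constructor(
    private readonly redact: (text: string) => string,    // your redaction function
    private readonly release: (redacted: string) => void, // e.g. write to storage
    private readonly maxChunks = 3,                       // buffer 2-3 chunks per the guidance above
  ) {}

  onChunk(chunk: string): void {
    this.buffer.push(chunk);
    if (this.buffer.length >= this.maxChunks) {
      this.flush();
    }
  }

  // Call at end of utterance / end of call to release whatever remains.
  flush(): void {
    if (this.buffer.length === 0) return;
    const joined = this.buffer.join('');
    this.buffer = [];
    this.release(this.redact(joined));
  }
}
```

Wire onChunk to your STT stream's transcript events and call flush when the provider signals end of utterance.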
Multi-Component Logging
A typical voice agent deployment includes:
- Voice platform logs (Twilio, LiveKit, Daily)
- LLM provider logs (OpenAI, Anthropic, Google)
- Application logs
- Database audit logs
- CDN/storage access logs
Each component may log the same PII. Comprehensive redaction requires addressing all of them.
This is why we emphasize the pipeline approach: you redact once, at the earliest possible point, rather than trying to configure redaction in every downstream system.
Maintaining Observability with Redacted Transcripts
Redaction that makes transcripts useless for debugging defeats the purpose. Here's how to maintain observability while protecting PII.
Structured Redaction Labels
Instead of generic asterisks, use category labels:
Bad: "My card number is ************"
Good: "My card number is [CREDIT_CARD]"
Labels preserve conversation structure for debugging. You can still see that the caller provided a card number, when they provided it, and how the agent responded.
Preserving Conversation Context
Effective redaction preserves:
- Intent: "I want to update my payment method"
- Sentiment: Frustrated, satisfied, confused
- Flow: Question → Answer → Confirmation
- Timing: When each turn occurred
What you redact:
- Entity values: The actual card number, SSN, name
- Not entity presence: The fact that a card number was spoken
This lets you debug conversation flow issues without accessing sensitive data.
Audit Trail Requirements
For compliance, maintain logs of:
- What was redacted (entity types, not values)
- When redaction occurred
- Which policy was applied
- Redaction service version
This documentation proves your redaction pipeline is functioning and supports compliance audits.
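A redaction audit record can be as simple as a structured log entry that captures entity types and policy metadata, never values. The shape below is one possible schema, not a standard:

```typescript
// One audit record per redacted transcript: entity types and counts only,
// never the redacted values themselves.
interface RedactionAuditRecord {
  callId: string;
  redactedAt: string;                    // ISO 8601 timestamp
  policyId: string;                      // which redaction policy was applied
  redactorVersion: string;               // redaction service / model version
  entityCounts: Record<string, number>;  // e.g. { CREDIT_CARD: 1, PERSON_NAME: 2 }
}

function buildAuditRecord(callId: string, entities: { type: string }[]): RedactionAuditRecord {
  const entityCounts: Record<string, number> = {};
  for (const e of entities) {
    entityCounts[e.type] = (entityCounts[e.type] ?? 0) + 1;
  }
  return {
    callId,
    redactedAt: new Date().toISOString(),
    policyId: 'default-v1',   // assumed policy identifier
    redactorVersion: '1.4.0', // assumed version string
    entityCounts,
  };
}
```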
Security Architecture for Voice Data
Encryption Requirements
| Stage | Encryption | Standard |
|---|---|---|
| Audio in transit | TLS 1.2+ | Required |
| Audio at rest | AES-256 | Required |
| Transcripts in transit | TLS 1.2+ | Required |
| Transcripts at rest | AES-256 | Required |
| Logs containing voice data | AES-256 | Recommended |
Encryption protects data at rest and in transit, but doesn't protect against authorized access. Redaction removes the sensitive data entirely.
Access Controls and RBAC
Limit access to unredacted content:
- Engineering: Access to redacted transcripts only
- QA: Access to redacted transcripts + audio
- Compliance: Access to audit logs + redaction reports
- Security: Access to all logs for incident response
Monitor access patterns with SIEM integration. Unusual access to voice data should trigger alerts.
Retention Policies
| Data Type | Redacted Retention | Unredacted Retention |
|---|---|---|
| Transcripts | Per business need | 0 (never store) |
| Audio | Per business need | 24-72 hours max |
| Debug logs | 30-90 days | 0 (never store) |
| Traces | 7-30 days | 0 (never store) |
The safest unredacted retention policy is zero. If your pipeline works correctly, unredacted data never reaches storage.
Commercial PII Redaction Solutions
Voice Platform-Native Options
Several voice agent platforms and transcription services include built-in PII redaction capabilities:
AssemblyAI: Provides PII redaction as a transcription feature. Detects and redacts 20+ entity types during transcription—implementing the "transcribe first, then redact" pattern at the STT layer.
Amazon Transcribe: Includes automatic PII redaction with configurable entity types. Works for both batch and streaming transcription.
Note: Some voice agent platforms offer partial PII redaction for transcripts and recordings, but platform-native redaction typically doesn't cover your application logs, OpenTelemetry traces, or downstream analytics pipelines, which is where PII most often leaks.
Standalone Detection Tools
Microsoft Presidio: Open-source PII detection supporting 50+ entity types and 49 languages. Self-hosted, so you control the data. Good for organizations that can't send data to third-party APIs.
Google Cloud DLP: Comprehensive detection with both pattern matching and ML-based detection. Integrates with GCP services.
MiaRec, CallMiner: Enterprise call center solutions with built-in redaction. More expensive but include compliance reporting and audit trails.
How Hamming Fits
Hamming's compliance testing platform validates that your redaction pipeline actually works. We simulate conversations containing PII, verify redaction occurs correctly, and catch the gaps—like PII leaking into debug logs while transcripts appear clean.
This complements your redaction implementation. The redaction pipeline handles the scrubbing; Hamming tests that the scrubbing works across all data paths.
Testing Your Redaction Pipeline
Creating Test Scenarios
Build a synthetic PII dataset with known entities:
{
  "test_cases": [
    {
      "input": "My name is John Smith and my SSN is 123-45-6789",
      "expected_entities": ["PERSON_NAME", "SSN"],
      "expected_output": "My name is [PERSON_NAME] and my SSN is [SSN]"
    },
    {
      "input": "Call me at 555-123-4567",
      "expected_entities": ["PHONE_NUMBER"],
      "expected_output": "Call me at [PHONE_NUMBER]"
    }
  ]
}
Run these through your pipeline and verify outputs match expectations. Track recall (did you catch all PII?) and precision (did you over-redact?).
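A small harness can do this automatically. In the sketch below, redactTranscript and the test-case shape are assumptions matching the JSON above; it reports entity-level recall, a rough type-level precision, and exact-output accuracy:

```typescript
interface PIITestCase {
  input: string;
  expected_entities: string[];
  expected_output: string;
}

// Run each case through your redaction function and report recall
// (did we catch every expected entity type?), precision (did we flag
// types that weren't expected?), and exact-output accuracy.
async function evaluateRedaction(
  cases: PIITestCase[],
  redactTranscript: (text: string) => Promise<{ output: string; entityTypes: string[] }>,
) {
  let expected = 0, caught = 0, detected = 0, correctDetections = 0, exact = 0;

  for (const tc of cases) {
    const result = await redactTranscript(tc.input);
    expected += tc.expected_entities.length;
    caught += tc.expected_entities.filter((e) => result.entityTypes.includes(e)).length;
    detected += result.entityTypes.length;
    correctDetections += result.entityTypes.filter((e) => tc.expected_entities.includes(e)).length;
    if (result.output === tc.expected_output) exact++;
  }

  return {
    recall: caught / expected,
    precision: correctDetections / detected,
    exactOutputRate: exact / cases.length,
  };
}
```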
Validation Checklist
For each storage location that might contain voice data:
- Transcripts: Verified PII replaced with category labels
- Audio recordings: Verified PII segments replaced or removed
- Application logs: Verified no PII in log output
- OpenTelemetry traces: Verified no PII in span attributes
- Error messages: Verified exceptions don't include PII
- Analytics exports: Verified downstream data is redacted
- Backup systems: Verified backups contain only redacted data
Continuous Monitoring
Redaction pipelines can fail silently. Implement ongoing validation:
- Run synthetic PII tests daily against production
- Monitor redaction service latency and availability
- Alert on unexpected PII patterns in storage (scan for SSN and card-number patterns, as sketched after this list)
- Review a sample of redacted transcripts weekly
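One way to implement the storage-scan alert above is a periodic job that checks recently stored, already-redacted transcripts for structured PII patterns. In this sketch, fetchRecentTranscripts and sendAlert are hypothetical helpers standing in for your database access and alerting stack:

```typescript
// Hypothetical helpers: pull recently stored (already-redacted) transcripts
// and raise an alert in your incident channel.
declare function fetchRecentTranscripts(opts: { hours: number }): Promise<{ id: string; text: string }[]>;
declare function sendAlert(payload: { transcriptId: string; suspectedEntity: string }): Promise<void>;

// Structured PII patterns should never appear in redacted storage.
// Any hit means the pipeline leaked and should page someone.
const LEAK_PATTERNS: Record<string, RegExp> = {
  SSN: /\b\d{3}-\d{2}-\d{4}\b/,
  CREDIT_CARD: /\b(?:\d[ -]?){13,16}\b/,
};

async function sweepForLeakedPII(): Promise<void> {
  const transcripts = await fetchRecentTranscripts({ hours: 24 });
  for (const t of transcripts) {
    for (const [label, pattern] of Object.entries(LEAK_PATTERNS)) {
      if (pattern.test(t.text)) {
        // Report only the transcript ID and entity type; never log the matched value.
        await sendAlert({ transcriptId: t.id, suspectedEntity: label });
      }
    }
  }
}
```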
Common Failure Modes
Partial Transcript Timing Issues
Real-time STT streams arrive faster than redaction can process. Without buffering, you might store partial transcripts before redaction completes.
Fix: Buffer transcript chunks until redaction confirms processing. Only release to storage after redaction completes.
False Positives and Over-Redaction
Aggressive redaction can break transcript usability:
| Input | Output | Result |
|---|---|---|
| "Call John at 3pm" | "Call [PERSON_NAME] at [TIME]" | Correct |
| "John Deere tractor" | "[PERSON_NAME] Deere tractor" | Over-redaction |
Balance recall (catching all PII) against precision (not redacting non-PII). For voice agents, we recommend erring toward higher recall—false positives are annoying but not a compliance violation.
Multi-Language and Dialect Variations
NER accuracy degrades with:
- Non-English languages (especially lower-resource languages)
- Regional accents affecting STT accuracy
- Code-switching (mixing languages mid-conversation)
- Slang and colloquial expressions
Test your pipeline with representative accent and language samples from your actual caller population.
Flaws But Not Dealbreakers
Real-time redaction adds latency, typically 10-50ms per transcript chunk. For most voice agents, this is imperceptible. For ultra-low-latency applications, you'll need to optimize your detection pipeline or accept the tradeoff.
Audio redaction quality varies. Silence replacement is obvious; tone replacement sounds dated. More sophisticated audio masking adds complexity. Many teams decide transcript redaction is sufficient and don't store audio at all.
No redaction catches everything. Callers may spell out sensitive information letter by letter, use code words, or provide PII in unexpected formats. Redaction reduces exposure; it doesn't eliminate it entirely.
Maintenance is ongoing. PII patterns evolve, new entity types emerge, and models need retraining. Budget for ongoing maintenance, not just initial implementation.
Pre-Production Compliance Checklist
Before deploying a voice agent that handles PII:
- Identified all storage locations where voice data flows
- Implemented redaction pipeline with middleware/processor pattern
- Configured OpenTelemetry span processors for trace redaction
- Added custom log filters for application logs
- Decided on audio storage policy (redact, delete, or don't store)
- Created synthetic PII test suite
- Validated redaction across all storage locations
- Established monitoring for redaction pipeline health
- Documented redaction policies for compliance audits
- Trained team on avoiding PII in debug output
Related: See our HIPAA PHI Clinical Workflow Testing Checklist for healthcare-specific requirements.
Summary
PII redaction for voice agents requires a dedicated pipeline layer—no mainstream general-purpose logging framework handles this out of the box. The architecture that works:
- Transcribe first, then redact immediately before central storage
- Use middleware and span processors to catch PII in logs and traces
- Address dual-channel audio if you store recordings
- Test continuously with synthetic PII to validate your pipeline
- Monitor for gaps across all storage locations
The teams that get this right treat PII redaction as architecture, not configuration. It's a pipeline layer that sits between voice data sources and storage, not an afterthought bolted onto existing logging.
For production voice agents handling sensitive data, there is no alternative. The question isn't whether to implement PII redaction—it's whether your implementation actually covers all the places where PII can leak.
Hamming's compliance testing helps validate your redaction pipeline works across all data paths. We've seen too many teams with "complete" redaction that missed debug logs, traces, or downstream analytics. Testing the pipeline is as important as building it.
Ready to validate your voice agent's PII redaction? Book a demo to see how Hamming's compliance testing catches redaction gaps before they become breaches.

