Basata scales healthcare voice agent testing 6x faster with Hamming

What a testing platform should do, no matter what kind of testing you're doing, is enable you to scale. And Hamming does that well.

Blake Jones, AI Engineer at Basata

Location: United States
Industry: Healthcare AI
Stage: Private

Use Cases:

  • Pre-deployment voice agent testing
  • Automated test case generation via API
  • Accessibility and persona testing
  • Agentic prompt optimization

Meet Basata

Basata uses AI to eliminate administrative overhead for healthcare providers, automating patient communication so clinicians can focus entirely on care. Their voice AI agents handle scheduling, patient outreach, and practice communication across specialties like cardiology.

www.basata.ai

Healthcare practices are old institutions with older technologies and processes. Patients rely on the phone as their central way to interact with their practice. Basata's AI agents bridge that gap, but the stakes are high: in cardiology, patients are dealing with heart problems, and missed communications have real-world consequences.

With every new customer bringing unique configurations, conversation flows, and guardrails, Basata needed a way to validate agent behavior programmatically. Manual testing simply could not keep pace with the growing complexity.

With Hamming, Basata was able to:

  • Reduce testing time from 90 minutes to 15 minutes per cycle
  • Run all customer agent tests concurrently instead of one at a time
  • Auto-generate test cases from agent configuration via API
  • Close the feedback loop with agentic prompt optimization

The Challenge: Manual Testing Doesn't Scale in High-Stakes Healthcare

Basata's voice AI agents handle patient communication for healthcare practices where urgency is the norm. A missed appointment in cardiology isn't an inconvenience; it means a patient isn't getting care. Every agent deployment requires rigorous validation to ensure it behaves correctly, follows guardrails, and doesn't make promises the practice can't keep.

But each healthcare practice requires deep customization. Older institutions with established processes need AI that fits their workflows, not the other way around. Every customer agent has its own conversation flows, scheduling rules, and edge cases. Testing every path manually meant making individual phone calls, one at a time, for an hour and a half per cycle. And with LLMs producing nondeterministic responses, the same test needed to run multiple times for confidence.

Healthcare voice agent testing dashboard

The core challenge in summary:

  • 1.5 hours of manual phone calls per test cycle, growing with every new customer
  • One person, one call at a time, no parallelism possible
  • Cannot simulate diverse patient accents, demographics, or speech patterns
  • Socially exhausting calls that drained cognitive energy from building
  • High-stakes healthcare context where failures mean patients don't get care

The Impact

  • Faster testing cycles vs. manual calls: 6x
  • Reduction in manual testing time: 83%
  • Full test suite execution: 15 min, down from 90 min

Before & After Hamming

  • Test execution time: 1.5 hours of manual phone calls → 15 minutes, all tests concurrent
  • Test parallelism: one person, one call at a time → entire test suite runs simultaneously
  • Persona coverage: team cannot simulate diverse accents or demographics → full persona library with regional voices and speech patterns
  • Prompt optimization: manual trial-and-error prompt engineering → automated agentic feedback loop from evaluator results
  • Scaling with customers: testing effort grows linearly per customer → programmatic generation scales automatically

Solution: API-First Testing That Scales with Every Customer

01

Programmatic Test Case Generation via API

Basata's AI agents are highly configurable per healthcare practice, each with its own conversation flows, scheduling rules, and guardrails. Manually writing and maintaining test cases for every configuration was not sustainable.

With Hamming's API, the team defines agent behavior in YAML and programmatically converts it into test cases. This approach enables them to:

  • Auto-generate test cases from configured agent functionality
  • Manage state across multi-step conversations (e.g., verifying appointments were actually booked)
  • Inject scenario-specific test data into each run
  • Keep tests in sync with agent configuration changes

The programmatic layer eliminates the need to manually create or update test cases through a UI.
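The generation step can be sketched as follows. This is an illustrative Python sketch, not Basata's actual code or Hamming's schema: the config structure, field names (`capabilities`, `scenarios`, `patient_goal`, `expected_outcome`), and output shape are all assumptions used to show how a declarative agent definition expands into test cases.

```python
# Hypothetical sketch: expand a per-practice agent configuration into test
# cases. All field names here are illustrative assumptions, not Hamming's
# actual API schema.

def generate_test_cases(agent_config: dict) -> list[dict]:
    """Produce one test case per configured capability/scenario pair."""
    cases = []
    for capability in agent_config["capabilities"]:
        for scenario in capability["scenarios"]:
            cases.append({
                "name": f"{capability['name']}:{scenario['name']}",
                "instructions": scenario["patient_goal"],
                "evaluator": scenario["expected_outcome"],
                # scenario-specific data injected into each run
                "variables": scenario.get("data", {}),
            })
    return cases

# A minimal agent definition, as might be loaded from the team's YAML file.
config = {
    "capabilities": [
        {
            "name": "scheduling",
            "scenarios": [
                {"name": "book_followup",
                 "patient_goal": "Book a follow-up cardiology appointment",
                 "expected_outcome": "Appointment is confirmed with a date and time",
                 "data": {"patient_id": "demo-123"}},
            ],
        },
    ],
}

for case in generate_test_cases(config):
    print(case["name"])
```

Because the test cases are derived from the same configuration that defines the agent, a change to the agent's behavior automatically changes the tests that validate it.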

Programmatic test generation dashboard
02

Concurrent Test Execution Across All Customers

Every healthcare practice Basata serves has its own agent with unique conversation flows and conditionals. Manually testing every path required individual phone calls, one at a time, with a single person on each.

With Hamming, the entire test suite runs concurrently. Instead of spending 90 minutes on serial calls, the team fires off all tests at once and gets results in 15 minutes. This concurrent execution is critical because:

  • Every agent has its own conversation flow with unique conditionals
  • LLMs are stochastic, so tests must run multiple times for confidence
  • The number of required tests multiplies with each new customer

What previously required an hour and a half of focused manual effort now completes in a fraction of the time, freeing the team to focus on building rather than testing.
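The serial-vs-concurrent difference can be illustrated with a small sketch. This is not Hamming's client code; `run_test` is a stand-in for a real phone-call test. The point is that concurrent execution bounds wall-clock time by the slowest test rather than the sum of all of them.

```python
# Illustrative sketch (not Hamming's API): fire off every test at once
# instead of one at a time.
import asyncio

async def run_test(case_name: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a real simulated phone call
    return f"{case_name}: passed"

async def run_suite(cases: list[str]) -> list[str]:
    # gather() runs all tests concurrently and preserves result order
    return await asyncio.gather(*(run_test(c) for c in cases))

results = asyncio.run(run_suite([f"case-{i}" for i in range(20)]))
print(len(results))
```

With 20 serial calls the sketch would take 20x as long as one call; run concurrently, it finishes in roughly the time of a single call, which is the same shape of saving that took Basata's suite from 90 minutes to 15.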

Concurrent test execution dashboard
03

Diverse Persona Testing for Patient Accessibility

Basata serves a diverse patient population, many of whom rely on the phone as their primary way to interact with their healthcare practice. The team has observed that regional differences in voice and communication style significantly affect how patients respond to the AI agent, with some combinations causing immediate hang-ups.

With Hamming's persona library, the team tests across demographics they simply cannot replicate manually. Switching between genders, accents, and speaking speeds doesn't scale with a human team, but it's built into the platform:

  • Different accents and regional speech patterns
  • Varying speaking speeds and communication styles
  • Voice and gender combinations that trigger different patient responses
  • Demographics representative of real patient populations
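A persona matrix like the one above can be sketched as a cartesian product. The accent, speed, and voice values below are illustrative assumptions, not Hamming's persona schema; the point is that combinations multiply quickly past what a human tester can role-play.

```python
# Hypothetical persona matrix: cross accents, speaking speeds, and voices
# to cover combinations a human tester cannot replicate. Values are
# illustrative, not Hamming's actual persona library.
from itertools import product

accents = ["southern US", "midwestern US", "Spanish-accented English"]
speeds = ["slow", "average", "fast"]
voices = ["female", "male"]

personas = [
    {"accent": a, "speed": s, "voice": v}
    for a, s, v in product(accents, speeds, voices)
]
print(len(personas))  # 3 * 3 * 2 = 18 combinations per test case
```

Even this small matrix yields 18 persona variants per test case, which is why persona coverage has to come from the platform rather than from a team member's acting range.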

The team plans to leverage this capability further by A/B testing different voice and persona combinations in production, using Hamming's data to validate which configurations work best for different patient demographics and regions. That shifts them from reactive fixes to proactive optimization based on real data.

Diverse persona testing
04

Agentic Feedback Loops for Prompt Optimization

One of the most sophisticated uses of Hamming at Basata goes beyond traditional testing. The team feeds Hamming's LLM judge evaluator descriptions back into a local agent that automatically tunes system prompts, model selection, and agent configuration.

This creates a closed-loop optimization cycle:

  • Run test suite via Hamming API
  • Collect evaluator descriptions and failure analysis
  • Feed results to a local agent that updates the system prompt
  • Re-run tests to validate improvements

For the team, manual prompt engineering felt less like engineering and more like trial and error. This closed-loop system replaces that guesswork with data-driven optimization, letting the agents improve themselves.
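The loop above can be sketched in a few lines. This is a hedged illustration of the closed-loop idea only: `run_suite` and `revise_prompt` are stand-ins for the Hamming API and Basata's local prompt-tuning agent, and the string-matching logic is a toy substitute for real evaluator feedback.

```python
# Sketch of a closed-loop optimization cycle. run_suite and revise_prompt
# are illustrative stand-ins, not the Hamming API or Basata's local agent.

def run_suite(prompt: str) -> list[str]:
    """Return evaluator failure descriptions (empty list = all tests pass)."""
    failures = []
    if "confirm the date and time" not in prompt:
        failures.append("Agent ended the call without confirming the booking")
    return failures

def revise_prompt(prompt: str, failures: list[str]) -> str:
    """Stand-in for an LLM step that rewrites the prompt from failure analysis."""
    return prompt + " Always confirm the date and time before ending the call."

def optimize(prompt: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        failures = run_suite(prompt)
        if not failures:  # every evaluator passed: stop iterating
            break
        prompt = revise_prompt(prompt, failures)
    return prompt

final = optimize("You are a scheduling assistant for a cardiology practice.")
print("confirm the date and time" in final)
```

The key design choice is the stopping condition: the loop re-runs the suite after every revision and only halts when the evaluators pass or a round budget is exhausted, so prompt changes are always validated before they ship.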

Automated production scoring dashboard

Why Basata Chose Hamming

Basata evaluated their options and chose Hamming specifically for its programmatic, API-first approach. The ability to define, run, and manage tests entirely through code was the key differentiator that led them to switch to the platform.

The team values being realistic about what AI can and cannot do. They engineer their product around the ability to audit AI behavior, validate outputs, and integrate into existing healthcare workflows. Hamming fits that philosophy.

Basata chose Hamming because:

  • API-first programmatic approach: define tests in YAML, run via API, no click-ops
  • Composable state management for multi-step conversations (e.g., appointment verification)
  • Diverse persona library that the team cannot replicate manually
  • LLM judge evaluator descriptions that power agentic prompt optimization
  • Responsive team with a strong core platform

“I find talking to the agents completely socially exhausting. The platform offers all of these different personas that the team is not able to replicate. We want to make sure that our diverse customer base is represented in the agents themselves.”

Blake Jones, AI Engineer at Basata

Basata logo

The Results

Hamming has fundamentally changed how Basata validates their healthcare voice agents. What once required dedicated hours of exhausting manual calling now runs automatically, giving the team more time to focus on building better agents and serving more healthcare practices.

The impact grows with every new customer. Each new healthcare practice adds unique configurations that would have multiplied the manual testing burden. With Hamming, the testing infrastructure scales automatically.

01

6x Faster Testing Cycles

Testing time dropped from 90 minutes of manual phone calls to 15 minutes of automated concurrent execution. The team fires off all tests at once and reviews results, instead of spending an hour and a half on individual calls. This time savings compounds with every new customer deployment.

02

Tighter Development Feedback Loops

Before Hamming, each iteration cycle during development required another round of manual phone calls. The cognitive and social exhaustion of talking to AI agents drained energy from actual engineering work. Now the team can iterate rapidly: make a change, run the suite, review results, and repeat, all without picking up a phone.

03

Automated Prompt Optimization

The team built an agentic feedback loop that takes Hamming's evaluator descriptions and automatically tunes agent configuration. This closed-loop system replaces manual prompt engineering with data-driven optimization, improving agent quality while reducing human effort to near zero.

“It's been such a relief to be able to start shifting a lot of this stuff off our plates so we can go focus on more of the interesting problems.”

Blake Jones, AI Engineer at Basata

Basata logo
