Basata scales healthcare voice agent testing 6x faster with Hamming

What a testing platform should do, no matter what kind of testing you're doing, is enable you to scale. And Hamming does that well.

Blake Jones, AI Engineer at Basata

Location: United States
Industry: Healthcare AI
Stage: Private

Use Cases:

  • Pre-deployment voice agent testing
  • Automated test case generation via API
  • Accessibility and persona testing
  • Agentic prompt optimization

Meet Basata

Basata uses AI to eliminate administrative overhead for healthcare providers, automating patient communication so clinicians can focus entirely on care. Their voice AI agents handle scheduling, patient outreach, and practice communication across specialties like cardiology.

www.basata.ai

Healthcare practices are old institutions with older technologies and processes. Patients rely on the phone as their central way to interact with their practice. Basata's AI agents bridge that gap, but the stakes are high: in cardiology, patients are dealing with heart problems, and missed communications have real-world consequences.

With every new customer bringing unique configurations, conversation flows, and guardrails, Basata needed a way to validate agent behavior programmatically. Manual testing simply could not keep pace with the growing complexity.

With Hamming, Basata was able to:

  • Reduce testing time from 90 minutes to 15 minutes per cycle
  • Run all customer agent tests concurrently instead of one at a time
  • Auto-generate test cases from agent configuration via API
  • Close the feedback loop with agentic prompt optimization

The Challenge: Manual Testing Doesn't Scale in High-Stakes Healthcare

Basata's voice AI agents handle patient communication for healthcare practices where urgency is the norm. A missed appointment in cardiology isn't an inconvenience; it means a patient isn't getting care. Every agent deployment requires rigorous validation to ensure it behaves correctly, follows guardrails, and doesn't make promises the practice can't keep.

But each healthcare practice requires deep customization. Older institutions with established processes need AI that fits their workflows, not the other way around. Every customer agent has its own conversation flows, scheduling rules, and edge cases. Testing every path manually meant making individual phone calls, one at a time, for an hour and a half per cycle. And with LLMs producing nondeterministic responses, the same test needed to run multiple times for confidence.

Healthcare voice agent testing dashboard

The core challenge in summary:

  • 1.5 hours of manual phone calls per test cycle, growing with every new customer
  • One person, one call at a time, no parallelism possible
  • Cannot simulate diverse patient accents, demographics, or speech patterns
  • Socially exhausting calls that drained cognitive energy from building
  • High-stakes healthcare context where failures mean patients don't get care

The Impact

  • Faster testing cycles vs. manual calls: 6x
  • Reduction in manual testing time: 83%
  • Full test suite execution: 15 min, down from 90 min

Before & After Hamming

  • Test execution time: 1.5 hours of manual phone calls → 15 minutes, all tests concurrent
  • Test parallelism: one person, one call at a time → entire test suite runs simultaneously
  • Persona coverage: team cannot simulate diverse accents or demographics → full persona library with regional voices and speech patterns
  • Prompt optimization: manual trial-and-error prompt engineering → automated agentic feedback loop from evaluator results
  • Scaling with customers: testing effort grows linearly per customer → programmatic generation scales automatically

Solution: API-First Testing That Scales with Every Customer

01

Programmatic Test Case Generation via API

Basata's AI agents are highly configurable per healthcare practice, each with its own conversation flows, scheduling rules, and guardrails. Manually writing and maintaining test cases for every configuration was not sustainable.

With Hamming's API, the team defines agent behavior in YAML and programmatically converts it into test cases. This approach enables them to:

  • Auto-generate test cases from configured agent functionality
  • Manage state across multi-step conversations (e.g., verifying appointments were actually booked)
  • Inject scenario-specific test data into each run
  • Keep tests in sync with agent configuration changes

The programmatic layer eliminates the need to manually create or update test cases through a UI.
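The generation step can be sketched as follows. This is an illustrative Python sketch, not Basata's actual code or Hamming's schema: the config structure, field names (`capabilities`, `scenarios`, `patient_goal`, `expected_outcome`), and output shape are all assumptions used to show how a declarative agent definition expands into test cases.

```python
# Hypothetical sketch: expand a per-practice agent configuration into test
# cases. All field names here are illustrative assumptions, not Hamming's
# actual API schema.

def generate_test_cases(agent_config: dict) -> list[dict]:
    """Produce one test case per configured capability/scenario pair."""
    cases = []
    for capability in agent_config["capabilities"]:
        for scenario in capability["scenarios"]:
            cases.append({
                "name": f"{capability['name']}:{scenario['name']}",
                "instructions": scenario["patient_goal"],
                "evaluator": scenario["expected_outcome"],
                # scenario-specific data injected into each run
                "variables": scenario.get("data", {}),
            })
    return cases

# A minimal agent definition, as might be loaded from the team's YAML file.
config = {
    "capabilities": [
        {
            "name": "scheduling",
            "scenarios": [
                {"name": "book_followup",
                 "patient_goal": "Book a follow-up cardiology appointment",
                 "expected_outcome": "Appointment is confirmed with a date and time",
                 "data": {"patient_id": "demo-123"}},
            ],
        },
    ],
}

for case in generate_test_cases(config):
    print(case["name"])
```

Because the test cases are derived from the same configuration that defines the agent, a change to the agent's behavior automatically changes the tests that validate it.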

Programmatic test generation dashboard
02

Concurrent Test Execution Across All Customers

Every healthcare practice Basata serves has its own agent with unique conversation flows and conditionals. Manually testing every path required individual phone calls, one at a time, with a single person on each.

With Hamming, the entire test suite runs concurrently. Instead of spending 90 minutes on serial calls, the team fires off all tests at once and gets results in 15 minutes. This concurrent execution is critical because:

  • Every agent has its own conversation flow with unique conditionals
  • LLMs are stochastic, so tests must run multiple times for confidence
  • The number of required tests multiplies with each new customer

What previously required an hour and a half of focused manual effort now completes in a fraction of the time, freeing the team to focus on building rather than testing.
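The serial-vs-concurrent difference can be illustrated with a small sketch. This is not Hamming's client code; `run_test` is a stand-in for a real phone-call test. The point is that concurrent execution bounds wall-clock time by the slowest test rather than the sum of all of them.

```python
# Illustrative sketch (not Hamming's API): fire off every test at once
# instead of one at a time.
import asyncio

async def run_test(case_name: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a real simulated phone call
    return f"{case_name}: passed"

async def run_suite(cases: list[str]) -> list[str]:
    # gather() runs all tests concurrently and preserves result order
    return await asyncio.gather(*(run_test(c) for c in cases))

results = asyncio.run(run_suite([f"case-{i}" for i in range(20)]))
print(len(results))
```

With 20 serial calls the sketch would take 20x as long as one call; run concurrently, it finishes in roughly the time of a single call, which is the same shape of saving that took Basata's suite from 90 minutes to 15.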

Concurrent test execution dashboard
03

Diverse Persona Testing for Patient Accessibility

Basata serves a diverse patient population, many of whom rely on the phone as their primary way to interact with their healthcare practice. The team has observed that regional differences in voice and communication style significantly affect how patients respond to the AI agent, with some combinations causing immediate hang-ups.

With Hamming's persona library, the team tests across demographics they simply cannot replicate manually. Switching between genders, accents, and speaking speeds doesn't scale with a human team, but it's built into the platform:

  • Different accents and regional speech patterns
  • Varying speaking speeds and communication styles
  • Voice and gender combinations that trigger different patient responses
  • Demographics representative of real patient populations
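A persona matrix like the one above can be sketched as a cartesian product. The accent, speed, and voice values below are illustrative assumptions, not Hamming's persona schema; the point is that combinations multiply quickly past what a human tester can role-play.

```python
# Hypothetical persona matrix: cross accents, speaking speeds, and voices
# to cover combinations a human tester cannot replicate. Values are
# illustrative, not Hamming's actual persona library.
from itertools import product

accents = ["southern US", "midwestern US", "Spanish-accented English"]
speeds = ["slow", "average", "fast"]
voices = ["female", "male"]

personas = [
    {"accent": a, "speed": s, "voice": v}
    for a, s, v in product(accents, speeds, voices)
]
print(len(personas))  # 3 * 3 * 2 = 18 combinations per test case
```

Even this small matrix yields 18 persona variants per test case, which is why persona coverage has to come from the platform rather than from a team member's acting range.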

The team plans to leverage this capability further by A/B testing different voice and persona combinations in production, using Hamming's data to validate which configurations work best for different patient demographics and regions. That shifts them from reactive fixes to proactive optimization based on real data.

Diverse persona testing
04

Agentic Feedback Loops for Prompt Optimization

One of the most sophisticated uses of Hamming at Basata goes beyond traditional testing. The team feeds Hamming's LLM judge evaluator descriptions back into a local agent that automatically tunes system prompts, model selection, and agent configuration.

This creates a closed-loop optimization cycle:

  • Run test suite via Hamming API
  • Collect evaluator descriptions and failure analysis
  • Feed results to a local agent that updates the system prompt
  • Re-run tests to validate improvements

For the team, manual prompt engineering felt less like engineering and more like trial and error. This closed-loop system replaces that guesswork with data-driven optimization, letting the agents improve themselves.
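The loop above can be sketched in a few lines. This is a hedged illustration of the closed-loop idea only: `run_suite` and `revise_prompt` are stand-ins for the Hamming API and Basata's local prompt-tuning agent, and the string-matching logic is a toy substitute for real evaluator feedback.

```python
# Sketch of a closed-loop optimization cycle. run_suite and revise_prompt
# are illustrative stand-ins, not the Hamming API or Basata's local agent.

def run_suite(prompt: str) -> list[str]:
    """Return evaluator failure descriptions (empty list = all tests pass)."""
    failures = []
    if "confirm the date and time" not in prompt:
        failures.append("Agent ended the call without confirming the booking")
    return failures

def revise_prompt(prompt: str, failures: list[str]) -> str:
    """Stand-in for an LLM step that rewrites the prompt from failure analysis."""
    return prompt + " Always confirm the date and time before ending the call."

def optimize(prompt: str, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        failures = run_suite(prompt)
        if not failures:  # every evaluator passed: stop iterating
            break
        prompt = revise_prompt(prompt, failures)
    return prompt

final = optimize("You are a scheduling assistant for a cardiology practice.")
print("confirm the date and time" in final)
```

The key design choice is the stopping condition: the loop re-runs the suite after every revision and only halts when the evaluators pass or a round budget is exhausted, so prompt changes are always validated before they ship.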

Automated production scoring dashboard

Why Basata Chose Hamming

Basata evaluated their options and chose Hamming specifically for its programmatic, API-first approach. The ability to define, run, and manage tests entirely through code was the key differentiator that led them to switch to the platform.

The team values being realistic about what AI can and cannot do. They engineer their product around the ability to audit AI behavior, validate outputs, and integrate into existing healthcare workflows. Hamming fits that philosophy.

Basata chose Hamming because:

  • API-first programmatic approach: define tests in YAML, run via API, no click-ops
  • Composable state management for multi-step conversations (e.g., appointment verification)
  • Diverse persona library that the team cannot replicate manually
  • LLM judge evaluator descriptions that power agentic prompt optimization
  • Responsive team with a strong core platform

“I find talking to the agents completely socially exhausting. The platform offers all of these different personas that the team is not able to replicate. We want to make sure that our diverse customer base is represented in the agents themselves.”

Blake Jones, AI Engineer at Basata

Basata logo

The Results

Hamming has fundamentally changed how Basata validates their healthcare voice agents. What once required dedicated hours of exhausting manual calling now runs automatically, giving the team more time to focus on building better agents and serving more healthcare practices.

The impact grows with every new customer. Each new healthcare practice adds unique configurations that would have multiplied the manual testing burden. With Hamming, the testing infrastructure scales automatically.

01

6x Faster Testing Cycles

Testing time dropped from 90 minutes of manual phone calls to 15 minutes of automated concurrent execution. The team fires off all tests at once and reviews results, instead of spending an hour and a half on individual calls. This time savings compounds with every new customer deployment.

02

Tighter Development Feedback Loops

Before Hamming, each iteration cycle during development required another round of manual phone calls. The cognitive and social exhaustion of talking to AI agents drained energy from actual engineering work. Now the team can iterate rapidly: make a change, run the suite, review results, and repeat, all without picking up a phone.

03

Automated Prompt Optimization

The team built an agentic feedback loop that takes Hamming's evaluator descriptions and automatically tunes agent configuration. This closed-loop system replaces manual prompt engineering with data-driven optimization, improving agent quality while reducing human effort to near zero.

“It's been such a relief to be able to start shifting a lot of this stuff off our plates so we can go focus on more of the interesting problems.”

Blake Jones, AI Engineer at Basata

Basata logo
