Customer Spotlight: How Lilac Labs (YC S24) Ensures Drive-Thru Order Accuracy with Hamming AI

Sumanyu Sharma
Founder & CEO, Voice AI QA Pioneer

Has stress-tested 1M+ voice agent calls to find where they break.

November 8, 2024 · 2 min read

How Lilac Labs Ensures Drive-Thru Order Accuracy with Hamming AI

Quick filter: If a missed allergy or wrong item shows up as a refund, you need automated testing—not spot checks.

The Stakes in Drive-Thru AI

Tony Kam and Shelden Shi founded Lilac Labs to automate drive-thru order taking. Sounds straightforward until you think about what happens when it goes wrong.

Miss an allergy modification? That's not a bug report—that's a potential health incident. Get the order wrong? The customer is sitting there, food in hand, and now someone has to fix it while the line backs up.

Drive-thru AI has no margin for error. And you can't manually test every possible order combination with every accent and background noise level.

What They Were Dealing With

Before Hamming, the Lilac Labs team was spending hours every day manually testing their system. Every time they changed something, they'd have to retest everything. And even then, they knew they were missing edge cases—complex orders, dietary restrictions, the scenarios customers actually throw at you.

"We're confident it handles a normal order. But what about 'no cheese, extra pickles, and I'm allergic to sesame'?"

How They Fixed It

We worked with them to build out three things:

Thousands of test scenarios covering the edge cases: dietary restrictions, allergies, modifications, multi-item orders with specific requests. The stuff that's hard to test manually because there are too many combinations.
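
To give a feel for why the combinations get out of hand, here is a minimal sketch of scenario generation in Python. The menu items, modifications, and allergens are made up for illustration, and `build_scenarios` is a hypothetical helper, not part of Hamming or of Lilac Labs' system.

```python
from itertools import product

# Hypothetical menu data, for illustration only.
ITEMS = ["cheeseburger", "chicken sandwich", "milkshake"]
MODIFICATIONS = ["no cheese", "extra pickles", None]
ALLERGIES = ["sesame", "peanut", None]


def build_scenarios(items, modifications, allergies):
    """Return one scenario dict per (item, modification, allergy) combination."""
    scenarios = []
    for item, mod, allergy in product(items, modifications, allergies):
        # Build the spoken order a simulated customer would say.
        utterance = f"I'd like a {item}"
        if mod is not None:
            utterance += f", {mod}"
        if allergy is not None:
            utterance += f", and I'm allergic to {allergy}"
        scenarios.append(
            {"item": item, "modification": mod, "allergy": allergy, "utterance": utterance}
        )
    return scenarios


scenarios = build_scenarios(ITEMS, MODIFICATIONS, ALLERGIES)
print(len(scenarios))  # 3 items * 3 modifications * 3 allergies = 27
```

Three options per dimension already gives 27 scenarios. A real menu, plus accents and background noise, multiplies that quickly, which is why this kind of coverage is hard to reach by hand.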

Automated concurrent testing so they could run thousands of test calls simultaneously. Instead of someone dialing the agent 50 times, the system does it in parallel.
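
The idea of running calls in parallel can be sketched with Python's `asyncio`. This is an illustration under stated assumptions, not Hamming's implementation: `place_test_call` is a hypothetical stand-in that sleeps instead of dialing an agent, and its pass/fail rule is fake.

```python
import asyncio


async def place_test_call(utterance: str) -> dict:
    """Hypothetical stand-in for a real test call; does no network I/O."""
    await asyncio.sleep(0.01)  # simulate the duration of a call
    # Fake rule for the demo: pretend the agent mishandles allergy mentions.
    return {"utterance": utterance, "passed": "allergic" not in utterance}


async def run_test_calls(utterances, max_concurrent=50):
    """Run all calls concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(utterance):
        async with semaphore:
            return await place_test_call(utterance)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(run_one(u) for u in utterances))


utterances = [
    "One cheeseburger, please",
    "No cheese, extra pickles, and I'm allergic to sesame",
    "Two milkshakes",
]
results = asyncio.run(run_test_calls(utterances))
failed = [r["utterance"] for r in results if not r["passed"]]
print(f"{len(results)} calls run, {len(failed)} failed")
```

The semaphore caps how many calls are in flight at once, so the same code works whether the limit is 50 or several thousand.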

Production monitoring that watches real customer calls and flags the ones that look problematic. Then those real-world failures become new test cases automatically.
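
As a rough sketch of how a flagged production call could become a regression test: the example below assumes each flagged call carries the order the agent recorded and a corrected order from review. The `ProductionCall` fields and both helper functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionCall:
    call_id: str
    transcript: str
    agent_order: tuple      # what the agent recorded
    corrected_order: tuple  # what the order should have been (assumed available)


def is_problematic(call: ProductionCall) -> bool:
    """Flag calls where the recorded order differs from the corrected one."""
    return call.agent_order != call.corrected_order


def to_regression_cases(calls, existing_utterances=()):
    """Turn flagged calls into test cases, skipping transcripts already covered."""
    seen = set(existing_utterances)
    cases = []
    for call in calls:
        if is_problematic(call) and call.transcript not in seen:
            seen.add(call.transcript)
            cases.append(
                {
                    "source_call_id": call.call_id,
                    "utterance": call.transcript,
                    "expected_order": list(call.corrected_order),
                }
            )
    return cases


calls = [
    ProductionCall("c1", "cheeseburger, no sesame bun", ("cheeseburger",), ("cheeseburger, no sesame bun",)),
    ProductionCall("c2", "two milkshakes", ("milkshake", "milkshake"), ("milkshake", "milkshake")),
]
new_cases = to_regression_cases(calls)
print(new_cases)
```

Only the first call is flagged here, so it alone becomes a new test case. Deduplicating on the transcript keeps the suite from filling up with repeats of the same failure.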

The Numbers

After implementation:

  • 5,000 test scenarios running continuously
  • 130,000 automated tests per year
  • 5,200 hours saved annually (that's basically 2.5 full-time people)
  • $520K in annual cost savings
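
For readers who want to check the arithmetic, here is a short calculation. The 2,080-hour working year (40 hours × 52 weeks) is an assumption, as is treating each automated test as one run of one scenario; neither figure comes from Lilac Labs.

```python
# Figures quoted in the list above.
hours_saved = 5_200
cost_savings = 520_000
tests_per_year = 130_000
scenario_count = 5_000

# Assumption: one full-time year is 40 hours * 52 weeks.
HOURS_PER_FTE_YEAR = 40 * 52

fte_equivalent = hours_saved / HOURS_PER_FTE_YEAR    # full-time people
implied_hourly_cost = cost_savings / hours_saved     # dollars per hour saved
runs_per_scenario = tests_per_year / scenario_count  # runs per scenario per year

print(fte_equivalent, implied_hourly_cost, runs_per_scenario)
```

Under those assumptions the numbers line up: 2.5 full-time people, an implied cost of $100 per hour saved, and 26 runs per scenario per year.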

But honestly, the number that matters most to them isn't cost savings—it's the allergies they catch in testing before they become incidents in production.

If you're building voice agents where accuracy really matters, let's talk.

Frequently Asked Questions

Lilac Labs uses Hamming to automate large-scale test calls for drive-thru ordering flows, including complex cases like allergies and dietary restrictions. They combine scenario generation, LLM-based evaluation, and production monitoring to catch failures early and turn real incidents into regression tests.

Drive-thru audio is noisy and fast, orders are multi-step, and mistakes are costly (wrong items, missed allergy constraints, unhappy customers). Reliability requires testing the full interaction: interruptions, corrections, menu edge cases, and long-tail phrasing—not just a handful of scripted examples.

By moving from manual QA to automated testing and monitoring, Lilac Labs scaled their coverage to 5,000 scenarios and 130,000 automated tests per year, saving about 5,200 hours of team time annually while increasing confidence in production reliability. Automated coverage turns edge cases into routine checks instead of surprises.

Start by enumerating your highest-risk order flows, then build a dataset of scenarios that include realistic variations (noise, accents, interruptions, multi-item orders). Add automated evaluations for outcomes and critical constraints (like allergy handling), and convert every production failure into a permanent regression test.
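
An automated evaluation for a critical constraint can be as simple as a function that compares the final order against what the customer said. The sketch below assumes a hypothetical order format in which each line has a `name` and a list of `allergy_notes`. A real evaluator would likely combine checks like this with LLM-based scoring of the transcript.

```python
def evaluate_order(expected_items, final_order, stated_allergens):
    """Return a list of failure messages; an empty list means the call passed."""
    failures = []

    # Outcome check: every expected item appears in the final order.
    ordered_names = [line["name"] for line in final_order]
    for name in expected_items:
        if name not in ordered_names:
            failures.append(f"missing item: {name}")

    # Critical constraint: every allergen the customer stated was recorded.
    recorded = {a for line in final_order for a in line.get("allergy_notes", [])}
    for allergen in stated_allergens:
        if allergen not in recorded:
            failures.append(f"allergy not recorded: {allergen}")

    return failures


good_order = [{"name": "cheeseburger", "allergy_notes": ["sesame"]}]
bad_order = [{"name": "cheeseburger", "allergy_notes": []}]

print(evaluate_order(["cheeseburger"], good_order, ["sesame"]))  # []
print(evaluate_order(["cheeseburger"], bad_order, ["sesame"]))   # ['allergy not recorded: sesame']
```

Treating the allergy check as its own failure type makes it easy to gate a release on zero allergy failures, even if minor wording issues are tolerated.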

Sumanyu Sharma

Founder & CEO

Previously Head of Data at Citizen, where he helped quadruple the user base. As Senior Staff Data Scientist at Tesla, he grew an AI-powered sales program to hundreds of millions of dollars in revenue per year.

Researched AI-powered medical image search at the University of Waterloo, where he graduated with engineering honors on the dean's list.

“At Hamming, we're taking all of our learnings from Tesla and Citizen to build the future of trustworthy, safe and reliable voice AI agents.”