How Lilac Labs Ensures Drive-Thru Order Accuracy with Hamming AI
Quick filter: If a missed allergy or a wrong item turns into a refund, you need automated testing, not spot checks.
The Stakes in Drive-Thru AI
Tony Kam and Shelden Shi founded Lilac Labs to automate drive-thru order taking. Sounds straightforward until you think about what happens when it goes wrong.
Miss an allergy modification? That's not a bug report—that's a potential health incident. Get the order wrong? The customer is sitting there, food in hand, and now someone has to fix it while the line backs up.
Drive-thru AI has no margin for error. And you can't manually test every possible order combination with every accent and background noise level.
What They Were Dealing With
Before Hamming, the Lilac Labs team was spending hours every day manually testing their system. Every time they changed something, they'd have to retest everything. And even then, they knew they were missing edge cases—complex orders, dietary restrictions, the scenarios customers actually throw at you.
"We're confident it handles a normal order. But what about 'no cheese, extra pickles, and I'm allergic to sesame'?"
How They Fixed It
We worked with them to build out three things:
Thousands of test scenarios covering the edge cases: dietary restrictions, allergies, modifications, multi-item orders with specific requests. The stuff that's hard to test manually because there are too many combinations.
Automated concurrent testing so they could run thousands of test calls simultaneously. Instead of someone dialing the agent 50 times by hand, the system places those calls in parallel.
Production monitoring that watches real customer calls and flags the ones that look problematic. Then those real-world failures become new test cases automatically.
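To make those three pieces concrete, here's a minimal sketch of how they could fit together. It is an illustration under stated assumptions, not Lilac Labs' or Hamming's actual code: the menu data, the `place_test_call` stub, and the rule for flagging a production call are all hypothetical stand-ins.

```python
import asyncio
import itertools
from dataclasses import dataclass
from typing import Optional

# Hypothetical menu data, for illustration only.
ITEMS = ["cheeseburger", "chicken sandwich", "fries"]
MODIFICATIONS = ["no cheese", "extra pickles", "no changes"]
ALLERGIES = [None, "sesame", "peanut"]


@dataclass(frozen=True)
class Scenario:
    item: str
    modification: str
    allergy: Optional[str]


def build_scenarios() -> list:
    """Piece 1: enumerate every item x modification x allergy combination."""
    return [Scenario(*combo) for combo in itertools.product(ITEMS, MODIFICATIONS, ALLERGIES)]


async def place_test_call(scenario: Scenario) -> Scenario:
    """Stand-in for a real test call; returns the order the agent 'heard'.

    This stub deliberately simulates an agent that drops sesame allergies.
    """
    await asyncio.sleep(0)  # yield control, as a real network call would
    heard_allergy = None if scenario.allergy == "sesame" else scenario.allergy
    return Scenario(scenario.item, scenario.modification, heard_allergy)


async def run_all(scenarios: list, max_concurrent: int = 50) -> list:
    """Piece 2: run scenarios concurrently and return the ones the agent got wrong."""
    limiter = asyncio.Semaphore(max_concurrent)  # cap the number of calls in flight

    async def check(scenario: Scenario) -> Optional[Scenario]:
        async with limiter:
            heard = await place_test_call(scenario)
        return None if heard == scenario else scenario

    results = await asyncio.gather(*(check(s) for s in scenarios))
    return [s for s in results if s is not None]


def scenarios_from_production(calls: list) -> list:
    """Piece 3: turn flagged production calls into new scenarios.

    Assumed flagging rule: the allergy the customer stated differs from the
    one recorded on the order. Real flagging criteria would be richer.
    """
    return [
        Scenario(c["item"], c["modification"], c["stated_allergy"])
        for c in calls
        if c["stated_allergy"] != c["recorded_allergy"]
    ]


failures = asyncio.run(run_all(build_scenarios()))  # 9 of 27 scenarios fail with this stub

# Hypothetical production call records; only the first one is flagged.
production_calls = [
    {"item": "fries", "modification": "no changes", "stated_allergy": "dairy", "recorded_allergy": None},
    {"item": "fries", "modification": "no changes", "stated_allergy": None, "recorded_allergy": None},
]
regression_suite = build_scenarios() + scenarios_from_production(production_calls)
```

Even this toy menu produces 27 combinations from three short lists, which is why the real scenario count climbs into the thousands. The semaphore is one simple way to cap how many calls are in flight at once; a real harness would also need timeouts and retries.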
The Numbers
After implementation:
- 5,000 test scenarios running continuously
- 130,000 automated tests per year
- 5,200 hours saved annually (roughly 2.5 full-time people at 2,080 working hours per year)
- $520K in annual cost savings
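If you want to sanity-check how those figures relate to each other, the arithmetic is short. The 40-hour week and 52-week year are our assumptions, and the last line assumes every automated test is one scenario run, which may not be how the tests are actually counted.

```python
# Figures from the list above.
SCENARIOS = 5_000
TESTS_PER_YEAR = 130_000
HOURS_SAVED_PER_YEAR = 5_200
COST_SAVINGS_PER_YEAR = 520_000  # dollars

# Assumption: one full-time person works 40 hours a week, 52 weeks a year.
HOURS_PER_FULL_TIME_YEAR = 40 * 52  # 2,080

full_time_equivalents = HOURS_SAVED_PER_YEAR / HOURS_PER_FULL_TIME_YEAR  # 2.5
implied_hourly_cost = COST_SAVINGS_PER_YEAR / HOURS_SAVED_PER_YEAR       # 100.0 dollars per hour
runs_per_scenario = TESTS_PER_YEAR / SCENARIOS                           # 26.0 runs per year

print(full_time_equivalents, implied_hourly_cost, runs_per_scenario)
```

Read this way, the savings figure implies a loaded cost of about $100 per hour of manual testing, and each scenario runs about every two weeks on average.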
But honestly, the number that matters most to them isn't the cost savings. It's the number of allergy errors they catch in testing before those errors become incidents in production.
If you're building voice agents where accuracy really matters, let's talk.

