Test LiveKit Agents
Run LiveKit-to-LiveKit WebRTC tests with no phone numbers required. Auto-provision rooms, auto-generate test scenarios from your prompt, and replay sessions with full transcripts and 50+ quality metrics.
Time to value
First test report in under 10 minutes
Connect your provider, sync your agents, and validate real calls in one workflow.
Add your LiveKit API key, secret, and server URL.
Choose auto-provisioned or customer-controlled rooms.
Trigger a test room and replay the session after the run.
What you need
- LiveKit API key and secret (LiveKit Cloud > Settings > Keys).
- Room creation plan (auto-provisioned or customer-controlled).
- Optional: enable recording if you want MP4 archives (see the egress sketch below).
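If you opt into recording, LiveKit Egress can archive the room composite as an MP4. A minimal sketch assuming the `livekit-api` Python package; the room name and output path are placeholders, and production setups typically add an upload destination such as S3:

```python
# Sketch: start a room-composite MP4 recording via LiveKit Egress.
# Assumes the `livekit-api` package; the filepath is a placeholder, and
# production setups usually add an upload destination (e.g. S3).
import asyncio
from livekit import api

async def start_recording(room_name: str) -> None:
    # Reads LIVEKIT_URL / LIVEKIT_API_KEY / LIVEKIT_API_SECRET from the environment.
    lkapi = api.LiveKitAPI()
    egress = await lkapi.egress.start_room_composite_egress(
        api.RoomCompositeEgressRequest(
            room_name=room_name,
            file_outputs=[
                api.EncodedFileOutput(
                    file_type=api.EncodedFileType.MP4,
                    filepath="recordings/{room_name}-{time}.mp4",
                )
            ],
        )
    )
    print("egress started:", egress.egress_id)
    await lkapi.aclose()

asyncio.run(start_recording("hamming-test-room"))
```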
Connect in minutes
1. Go to Agents > Providers > Connect LiveKit.
2. Enter API key, secret, and server URL.
3. Select auto or customer-controlled room provisioning (see the sketch after these steps).
4. Run a test room and confirm transcripts populate.
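Auto-provisioning comes down to two server-side calls: create (or reuse) a room, then mint a join token for the test caller. A minimal sketch with the `livekit-api` Python package; the room name, identity, and credentials shown are placeholders:

```python
# Sketch: auto-provision a test room and mint a WebRTC join token with
# the LiveKit Python server SDK (`livekit-api`). Names and credentials
# are placeholders; use your LiveKit Cloud keys from Settings > Keys.
import asyncio
from livekit import api

async def provision_test_room() -> str:
    lkapi = api.LiveKitAPI(
        url="https://your-project.livekit.cloud",  # server URL from step 2
        api_key="LK_API_KEY",
        api_secret="LK_API_SECRET",
    )
    await lkapi.room.create_room(api.CreateRoomRequest(name="hamming-test-room"))
    await lkapi.aclose()

    # Token the test caller uses to join the room over WebRTC.
    token = (
        api.AccessToken("LK_API_KEY", "LK_API_SECRET")
        .with_identity("hamming-tester")
        .with_grants(api.VideoGrants(room_join=True, room="hamming-test-room"))
    )
    return token.to_jwt()

print(asyncio.run(provision_test_room()))
```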
Validation checklist
Confirm the integration is working before scaling your tests.
- Provider shows Connected in Agents > Providers (connection check sketched after this list).
- Agents appear in Agents > List with the provider badge.
- A test run produces transcripts and audio in the run summary.
- Test room appears and session replay works after the run.
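To script the connection check, a quick sanity test is to list active rooms: if the call succeeds, the key, secret, and server URL are all valid. A sketch, again assuming the `livekit-api` package:

```python
# Sketch: verify LiveKit credentials by listing active rooms.
import asyncio
from livekit import api

async def check_connection() -> None:
    # Uses LIVEKIT_URL / LIVEKIT_API_KEY / LIVEKIT_API_SECRET from the environment.
    lkapi = api.LiveKitAPI()
    rooms = await lkapi.room.list_rooms(api.ListRoomsRequest())
    print(f"connected; {len(rooms.rooms)} active room(s)")
    await lkapi.aclose()

asyncio.run(check_connection())
```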
Provider-specific capabilities
Built for LiveKit teams
Provider-aware testing and monitoring without changing your stack.
Test LiveKit agents with no phone numbers or SIP setup.
Auto-create rooms or use your own webhook workflow (receiver sketched below).
Review LiveKit sessions with full audio and transcripts.
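In the customer-controlled workflow, your service receives LiveKit room webhooks and decides when a room is ready for a test caller. A minimal receiver sketch, assuming Flask alongside the `livekit-api` package; the endpoint path is a placeholder:

```python
# Sketch: receive and verify LiveKit room webhooks in your own workflow.
# Assumes Flask plus the `livekit-api` package; the route is a placeholder.
from flask import Flask, request
from livekit.api import TokenVerifier, WebhookReceiver

app = Flask(__name__)
# TokenVerifier reads LIVEKIT_API_KEY / LIVEKIT_API_SECRET from the environment.
receiver = WebhookReceiver(TokenVerifier())

@app.post("/livekit/webhook")
def livekit_webhook():
    # Verifies the Authorization JWT and parses the event payload.
    event = receiver.receive(
        request.get_data(as_text=True),
        request.headers.get("Authorization"),
    )
    if event.event == "room_started":
        print("room ready for the test caller:", event.room.name)
    return "", 200
```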
50+ quality metrics
What we measure
Comprehensive evaluation across accuracy, conversation quality, voice performance, and task completion.
Accuracy & Correctness
- Factual accuracy
- Intent recognition
- Response relevance
- Hallucination detection
Conversation Quality
- Turn-taking flow
- Interruption handling
- Context retention
- Conversation completion
Voice & Audio
- Latency (time to first word; see the sketch after these lists)
- Speech clarity
- Background noise handling
- Accent robustness
Task Completion
- Tool call success rate
- API integration reliability
- Goal completion rate
- Error recovery
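As one concrete example, time to first word can be derived from timestamped transcript events: the gap between the end of the user's turn and the agent's first word. The event shape below is purely illustrative, not Hamming's or LiveKit's API:

```python
# Illustrative only: computing a time-to-first-word latency metric from
# timestamped transcript events. The event shape is hypothetical.
from dataclasses import dataclass

@dataclass
class TranscriptEvent:
    speaker: str      # "user" or "agent"
    text: str
    timestamp: float  # seconds since session start

def time_to_first_word(events: list[TranscriptEvent]) -> float | None:
    """Seconds from the user's last turn to the agent's first word."""
    last_user_ts = None
    for ev in events:
        if ev.speaker == "user":
            last_user_ts = ev.timestamp
        elif ev.speaker == "agent" and last_user_ts is not None:
            return ev.timestamp - last_user_ts
    return None  # agent never responded

events = [
    TranscriptEvent("user", "What are your hours?", 1.20),
    TranscriptEvent("agent", "We're open nine to five.", 2.05),
]
print(time_to_first_word(events))  # 0.85 seconds
```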
Independent evaluation
Why vendor-neutral testing?
Get unbiased results with consistent metrics across all providers, not self-reported scores from your vendor.
| Aspect | Provider built-in testing | Hamming |
|---|---|---|
| Objectivity | Optimized for their platform | Vendor-neutral evaluation |
| Consistency | Metrics vary by provider | Same 50+ metrics across all providers |
| Cross-vendor comparison | Can't compare across vendors | A/B test agents across any provider |
| Independence | Self-reported results | Third-party validation |
| Compliance | Limited audit trail | SOC 2 certified, audit-ready reports |
| Scale | Playground-level testing | 1000+ concurrent production tests |
What you get with Hamming
- Auto-generate test cases and assertions from your prompt.
- Pull tool call data, transcripts, and recordings directly from your provider.
- Run your first test in under 10 minutes with 50+ built-in quality metrics.
- Test both voice and chat agents with unified evaluation.
Frequently Asked Questions
Everything you need to know about testing LiveKit agents with Hamming.
How do I test my LiveKit agents with Hamming?
Connect your LiveKit API credentials to Hamming, configure room provisioning (auto or customer-controlled), and run automated tests. Hamming simulates real conversations via WebRTC and evaluates agent responses with 50+ quality metrics.
Do I need phone numbers or SIP infrastructure?
No. Hamming tests LiveKit agents via native WebRTC, eliminating the need for phone numbers or SIP infrastructure. Tests run directly through LiveKit rooms.
What metrics does Hamming evaluate?
Hamming evaluates 50+ metrics including response accuracy, latency, conversation flow, intent recognition, and custom assertions. All metrics are audio-native with 95-96% agreement with human evaluators.
Can I replay and debug test sessions?
Yes. Every test run captures full transcripts, audio recordings, and tool call data. You can replay sessions and debug issues with complete conversation context.
How long does it take to run a first test?
Most teams run their first test within 10-15 minutes. Connect your LiveKit API key and secret, configure room settings, and Hamming auto-generates test scenarios from your agent's prompt.
How does Hamming handle interruptions and barge-in?
Hamming simulates real user behavior including interruptions, overlapping speech, and mid-sentence changes. Tests validate that agents handle barge-in gracefully without losing conversation context.