How do you test a voice agent running on Genesys and Asterisk?

Test the complete call path: SIP setup, Genesys route, Asterisk channel behavior, voice-agent transcript, recording, trace, and QA result. Hamming recommends starting with 8-12 deterministic scenarios before running higher-volume production-mirror tests.

What should an Asterisk test harness capture for voice agent QA?

Capture the Asterisk channel unique ID, dialplan context, ARI or AMI events, recording pointer, DTMF events, bridge or transfer events, and the canonical call ID used by the QA system. Hamming's evidence contract treats these as join keys, not optional debug logs.

What Genesys artifacts matter when testing AI voice agents?

Capture the Genesys interaction or conversation ID, flow name, queue, transfer reason, contact attributes, and SIP diagnostic evidence when BYOC or SIP routing is involved. According to Hamming's runbook, those artifacts should be joined to the transcript, recording, trace ID, and evaluation result for every launch-blocking test.

Is a chatbot-style test enough for Genesys voice agents?

No. A chatbot-style test can prove the model answered a prompt, but it does not prove DTMF, SIP setup, media quality, IVR routing, transfers, recordings, or trace correlation. Hamming recommends using endpoint tests as the first gate and telephony-path tests as the launch gate for Genesys or Asterisk deployments.

How do you test DTMF, transfer, and fallback paths?

Run controlled calls that send expected digits, trigger transfer conditions, simulate no-answer or busy destinations, and force backend failures. Each test should assert the expected Genesys route, Asterisk event, voice-agent decision, recording, transcript, and QA outcome under one canonical call ID.

What release gates should block a Genesys or Asterisk voice agent launch?

Block launch when route, SIP, media, DTMF, transfer, recording, trace, QA, privacy, or regression evidence is missing or contradictory. Hamming's checklist treats missing replay evidence as a release blocker even if one manual test call sounded correct.

How does Hamming fit with Genesys or on-prem telephony?

Hamming fits as the independent QA and evidence layer around the voice-agent behavior: it evaluates calls, links failures to transcripts and traces, and turns repeatable production failures into regression tests. It does not need to replace Genesys or Asterisk to make the enterprise telephony path testable.

Genesys and Asterisk Voice Agent Testing: Enterprise Telephony QA Runbook

Genesys and Asterisk voice agent testing breaks when teams treat the voice agent like a standalone chatbot. The AI response can look right while the real call path drops DTMF, loses caller context, records the wrong channel, or hands off to the wrong queue.

If your agent only runs through a WebSocket endpoint, start with WebSocket voice agent testing. If your agent runs through Genesys, BYOC trunks, SIP routing, or on-prem Asterisk middleware, the phone path is part of the product. Test it as one system.

Genesys and Asterisk voice agent testing is the process of validating the full enterprise call path: SIP setup, IVR routing, Asterisk channel control, media quality, AI agent behavior, transfers, recordings, transcripts, traces, and QA outcomes.

Quick filter: if a failed test cannot answer "which Genesys interaction, SIP Call-ID, Asterisk channel, transcript, recording, and trace belong to this call?", the test is not production-grade yet.

TL;DR: Build the test runbook around 5 layers:

Telephony: SIP trunk, BYOC, carrier, codec, DTMF, media, and failover behavior.

Contact center: Genesys route, IVR flow, queue, transfer, and interaction metadata.

Asterisk middleware: dialplan, ARI/AMI events, channel IDs, recordings, and simulator behavior.

Voice agent: ASR, prompt version, tool calls, TTS, interruptions, and fallback handling.

QA evidence: recording pointer, transcript, trace ID, evaluation result, and regression decision.

Do not release on "the call sounded fine." Release when the evidence proves the call path, agent behavior, and business outcome all matched the test.

Methodology Note: This runbook is based on Hamming's analysis of 4M+ production voice agent calls and enterprise call-review workflows across 10K+ voice agents (2025-2026). We've tested agents built on LiveKit, Pipecat, ElevenLabs, Retell, Vapi, and custom-built solutions. It also draws on public Genesys, Asterisk, and SIP trunking documentation to ground the telephony-specific evidence requirements.

In Hamming's analysis of 4M+ production calls across 10K+ voice agents, enterprise failures were rarely isolated to one layer. One artifact says the caller asked for billing. Another says the call transferred. A third says the task failed. The test has to join those facts before anyone can trust the result.

Last Updated: June 2026

Related Guides:

Call Center Voice Agent Testing - broader call-center QA program design
IVR and Voice Agent Log Correlation - canonical call IDs across IVR, telephony, transcripts, and outcomes
Voice Agent Workflow Testing - tool-call, state, and side-effect guardrails
Voice Agent Call Evidence Export - reviewer-safe packets for transcripts, audio, traces, and QA results
OpenTelemetry for Voice Agents - span and event modeling for ASR, LLM, tools, and TTS
Voice Agent Incident Response Runbook - production triage when calls start failing
Voice Agent Log Retention Compliance Checklist - retention policy for recordings and transcripts
Testing Voice Agents for Production Reliability - release gates and regression policy

What Makes Genesys and Asterisk Voice Agent Testing Different?

In a Genesys or Asterisk deployment, the AI agent is only one participant in a longer telephony system. A test can pass at the model layer and still fail in production because the route, channel, recording, transfer, or media path behaved differently.

The most common mistake is testing a prompt in isolation, then assuming the same behavior will hold inside the contact-center call path.

Layer	What Can Fail	Evidence You Need
SIP and BYOC	INVITE failures, bad allowlists, retry/failover mismatch, codec negotiation, one-way audio	SIP Call-ID, response codes, PCAP, codec, packet loss, jitter
Genesys	wrong route, missing contact attributes, queue mismatch, transfer loop, bot flow branch drift	interaction ID, flow name, queue, DNIS/ANI policy, transfer reason
Asterisk	wrong dialplan branch, channel not entering ARI, recording missing, DTMF mismatch	channel unique ID, Stasis events, AMI/ARI events, recording file
Voice agent	ASR miss, prompt regression, tool-call failure, latency, hallucination, bad fallback	transcript, prompt version, tool trace, latency, evaluation score
QA workflow	reviewer cannot replay the call or prove the failure	canonical call ID, recording pointer, trace ID, packet manifest

Genesys SIP Diagnostics documents why SIP evidence matters for BYOC Cloud and BYOC Premises: PCAPs can show signaling history for a specific call across external trunks such as BYOC Cloud Carrier, PBX trunks, and premises Edge external trunks, but PCAP generation is best-effort and the files are available only for a limited window. BYOC Premises also needs conversation headers enabled on the trunk so captures can be queried by conversation ID. These are operational constraints, not footnotes. If the test runner waits too long to collect evidence, the call may no longer be debuggable.

Asterisk ARI channel docs describe channels as the path between an endpoint and Asterisk, with a channel unique ID and events such as StasisStart and StasisEnd. Those IDs are not marketing details. They are the keys that let a test connect Asterisk behavior to Genesys and voice-agent evidence.
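As a minimal sketch of how those IDs become join keys, the function below pulls correlation fields out of an ARI StasisStart event payload. The function name, the output field names, and the convention of passing the canonical call ID as the first Stasis() argument are assumptions made for illustration; the event fields read here (type, args, channel.id, channel.dialplan.context) follow the ARI event model.

```python
from typing import Any, Optional


def join_keys_from_stasis_start(event: dict[str, Any]) -> Optional[dict[str, str]]:
    """Extract correlation keys from an ARI StasisStart event, or None for other events."""
    if event.get("type") != "StasisStart":
        return None
    channel = event.get("channel", {})
    dialplan = channel.get("dialplan", {})
    args = event.get("args", [])
    return {
        "asteriskChannelId": channel.get("id", ""),
        "dialplanContext": dialplan.get("context", ""),
        # Assumption: the dialplan passes the canonical call ID as the first Stasis() argument.
        "canonicalCallId": args[0] if args else "",
    }


# Hypothetical event, shaped like a StasisStart payload with most fields omitted.
sample_event = {
    "type": "StasisStart",
    "application": "hamming_test_app",
    "args": ["call_2026_06_17_001"],
    "channel": {
        "id": "asterisk01-1781710938.12",
        "dialplan": {"context": "hamming_test_calls", "exten": "s", "priority": 1},
    },
}
keys = join_keys_from_stasis_start(sample_event)
```

Capturing these keys when the channel enters the application, rather than reconstructing them later from logs, keeps the Asterisk side of the join cheap.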

Enterprise voice-agent test: a call scenario that proves the caller path, contact-center route, middleware behavior, AI decision, media artifact, and final outcome agree under one canonical call record.

Which Topology Are You Testing?

Do not start with test scripts. Start with topology. The same phrase, "Genesys plus Asterisk," can mean at least 5 different systems.

Topology	Asterisk Role	Genesys Role	Best First Test	Watch For
Genesys Cloud voice only	none	telephony, routing, bot or handoff	Genesys test route and interaction evidence	voice-agent layer hidden behind platform abstractions
Genesys BYOC to carrier or PBX	downstream SIP target	routing and media edge	inbound and outbound trunk call with PCAP and route evidence	allowlist, retry codes, media region, TLS/UDP/TCP mismatch
Asterisk as test caller	controlled simulator	system under test	deterministic inbound calls through test DNIS	simulator masking real carrier behavior
Asterisk as middleware bridge	media/control bridge	upstream or downstream route	bridge call with recording and channel correlation	lost IDs, wrong channel recording, DTMF conversion
Hybrid lab	simulator plus limited production mirror	isolated test route	staged happy path, fallback, transfer, and load slice	test route drifting from production route

Genesys BYOC documentation says BYOC Cloud trunks can connect Genesys Cloud to SIP-compliant carriers or devices reachable over the public internet, and that they support UDP, TCP, and TLS. The same docs call out default retry and failover behavior for certain SIP codes. Your test plan needs to prove those rules match your environment, not just that one phone call connected.

For carrier-side references, Twilio Elastic SIP Trunking shows the operational shape many teams need to test: origination URIs, multiple SIP URIs for failover, edge location, recording, caller ID, firewall allowlists, and SIP OPTIONS behavior. Even if Twilio is not in your stack, the test categories are useful.

Topology rule: name the system under test in every scenario. "Asterisk test" does not tell a reviewer enough. "Asterisk-originated inbound call through Genesys BYOC route to billing voice agent" gives the test owner something they can replay.

What Scenarios Should the Test Harness Run?

Start with a small scenario matrix and make it boring before adding scale. A 200-call batch with unclear guardrails creates more noise than a 12-call suite with specific artifacts.

Scenario	Setup	Expected Result	Release Blocker
Happy path	Inbound caller reaches test route and completes one known task	correct route, transcript, tool result, recording, QA pass	any layer cannot be joined by call ID
DTMF path	Caller presses expected menu digits during IVR or agent prompt	digit captured and routed correctly	DTMF lost, duplicated, or treated as speech only
Transfer to queue	Agent transfers caller to human or queue	transfer reason and destination match policy	transfer loop, wrong queue, lost caller context
Fallback path	Backend API fails or agent cannot answer	safe fallback, escalation, or retry policy runs	agent invents answer or hangs silently
Caller interruption	Caller corrects the agent mid-response	agent stops, accepts correction, preserves task state	caller repeats or agent continues old path
Silence timeout	Caller does not respond after prompt	reprompt or escalation follows policy	timeout hidden as latency or ignored
No-answer or busy	downstream destination cannot answer	route follows fallback policy	call drops without useful reason
QoS degradation	controlled jitter, packet loss, or codec mismatch in lab	test records quality degradation and voice-agent impact	transcript only shows failure with no media evidence
Recording and replay	call should be reviewable after test	recording pointer, transcript, trace, and score open together	recording missing, wrong channel, or unsafe access
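The DTMF row in the matrix above can be checked mechanically once the harness knows which digits it sent and which digits the route reported. The sketch below is one possible classifier; the function names and result labels are hypothetical, and it cannot detect the "treated as speech only" case, which needs transcript evidence as well.

```python
from itertools import groupby


def _collapse_repeats(digits: str) -> str:
    """Collapse runs of the same digit, e.g. '1122' -> '12'."""
    return "".join(key for key, _ in groupby(digits))


def classify_dtmf(expected: str, received: str) -> str:
    """Compare the digits the test sent with the digits the route captured."""
    if received == expected:
        return "ok"
    if len(received) < len(expected):
        return "lost"
    if len(received) > len(expected) and _collapse_repeats(received) == _collapse_repeats(expected):
        return "duplicated"
    return "mismatch"
```

For example, classify_dtmf("2", "22") reports a duplicated digit, which often points at the same key press being delivered twice through different DTMF signaling methods.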

The Asterisk side should be explicit about whether it is generating calls, controlling channels, or only recording media. Asterisk recording docs distinguish live recordings from stored recordings. MixMonitor can record mixed or separated audio streams, but your QA result should know which channel was recorded.

Here is the minimum test case shape:

{
  "testId": "genesys_billing_dtmf_transfer_001",
  "topology": "asterisk-originated-call-to-genesys-byoc",
  "callerFixture": "billing_customer_known_account",
  "entryRoute": {
    "dialedNumber": "+15551230000",
    "genesysFlow": "billing-support-test",
    "asteriskContext": "hamming_test_calls"
  },
  "expectedPath": [
    "call.answered",
    "ivr.prompt_played",
    "dtmf.received",
    "agent.session_started",
    "tool.called",
    "call.transferred"
  ],
  "guardrails": {
    "dtmfDigit": "2",
    "targetQueue": "billing_escalation_test",
    "taskOutcome": "transferred_with_context",
    "maxP95TurnLatencyMs": 3000
  }
}

The important part is not the particular field names. The important part is that the test names the route, caller fixture, expected events, and evidence required to prove the result.
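One way to assert the expectedPath list is to treat it as an ordered subsequence of the events the harness observed, so extra events are tolerated but missing or out-of-order ones are not. This is a sketch under that assumption; the function name and the extra event name in the sample are hypothetical.

```python
from typing import Optional


def first_unmatched_event(expected_path: list[str], observed_events: list[str]) -> Optional[str]:
    """Return the first expected event not found in order, or None if the path matches."""
    position = 0
    for name in expected_path:
        try:
            # Search only after the previously matched event to enforce ordering.
            position = observed_events.index(name, position) + 1
        except ValueError:
            return name
    return None


expected_path = [
    "call.answered",
    "ivr.prompt_played",
    "dtmf.received",
    "agent.session_started",
    "tool.called",
    "call.transferred",
]
# Observed events may contain extras, such as a hypothetical "agent.turn_completed".
observed_ok = expected_path[:4] + ["agent.turn_completed"] + expected_path[4:]
observed_no_dtmf = [name for name in expected_path if name != "dtmf.received"]
```

Returning the first unmatched event, rather than a bare boolean, gives the QA result a specific failing step to name.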

What Evidence Should Every Test Capture?

Every test needs an evidence contract. Without it, QA ends up with a call recording, engineering ends up with logs, telephony ends up with PCAP, and nobody can prove they are looking at the same call.

Evidence	Required?	Source	Why It Matters
canonical call ID	yes	your test runner or orchestration layer	joins every artifact under one identity
Genesys interaction or conversation ID	yes	Genesys	proves route, flow, queue, and transfer behavior
SIP Call-ID and response codes	yes for SIP/BYOC tests	SIP trace or PCAP	proves setup, retry, failover, and disconnect path
Asterisk channel unique ID	yes for Asterisk tests	ARI/AMI/channel logs	joins dialplan, bridge, recording, and channel events
recording pointer	usually	Genesys, Asterisk, carrier, or storage	lets reviewers hear DTMF, silence, clipping, and interruptions
transcript and turn IDs	yes	voice-agent runtime or ASR	proves what the agent heard and said
prompt and agent version	yes	voice-agent runtime	ties behavior to shipped configuration
trace ID	yes for engineering QA	OpenTelemetry or app tracing	joins ASR, LLM, tools, TTS, and storage spans
evaluation result	yes	Hamming or QA system	gives pass/fail reason and regression label
redaction state	yes	evidence/export layer	protects recordings, transcripts, and account data

For the full identity model, use the IVR and voice agent log correlation runbook. For reviewer packaging, use the call evidence export runbook.

Evidence contract: a test is not complete until it can produce the IDs, recordings, transcripts, traces, and QA result needed to replay the call without a manual hunt through every production system.
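The contract can be enforced with a small completeness check before a test is marked done. The sketch below follows the table above: some evidence is always required, and some depends on what the test touches. All field names and tag names are hypothetical placeholders, and the recording pointer is left out of the hard requirements because the table marks it "usually".

```python
ALWAYS_REQUIRED = (
    "canonicalCallId",
    "genesysInteractionId",
    "transcript",
    "promptVersion",
    "agentVersion",
    "evaluationResult",
    "redactionState",
)

# Evidence required only when the test exercises that layer.
CONDITIONALLY_REQUIRED = {
    "sip": ("sipCallId", "sipResponseCodes"),
    "asterisk": ("asteriskChannelId",),
    "engineering_qa": ("traceId",),
}


def missing_evidence(packet: dict, tags: set[str]) -> list[str]:
    """List required evidence fields that are absent or empty in the packet."""
    required = list(ALWAYS_REQUIRED)
    for tag, fields in CONDITIONALLY_REQUIRED.items():
        if tag in tags:
            required.extend(fields)
    return [field for field in required if not packet.get(field)]
```

A non-empty result means the test is incomplete regardless of whether the call itself appeared to succeed.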

Use a normalized event envelope so QA and engineering can search across providers:

{
  "canonicalCallId": "call_2026_06_17_001",
  "eventName": "telephony.transfer.completed",
  "occurredAt": "2026-06-17T15:42:18.231Z",
  "sourceSystem": "genesys",
  "providerAliases": {
    "genesysInteractionId": "interaction_test_001",
    "sipCallId": "a84b4c76e66710@sample.org",
    "asteriskChannelId": "asterisk01-1781710938.12",
    "traceId": "4bf92f3577b34da6a3ce929d0e0e4736"
  },
  "payload": {
    "fromRoute": "billing_voice_agent_test",
    "toQueue": "billing_escalation_test",
    "transferReason": "caller_requested_human",
    "recordingPointer": "s3://qa-evidence/call_001/audio.wav"
  },
  "qa": {
    "evaluationId": "eval_001",
    "result": "pass",
    "regressionCandidate": false
  }
}

Keep the payload small enough to search and safe enough to share. Put raw audio, full transcripts, and account-specific data behind the right access controls.
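With envelopes in this shape, a lookup from any provider alias back to the canonical call ID is a small indexing step. The sketch below also flags aliases that point at two different canonical IDs, since contradictory evidence should surface rather than be silently overwritten. The function name and the sample envelopes are hypothetical.

```python
from typing import Any


def build_alias_index(
    envelopes: list[dict[str, Any]],
) -> tuple[dict[tuple[str, str], str], list[tuple[str, str]]]:
    """Map (alias kind, alias value) to canonicalCallId and collect conflicting aliases."""
    index: dict[tuple[str, str], str] = {}
    conflicts: list[tuple[str, str]] = []
    for envelope in envelopes:
        call_id = envelope["canonicalCallId"]
        for kind, value in envelope.get("providerAliases", {}).items():
            key = (kind, value)
            # setdefault keeps the first mapping; a different later ID is a conflict.
            if index.setdefault(key, call_id) != call_id:
                conflicts.append(key)
    return index, conflicts


sample_envelopes = [
    {
        "canonicalCallId": "call_001",
        "providerAliases": {
            "genesysInteractionId": "interaction_test_001",
            "sipCallId": "a84b4c76e66710@sample.org",
        },
    },
    {
        "canonicalCallId": "call_001",
        "providerAliases": {"asteriskChannelId": "asterisk01-1781710938.12"},
    },
]
alias_index, alias_conflicts = build_alias_index(sample_envelopes)
```

A reviewer holding only an Asterisk channel ID can then resolve it with alias_index[("asteriskChannelId", "asterisk01-1781710938.12")] and open the rest of the evidence from there.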

How Do You Automate Asterisk and Genesys Test Runs?

Automation should make the test repeatable, not hide the call path. Treat Asterisk as a controlled participant and Genesys as the routing system under test.

Step	Action	Output	Common Mistake
1. Freeze topology	name trunk, route, test number, Asterisk context, and agent version	topology record	"testing Genesys" without a concrete route
2. Generate call	originate a controlled call or replay a known fixture	call attempt ID	using production caller identity by accident
3. Drive scenario	play audio, send DTMF, pause, interrupt, transfer, or hang up	event sequence	only testing one happy path
4. Capture artifacts	pull Genesys, SIP, Asterisk, transcript, recording, and trace evidence	evidence packet	collecting artifacts after retention window
5. Score behavior	evaluate route, transcript, latency, tool result, and outcome	QA result	pass/fail based only on transcript
6. Promote failures	turn repeatable failures into regression tests	test fixture	leaving incident evidence as a one-off note
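Step 5 scores latency alongside route and outcome. As an illustration, the maxP95TurnLatencyMs guardrail from the test case shape could be evaluated with a nearest-rank percentile over per-turn latencies. The function names are hypothetical, and nearest-rank is one of several percentile definitions, chosen here because it always returns a latency that was actually observed.

```python
def p95_nearest_rank(values_ms: list[int]) -> int:
    """Return the 95th percentile using the nearest-rank method."""
    if not values_ms:
        raise ValueError("at least one turn latency is required")
    ordered = sorted(values_ms)
    # Integer ceiling of 0.95 * n avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def latency_guardrail_passes(turn_latencies_ms: list[int], max_p95_ms: int = 3000) -> bool:
    """True when the p95 turn latency is within the guardrail."""
    return p95_nearest_rank(turn_latencies_ms) <= max_p95_ms
```

With only a handful of turns per call, p95 is effectively the slowest turn, so a per-call check like this works best alongside a suite-level check across many calls.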

Asterisk can be a useful simulator because it can originate calls, expose channel events, manipulate media, and record audio. That does not make it a complete substitute for real carrier traffic. We used to think a lab route was enough if the agent logic passed. After watching voice-agent failures move across telephony, IVR, and model layers, we now treat lab calls as the first gate, not the last gate.

For enterprise launches, use 3 rings of confidence:

Transport gate: WebSocket or direct endpoint test proves the voice-agent runtime can accept audio and emit events.
Telephony lab gate: Asterisk or SIP simulator proves the Genesys route, IVR, DTMF, transfer, recording, and agent behavior.
Production mirror gate: limited test numbers or controlled cohorts prove the same evidence contract survives real routing and media constraints.

The voice agent workflow testing runbook covers business guardrails. This runbook adds the enterprise telephony boundary: the workflow is not proven unless the route and evidence are proven too.

What Release Gates Should Block Launch?

Block launch on missing evidence, not just bad outcomes. A test that fails with good evidence is fixable. A test that passes without replayable evidence is fragile.

Gate	Pass Condition	Block Launch When
Route gate	test DNIS/ANI, flow, queue, and transfer match expected path	wrong queue, hidden fallback, or transfer loop
SIP gate	setup, retry/failover, and disconnect reason are visible	SIP Call-ID, response code, or PCAP evidence missing for a SIP failure
Media gate	audio is bidirectional and recording matches policy	one-way audio, wrong channel, clipping, or missing recording
DTMF gate	expected digits are captured and routed correctly	DTMF lost, duplicated, delayed, or interpreted as speech only
Agent gate	transcript, prompt version, tool calls, and TTS are traceable	model response cannot be connected to call evidence
QA gate	evaluation result names the failing guardrail	pass/fail has no reason, rubric, or artifact pointer
Privacy gate	transcript and audio redaction state is explicit	raw call artifacts leak into broad review paths
Regression gate	repeatable failures become fixtures	production failure remains a screenshot or anecdote

This is where production reliability testing and incident response connect. The release gate should tell the on-call engineer what evidence will exist when the same failure appears at 3am.
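The gate table can be expressed as a small decision function in which missing evidence blocks launch even when the gate nominally passed. This is a sketch only: the gate identifiers, the GateResult type, and the message strings are hypothetical names chosen to mirror the table.

```python
from dataclasses import dataclass

REQUIRED_GATES = ("route", "sip", "media", "dtmf", "agent", "qa", "privacy", "regression")


@dataclass(frozen=True)
class GateResult:
    gate: str
    passed: bool
    evidence_present: bool
    reason: str = ""


def launch_blockers(results: list[GateResult]) -> list[str]:
    """Return one message per gate that should block launch; empty means clear to ship."""
    by_gate = {result.gate: result for result in results}
    blockers = []
    for gate in REQUIRED_GATES:
        result = by_gate.get(gate)
        if result is None:
            blockers.append(f"{gate}: gate not evaluated")
        elif not result.evidence_present:
            # Missing evidence blocks launch even if the outcome looked fine.
            blockers.append(f"{gate}: evidence missing")
        elif not result.passed:
            blockers.append(f"{gate}: {result.reason or 'failed'}")
    return blockers
```

Checking evidence before outcome is deliberate: a pass without replayable evidence is treated the same as a gate that never ran.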

One unresolved tension: production-like telephony tests cost more than endpoint tests. They use real routes, numbers, SIP infrastructure, recordings, and sometimes carrier minutes. Do not run them for every tiny prompt edit. Use fast endpoint tests first, then run the enterprise telephony suite for changes that touch routing, caller identity, DTMF, transfers, recordings, latency, or regulated workflows.
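That policy can be encoded so the CI pipeline decides which suite to run from the areas a change touches. The sketch below assumes each change is labeled with the areas it affects; the label strings and suite names are hypothetical.

```python
# Areas named above as warranting the full enterprise telephony suite.
TELEPHONY_SENSITIVE_AREAS = frozenset({
    "routing",
    "caller_identity",
    "dtmf",
    "transfers",
    "recordings",
    "latency",
    "regulated_workflow",
})


def suites_for_change(changed_areas: set[str]) -> list[str]:
    """Always run fast endpoint tests; add the telephony suite for sensitive changes."""
    suites = ["endpoint"]
    if changed_areas & TELEPHONY_SENSITIVE_AREAS:
        suites.append("enterprise_telephony")
    return suites
```

A prompt wording change labeled only {"prompt_copy"} would run the endpoint suite alone, while {"prompt_copy", "transfers"} would trigger both.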

How Hamming Fits Into the Enterprise Telephony QA Loop

Hamming does not need to be your Genesys administrator or your Asterisk PBX. Hamming fits as the independent QA and evidence layer around the voice-agent behavior.

Use Hamming to:

Generate or replay realistic caller scenarios before a route goes live.
Evaluate transcripts, audio, latency, tool calls, and policy adherence after each test call.
Attach QA results to call IDs, recordings, traces, and workflow evidence.
Find the few production calls worth reviewing instead of sampling blindly.
Promote failed Genesys, SIP, or Asterisk call patterns into repeatable regression tests.
Compare agent versions after prompt, model, ASR, TTS, route, or middleware changes.

The practical loop is:

define topology
run controlled call
capture telephony + contact-center + agent evidence
score the call
review failures
promote repeatable failures into regression tests
rerun before the next route, prompt, or provider change

If you already have a Genesys or Asterisk stack, keep it. The goal is not to replace your telephony system. The goal is to make sure your AI voice agent is tested with the same call path, evidence, and failure modes that real callers use.

Launch Checklist

Before shipping a Genesys or Asterisk voice agent path, verify:

Do the boring part first. Make the call path replayable. Once the evidence is joined, the AI failures become much easier to fix.


Sumanyu Sharma
