Genesys and Asterisk Voice Agent Testing: Enterprise Telephony QA Runbook

Sumanyu Sharma
Founder & CEO, Voice AI QA Pioneer

Has stress-tested 4M+ voice agent calls to find where they break.

June 17, 2026 · Updated June 17, 2026 · 15 min read

Genesys and Asterisk voice agent testing breaks when teams treat the voice agent like a standalone chatbot. The AI response can look right while the real call path drops DTMF, loses caller context, records the wrong channel, or hands off to the wrong queue.

If your agent only runs through a WebSocket endpoint, start with WebSocket voice agent testing. If your agent runs through Genesys, BYOC trunks, SIP routing, or on-prem Asterisk middleware, the phone path is part of the product. Test it as one system.

Genesys and Asterisk voice agent testing is the process of validating the full enterprise call path: SIP setup, IVR routing, Asterisk channel control, media quality, AI agent behavior, transfers, recordings, transcripts, traces, and QA outcomes.

Quick filter: if a failed test cannot answer "which Genesys interaction, SIP Call-ID, Asterisk channel, transcript, recording, and trace belong to this call?", the test is not production-grade yet.

TL;DR: Build the test runbook around 5 layers:

  • Telephony: SIP trunk, BYOC, carrier, codec, DTMF, media, and failover behavior.
  • Contact center: Genesys route, IVR flow, queue, transfer, and interaction metadata.
  • Asterisk middleware: dialplan, ARI/AMI events, channel IDs, recordings, and simulator behavior.
  • Voice agent: ASR, prompt version, tool calls, TTS, interruptions, and fallback handling.
  • QA evidence: recording pointer, transcript, trace ID, evaluation result, and regression decision.

Do not release on "the call sounded fine." Release when the evidence proves the call path, agent behavior, and business outcome all matched the test.

Methodology Note: This runbook is based on Hamming's analysis of 4M+ production voice agent calls and enterprise call-review workflows across 10K+ voice agents (2025-2026). We've tested agents built on LiveKit, Pipecat, ElevenLabs, Retell, Vapi, and custom-built solutions.

It also uses public Genesys, Asterisk, and SIP trunking documentation to ground the telephony-specific evidence requirements.

In that analysis, enterprise failures were rarely isolated to one layer. One artifact says the caller asked for billing. Another says the call transferred. A third says the task failed. The test has to join those facts before anyone can trust the result.

Last Updated: June 2026

What Makes Genesys and Asterisk Voice Agent Testing Different?

In a Genesys or Asterisk deployment, the AI agent is only one participant in a longer telephony system. A test can pass at the model layer and still fail in production because the route, channel, recording, transfer, or media path behaved differently.

The most common mistake is testing a prompt in isolation, then assuming the same behavior will hold inside the contact-center call path.

| Layer | What Can Fail | Evidence You Need |
| --- | --- | --- |
| SIP and BYOC | INVITE failures, bad allowlists, retry/failover mismatch, codec negotiation, one-way audio | SIP Call-ID, response codes, PCAP, codec, packet loss, jitter |
| Genesys | wrong route, missing contact attributes, queue mismatch, transfer loop, bot flow branch drift | interaction ID, flow name, queue, DNIS/ANI policy, transfer reason |
| Asterisk | wrong dialplan branch, channel not entering ARI, recording missing, DTMF mismatch | channel unique ID, Stasis events, AMI/ARI events, recording file |
| Voice agent | ASR miss, prompt regression, tool-call failure, latency, hallucination, bad fallback | transcript, prompt version, tool trace, latency, evaluation score |
| QA workflow | reviewer cannot replay the call or prove the failure | canonical call ID, recording pointer, trace ID, packet manifest |

Genesys SIP Diagnostics documents why SIP evidence matters for BYOC Cloud and BYOC Premises: PCAPs can show signaling history for a specific call across external trunks such as BYOC Cloud Carrier, PBX trunks, and premises Edge external trunks, but PCAP generation is best-effort and the files are available only for a limited window. BYOC Premises also needs conversation headers enabled on the trunk so captures can be queried by conversation ID. These are operational constraints, not footnotes. If the test runner waits too long to collect evidence, the call may no longer be debuggable.
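One way to make that retention constraint testable is to stamp every finished call with a collection deadline. The sketch below is a minimal illustration only: the 24-hour window and 1-hour safety margin are placeholder assumptions, not documented Genesys values, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: placeholder retention window. Confirm the real PCAP
# availability for your own Genesys org before relying on this number.
ASSUMED_PCAP_WINDOW = timedelta(hours=24)


def pcap_collection_deadline(call_ended_at: datetime,
                             window: timedelta = ASSUMED_PCAP_WINDOW,
                             safety_margin: timedelta = timedelta(hours=1)) -> datetime:
    """Latest time the test runner should attempt to pull SIP evidence."""
    return call_ended_at + window - safety_margin


def is_still_collectable(call_ended_at: datetime, now: datetime) -> bool:
    """True while the runner is still inside the assumed collection window."""
    return now <= pcap_collection_deadline(call_ended_at)
```

A runner could fail the test outright, rather than mark it "pending", once `is_still_collectable` returns false, so a late collection shows up as a QA defect instead of a silent gap.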

Asterisk ARI channel docs describe channels as the path between an endpoint and Asterisk, with a channel unique ID and events such as StasisStart and StasisEnd. Those IDs are not marketing details. They are the keys that let a test connect Asterisk behavior to Genesys and voice-agent evidence.
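To show how those IDs become join keys, here is a small sketch that groups ARI events by channel unique ID. It assumes the events arrive as JSON strings with a `type` field and a nested `channel.id`, which is the usual shape of StasisStart and StasisEnd; the function names and the sample IDs in any usage are hypothetical.

```python
import json
from collections import defaultdict


def index_stasis_events(raw_events):
    """Group StasisStart/StasisEnd event types by channel unique ID."""
    by_channel = defaultdict(list)
    for raw in raw_events:
        event = json.loads(raw)
        # Ignore every other ARI event type in this sketch.
        if event.get("type") in ("StasisStart", "StasisEnd"):
            by_channel[event["channel"]["id"]].append(event["type"])
    return dict(by_channel)


def channels_missing_stasis_end(by_channel):
    """Channels that entered the ARI app but never reported leaving it."""
    return sorted(
        channel_id
        for channel_id, types in by_channel.items()
        if "StasisStart" in types and "StasisEnd" not in types
    )
```

A channel that shows StasisStart without StasisEnd at the end of a run is worth flagging before anyone scores the transcript, because the call may still be up or the event stream may have dropped.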

Enterprise voice-agent test: a call scenario that proves the caller path, contact-center route, middleware behavior, AI decision, media artifact, and final outcome agree under one canonical call record.

Which Topology Are You Testing?

Do not start with test scripts. Start with topology. The same phrase, "Genesys plus Asterisk," can mean at least 5 different systems.

| Topology | Asterisk Role | Genesys Role | Best First Test | Watch For |
| --- | --- | --- | --- | --- |
| Genesys Cloud voice only | none | telephony, routing, bot or handoff | Genesys test route and interaction evidence | voice-agent layer hidden behind platform abstractions |
| Genesys BYOC to carrier or PBX | downstream SIP target | routing and media edge | inbound and outbound trunk call with PCAP and route evidence | allowlist, retry codes, media region, TLS/UDP/TCP mismatch |
| Asterisk as test caller | controlled simulator | system under test | deterministic inbound calls through test DNIS | simulator masking real carrier behavior |
| Asterisk as middleware bridge | media/control bridge | upstream or downstream route | bridge call with recording and channel correlation | lost IDs, wrong channel recording, DTMF conversion |
| Hybrid lab | simulator plus limited production mirror | isolated test route | staged happy path, fallback, transfer, and load slice | test route drifting from production route |

Genesys BYOC documentation says BYOC Cloud trunks can connect Genesys Cloud to SIP-compliant carriers or devices reachable over the public internet, and supports UDP, TCP, and TLS. The same docs call out default retry and failover behavior for certain SIP codes. Your test plan needs to prove those rules match your environment, not just that one phone call connected.

For carrier-side references, Twilio Elastic SIP Trunking shows the operational shape many teams need to test: origination URIs, multiple SIP URIs for failover, edge location, recording, caller ID, firewall allowlists, and SIP OPTIONS behavior. Even if Twilio is not in your stack, the test categories are useful.

Topology rule: name the system under test in every scenario. "Asterisk test" does not tell a reviewer enough. "Asterisk-originated inbound call through Genesys BYOC route to billing voice agent" gives the test owner something they can replay.

What Scenarios Should the Test Harness Run?

Start with a small scenario matrix and make it boring before adding scale. A 200-call batch with unclear assertions creates more noise than a 12-call suite with specific artifacts.

| Scenario | Setup | Expected Result | Release Blocker |
| --- | --- | --- | --- |
| Happy path | Inbound caller reaches test route and completes one known task | correct route, transcript, tool result, recording, QA pass | any layer cannot be joined by call ID |
| DTMF path | Caller presses expected menu digits during IVR or agent prompt | digit captured and routed correctly | DTMF lost, duplicated, or treated as speech only |
| Transfer to queue | Agent transfers caller to human or queue | transfer reason and destination match policy | transfer loop, wrong queue, lost caller context |
| Fallback path | Backend API fails or agent cannot answer | safe fallback, escalation, or retry policy runs | agent invents answer or hangs silently |
| Caller interruption | Caller corrects the agent mid-response | agent stops, accepts correction, preserves task state | caller repeats or agent continues old path |
| Silence timeout | Caller does not respond after prompt | reprompt or escalation follows policy | timeout hidden as latency or ignored |
| No-answer or busy | downstream destination cannot answer | route follows fallback policy | call drops without useful reason |
| QoS degradation | controlled jitter, packet loss, or codec mismatch in lab | test records quality degradation and voice-agent impact | transcript only shows failure with no media evidence |
| Recording and replay | call should be reviewable after test | recording pointer, transcript, trace, and score open together | recording missing, wrong channel, or unsafe access |

The Asterisk side should be explicit about whether it is generating calls, controlling channels, or only recording media. Asterisk recording docs distinguish live recordings from stored recordings. MixMonitor can record mixed or separated audio streams, but your QA result should know which channel was recorded.

Here is the minimum test case shape:

{
  "testId": "genesys_billing_dtmf_transfer_001",
  "topology": "asterisk-originated-call-to-genesys-byoc",
  "callerFixture": "billing_customer_known_account",
  "entryRoute": {
    "dialedNumber": "+15551230000",
    "genesysFlow": "billing-support-test",
    "asteriskContext": "hamming_test_calls"
  },
  "expectedPath": [
    "call.answered",
    "ivr.prompt_played",
    "dtmf.received",
    "agent.session_started",
    "tool.called",
    "call.transferred"
  ],
  "assertions": {
    "dtmfDigit": "2",
    "targetQueue": "billing_escalation_test",
    "taskOutcome": "transferred_with_context",
    "maxP95TurnLatencyMs": 3000
  }
}

The important part is not the particular field names. The important part is that the test names the route, caller fixture, expected events, and evidence required to prove the result.
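As a rough illustration of how a runner might evaluate the `expectedPath` and `maxP95TurnLatencyMs` assertions from that test case, the sketch below checks that expected events appear in order (extra events in between are tolerated) and computes a nearest-rank p95. Both helper names are hypothetical, and nearest-rank is one of several reasonable percentile definitions.

```python
def first_missing_step(expected_path, observed_events):
    """Return the first expected event not found in order, or None if all match.

    Observed events may contain extra entries between expected steps.
    """
    remaining = iter(observed_events)
    for step in expected_path:
        # any() consumes the iterator up to and including the match,
        # so later steps can only match later observed events.
        if not any(event == step for event in remaining):
            return step
    return None


def p95_nearest_rank(latencies_ms):
    """Nearest-rank 95th percentile of per-turn latencies in milliseconds."""
    if not latencies_ms:
        raise ValueError("no turn latencies recorded")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]
```

Reporting the first missing step, rather than a bare pass/fail, gives the reviewer the exact point where the call left the expected path.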

What Evidence Should Every Test Capture?

Every test needs an evidence contract. Without it, QA ends up with a call recording, engineering ends up with logs, telephony ends up with PCAP, and nobody can prove they are looking at the same call.

| Evidence | Required? | Source | Why It Matters |
| --- | --- | --- | --- |
| canonical call ID | yes | your test runner or orchestration layer | joins every artifact under one identity |
| Genesys interaction or conversation ID | yes | Genesys | proves route, flow, queue, and transfer behavior |
| SIP Call-ID and response codes | yes for SIP/BYOC tests | SIP trace or PCAP | proves setup, retry, failover, and disconnect path |
| Asterisk channel unique ID | yes for Asterisk tests | ARI/AMI/channel logs | joins dialplan, bridge, recording, and channel events |
| recording pointer | usually | Genesys, Asterisk, carrier, or storage | lets reviewers hear DTMF, silence, clipping, and interruptions |
| transcript and turn IDs | yes | voice-agent runtime or ASR | proves what the agent heard and said |
| prompt and agent version | yes | voice-agent runtime | ties behavior to shipped configuration |
| trace ID | yes for engineering QA | OpenTelemetry or app tracing | joins ASR, LLM, tools, TTS, and storage spans |
| evaluation result | yes | Hamming or QA system | gives pass/fail reason and regression label |
| redaction state | yes | evidence/export layer | protects recordings, transcripts, and account data |

For the full identity model, use the IVR and voice agent log correlation runbook. For reviewer packaging, use the call evidence export runbook.

Evidence contract: a test is not complete until it can produce the IDs, recordings, transcripts, traces, and QA result needed to replay the call without a manual hunt through every production system.
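A contract like this can be enforced mechanically. The following is a minimal sketch, assuming the evidence packet is a flat dictionary; the key names and topology tags are illustrative choices that mirror the evidence table, not a Hamming, Genesys, or Asterisk schema.

```python
# Keys every test must carry (the unconditional "yes" rows of the table).
ALWAYS_REQUIRED = (
    "canonicalCallId", "genesysInteractionId", "transcript",
    "promptVersion", "evaluationResult", "redactionState",
)

# Keys required only for certain topologies or review modes.
CONDITIONAL = {
    "sip": ("sipCallId", "sipResponseCodes"),
    "asterisk": ("asteriskChannelId",),
    "engineering_qa": ("traceId",),
}


def missing_evidence(packet, topology_tags):
    """List required evidence keys that are absent or empty in the packet."""
    required = list(ALWAYS_REQUIRED)
    for tag, keys in CONDITIONAL.items():
        if tag in topology_tags:
            required.extend(keys)
    return [key for key in required if not packet.get(key)]
```

An empty return value means the packet satisfies the contract for that topology; anything else is a list a reviewer can act on directly.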

Use a normalized event envelope so QA and engineering can search across providers:

{
  "canonicalCallId": "call_2026_06_17_001",
  "eventName": "telephony.transfer.completed",
  "occurredAt": "2026-06-17T15:42:18.231Z",
  "sourceSystem": "genesys",
  "providerAliases": {
    "genesysInteractionId": "interaction_test_001",
    "sipCallId": "a84b4c76e66710@sample.org",
    "asteriskChannelId": "asterisk01-1781710938.12",
    "traceId": "4bf92f3577b34da6a3ce929d0e0e4736"
  },
  "payload": {
    "fromRoute": "billing_voice_agent_test",
    "toQueue": "billing_escalation_test",
    "transferReason": "caller_requested_human",
    "recordingPointer": "s3://qa-evidence/call_001/audio.wav"
  },
  "qa": {
    "evaluationId": "eval_001",
    "result": "pass",
    "regressionCandidate": false
  }
}

Keep the payload small enough to search and safe enough to share. Put raw audio, full transcripts, and account-specific data behind the right access controls.
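To illustrate why the `providerAliases` block earns its place in the envelope, here is a sketch of an alias index that resolves any provider ID back to the canonical call ID and refuses to continue if one alias points at two calls. The function name is hypothetical, and the envelopes are assumed to follow the shape shown above.

```python
def build_alias_index(envelopes):
    """Map each (alias type, alias value) pair to its canonical call ID.

    Raises ValueError if one provider alias is attached to two different
    canonical calls, since that means the join keys cannot be trusted.
    """
    index = {}
    for envelope in envelopes:
        canonical = envelope["canonicalCallId"]
        for alias_type, alias_value in envelope.get("providerAliases", {}).items():
            existing = index.setdefault((alias_type, alias_value), canonical)
            if existing != canonical:
                raise ValueError(
                    f"{alias_type}={alias_value} maps to both {existing} and {canonical}"
                )
    return index
```

With an index like this, a telephony engineer holding only a SIP Call-ID and a QA reviewer holding only an evaluation record can land on the same call.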

How Do You Automate Asterisk and Genesys Test Runs?

Automation should make the test repeatable, not hide the call path. Treat Asterisk as a controlled participant and Genesys as the routing system under test.

| Step | Action | Output | Common Mistake |
| --- | --- | --- | --- |
| 1. Freeze topology | name trunk, route, test number, Asterisk context, and agent version | topology record | "testing Genesys" without a concrete route |
| 2. Generate call | originate a controlled call or replay a known fixture | call attempt ID | using production caller identity by accident |
| 3. Drive scenario | play audio, send DTMF, pause, interrupt, transfer, or hang up | event sequence | only testing one happy path |
| 4. Capture artifacts | pull Genesys, SIP, Asterisk, transcript, recording, and trace evidence | evidence packet | collecting artifacts after retention window |
| 5. Score behavior | evaluate route, transcript, latency, tool result, and outcome | QA result | pass/fail based only on transcript |
| 6. Promote failures | turn repeatable failures into regression tests | test fixture | leaving incident evidence as a one-off note |

Asterisk can be a useful simulator because it can originate calls, expose channel events, manipulate media, and record audio. That does not make it a complete substitute for real carrier traffic. We used to think a lab route was enough if the agent logic passed. After watching voice-agent failures move across telephony, IVR, and model layers, we now treat lab calls as the first gate, not the last gate.
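For the call-generation step, one option is the ARI channel origination call (`POST /channels`), which accepts a caller-chosen `channelId` so the runner knows the Asterisk join key before the call exists. The sketch below only builds the request and does not send it. The base URL, credentials, endpoint string, and application name are all assumptions for illustration; adapt them to your own ARI configuration.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def build_originate_request(ari_base_url, username, password,
                            endpoint, stasis_app, channel_id, caller_id):
    """Build (but do not send) an ARI request that originates a test call."""
    query = urlencode({
        "endpoint": endpoint,      # e.g. a PJSIP endpoint toward the test trunk
        "app": stasis_app,         # Stasis application that will control the call
        "channelId": channel_id,   # runner-chosen ID, reused as a join key
        "callerId": caller_id,     # test caller identity, never a production one
    })
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return Request(
        f"{ari_base_url.rstrip('/')}/channels?{query}",
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )
```

Passing the result to `urllib.request.urlopen` would place the call. Keeping construction separate from sending makes it easy to log the exact request in the topology record before any traffic leaves the lab.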

For enterprise launches, use 3 rings of confidence:

  1. Transport gate: WebSocket or direct endpoint test proves the voice-agent runtime can accept audio and emit events.
  2. Telephony lab gate: Asterisk or SIP simulator proves the Genesys route, IVR, DTMF, transfer, recording, and agent behavior.
  3. Production mirror gate: limited test numbers or controlled cohorts prove the same evidence contract survives real routing and media constraints.

The voice agent workflow testing runbook covers business assertions. This runbook adds the enterprise telephony boundary: the workflow is not proven unless the route and evidence are proven too.

What Release Gates Should Block Launch?

Block launch on missing evidence, not just bad outcomes. A test that fails with good evidence is fixable. A test that passes without replayable evidence is fragile.

| Gate | Pass Condition | Block Launch When |
| --- | --- | --- |
| Route gate | test DNIS/ANI, flow, queue, and transfer match expected path | wrong queue, hidden fallback, or transfer loop |
| SIP gate | setup, retry/failover, and disconnect reason are visible | SIP Call-ID, response code, or PCAP evidence missing for a SIP failure |
| Media gate | audio is bidirectional and recording matches policy | one-way audio, wrong channel, clipping, or missing recording |
| DTMF gate | expected digits are captured and routed correctly | DTMF lost, duplicated, delayed, or interpreted as speech only |
| Agent gate | transcript, prompt version, tool calls, and TTS are traceable | model response cannot be connected to call evidence |
| QA gate | evaluation result names the failing assertion | pass/fail has no reason, rubric, or artifact pointer |
| Privacy gate | transcript and audio redaction state is explicit | raw call artifacts leak into broad review paths |
| Regression gate | repeatable failures become fixtures | production failure remains a screenshot or anecdote |

This is where production reliability testing and incident response connect. The release gate should tell the on-call engineer what evidence will exist when the same failure appears at 3am.
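The "block on missing evidence" rule is simple to encode. This is a minimal sketch with a hypothetical `GateResult` record: a gate that passed without evidence still blocks launch, and a failed gate must carry a reason.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    gate: str               # e.g. "route", "sip", "media", "dtmf"
    passed: bool            # did the pass condition hold?
    evidence_present: bool  # can a reviewer replay and verify it?
    reason: str = ""        # failing assertion, if any


def launch_blockers(results):
    """Return human-readable blockers; an empty list means clear to launch."""
    blockers = []
    for result in results:
        if not result.evidence_present:
            # Missing evidence blocks even when the outcome looked fine.
            blockers.append(f"{result.gate}: evidence missing")
        elif not result.passed:
            blockers.append(
                f"{result.gate}: {result.reason or 'failed without a stated reason'}"
            )
    return blockers
```

Checking evidence before outcome is deliberate: it keeps a lucky pass from masking a call nobody could debug later.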

One unresolved tension: production-like telephony tests cost more than endpoint tests. They use real routes, numbers, SIP infrastructure, recordings, and sometimes carrier minutes. Do not run them for every tiny prompt edit. Use fast endpoint tests first, then run the enterprise telephony suite for changes that touch routing, caller identity, DTMF, transfers, recordings, latency, or regulated workflows.
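That trade-off can be written down as a small selection rule so it is applied consistently in CI. The sketch below is an assumption about how a team might tag changes; the area names simply mirror the list in the paragraph above and are not a standard taxonomy.

```python
# Change areas that justify the slower enterprise telephony suite.
TELEPHONY_SENSITIVE = frozenset({
    "routing", "caller_identity", "dtmf", "transfers",
    "recordings", "latency", "regulated_workflow",
})


def suites_for_change(changed_areas):
    """Always run fast endpoint tests; add the telephony suite when warranted."""
    suites = ["endpoint"]
    if TELEPHONY_SENSITIVE & set(changed_areas):
        suites.append("enterprise_telephony")
    return suites
```

A prompt-only edit then runs just the endpoint suite, while a change tagged `dtmf` or `transfers` also triggers the full call-path run.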

How Hamming Fits Into the Enterprise Telephony QA Loop

Hamming does not need to be your Genesys administrator or your Asterisk PBX. Hamming fits as the independent QA and evidence layer around the voice-agent behavior.

Use Hamming to:

  • Generate or replay realistic caller scenarios before a route goes live.
  • Evaluate transcripts, audio, latency, tool calls, and policy adherence after each test call.
  • Attach QA results to call IDs, recordings, traces, and workflow evidence.
  • Find the few production calls worth reviewing instead of sampling blindly.
  • Promote failed Genesys, SIP, or Asterisk call patterns into repeatable regression tests.
  • Compare agent versions after prompt, model, ASR, TTS, route, or middleware changes.

The practical loop is:

define topology
run controlled call
capture telephony + contact-center + agent evidence
score the call
review failures
promote repeatable failures into regression tests
rerun before the next route, prompt, or provider change

If you already have a Genesys or Asterisk stack, keep it. The goal is not to replace your telephony system. The goal is to make sure your AI voice agent is tested with the same call path, evidence, and failure modes that real callers use.

Launch Checklist

Before shipping a Genesys or Asterisk voice agent path, verify:

  • The topology under test is named and versioned.
  • Test numbers, routes, queues, and Asterisk contexts are isolated from production callers.
  • Every test creates one canonical call ID.
  • Genesys interaction IDs, SIP Call-IDs, Asterisk channel IDs, recordings, transcripts, and traces are joined.
  • DTMF, transfer, silence, interruption, fallback, and no-answer scenarios are covered.
  • Recordings identify channel policy: mixed, caller-only, agent-only, or dual channel.
  • The QA result names the failing assertion and links to evidence.
  • Raw audio and transcripts follow the retention and redaction policy.
  • At least 1 production-mirror call validates the route before launch.
  • Repeatable failures become regression tests before the next release.

Do the boring part first. Make the call path replayable. Once the evidence is joined, the AI failures become much easier to fix.

Frequently Asked Questions

Test the complete call path: SIP setup, Genesys route, Asterisk channel behavior, voice-agent transcript, recording, trace, and QA result. Hamming recommends starting with 8-12 deterministic scenarios before running higher-volume production-mirror tests.

Capture the Asterisk channel unique ID, dialplan context, ARI or AMI events, recording pointer, DTMF events, bridge or transfer events, and the canonical call ID used by the QA system. Hamming's evidence contract treats these as join keys, not optional debug logs.

Capture the Genesys interaction or conversation ID, flow name, queue, transfer reason, contact attributes, and SIP diagnostic evidence when BYOC or SIP routing is involved. According to Hamming's runbook, those artifacts should be joined to the transcript, recording, trace ID, and evaluation result for every launch-blocking test.

No. A chatbot-style test can prove the model answered a prompt, but it does not prove DTMF, SIP setup, media quality, IVR routing, transfers, recordings, or trace correlation. Hamming recommends using endpoint tests as the first gate and telephony-path tests as the launch gate for Genesys or Asterisk deployments.

Run controlled calls that send expected digits, trigger transfer conditions, simulate no-answer or busy destinations, and force backend failures. Each test should assert the expected Genesys route, Asterisk event, voice-agent decision, recording, transcript, and QA outcome under one canonical call ID.

Block launch when route, SIP, media, DTMF, transfer, recording, trace, QA, privacy, or regression evidence is missing or contradictory. Hamming's checklist treats missing replay evidence as a release blocker even if one manual test call sounded correct.

Hamming fits as the independent QA and evidence layer around the voice-agent behavior: it evaluates calls, links failures to transcripts and traces, and turns repeatable production failures into regression tests. It does not need to replace Genesys or Asterisk to make the enterprise telephony path testable.

Sumanyu Sharma

Founder & CEO

Previously Head of Data at Citizen, where he helped quadruple the user base. As Senior Staff Data Scientist at Tesla, he grew an AI-powered sales program to hundreds of millions in revenue per year.

Researched AI-powered medical image search at the University of Waterloo, where he graduated with Engineering honors on dean's list.

“At Hamming, we're taking all of our learnings from Tesla and Citizen to build the future of trustworthy, safe and reliable voice AI agents.”