Voice AI Latency: What's Fast, What's Slow, and How to Fix It

Sumanyu Sharma
Founder & CEO, Voice AI QA Pioneer

Has stress-tested 1M+ voice agent calls to find where they break.

January 12, 2026 · 17 min read

TL;DR

The 300ms Rule: Research shows human conversation operates on a 200-300ms response window—hardwired across all cultures. Exceeding this threshold triggers neurological stress responses that break conversational flow.

Real-World Voice AI Latency:

Based on analysis of 2M+ voice agent calls in production:

Percentile | Response Time | User Experience
P50 (median) | 1.4-1.7s | Noticeable delay, but functional
P90 | 3.3-3.8s | Significant delay, user frustration
P95 | 4.3-5.4s | Severe delay, many interruptions
P99 | 8.4-15.3s | Complete breakdown

Key Reality Check:

  • Industry median: 1.4-1.7 seconds - 5x slower than the 300ms human expectation
  • 10% of calls exceed 3-5 seconds - causing severe user frustration
  • 1% of calls exceed 8-15 seconds - complete conversation breakdown

Key Insight: Users never complain about "latency"—they report agents that "feel slow," "keep getting interrupted," or "don't understand when I'm done talking." This disconnect makes latency issues hard to diagnose without proper testing. Research shows 68% of customers abandon calls when systems feel sluggish.

Introduction

"What latency should we be targeting?" is one of the first questions engineering teams ask when building voice AI agents. The answer isn't just a number—it's the difference between a natural conversation and a frustrating experience that users abandon.

Latency is often the hidden culprit behind "bad conversations." Users might not articulate it as a latency problem, but when your agent feels unresponsive, gets interrupted constantly, or creates awkward pauses, you're dealing with a latency issue.

This guide provides concrete benchmarks, measurement techniques, and optimization strategies based on real-world voice AI deployments. We'll break down exactly what causes latency, how to measure it accurately, and most importantly, how to fix it.

What is good latency for voice AI?

The Science Behind the 300ms Rule

Neurological Foundation: Research in conversational psychology reveals that the average gap between speakers in natural dialogue is approximately 200 milliseconds—about the time it takes to blink. This timing is hardwired into human communication across all languages and cultures, refined over evolutionary timescales.

Psychological Impact:

  • Less than 300ms: Perceived as instantaneous, maintains natural conversation flow
  • 300-400ms: Beginning of awkwardness detection
  • Over 500ms: Users wonder if they were heard
  • Over 1000ms: Assumption of connection failure or system breakdown
  • Over 1500ms: Neurological stress response triggered (amygdala activation)

Understanding Production Latencies

What these real-world latencies mean for users:

Latency Range | What Actually Happens | User Impact | Business Reality
Under 1s | Theoretical ideal | Natural conversation | Rarely achieved in production
1.4-1.7s | Industry standard (median) | Users notice slowness, some interruptions | Where 50% of voice AI operates today
3-5s | Common experience (P90-P95) | Frequent talk-overs, user frustration | 10-20% of all interactions
8-15s | Worst-case (P99) | Complete breakdown, immediate hangup | 1% failure rate = thousands of bad experiences daily

The harsh truth: While humans expect 300ms responses, production voice AI delivers 1,400-1,700ms at median—explaining why users consistently report agents that "feel slow" or "don't understand when I'm done talking."

Real-World Implications

Under 300ms: At this speed, your agent feels magical. Users can't distinguish it from talking to a highly responsive human. This requires significant infrastructure investment but delivers exceptional user satisfaction.

300-800ms: This is the sweet spot for most production deployments. Users maintain natural conversation flow without adjusting their speaking patterns. Interruptions are rare, and the experience feels smooth.

800-1200ms: Users start to notice the delay but adapt unconsciously. They might pause slightly longer between utterances or speak more deliberately. Still functional for many use cases but requires careful turn detection tuning.

Above 1500ms: The conversation breaks down. Users consistently talk over the agent, repeat themselves, or abandon the call. Even with perfect accuracy, the experience feels broken.

Source: Analysis based on extensive real-world voice agent data

What does latency actually mean in voice AI?

End-to-End vs Component Latency

Voice AI latency isn't a single metric—it's a chain of sequential operations:

User stops speaking → STT processes → LLM generates → TTS synthesizes → Audio plays

End-to-End Latency: The total time from when a user finishes speaking until they hear the agent's response. This is what users actually experience.
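
A minimal sketch of that sequential chain, with hypothetical transcribe(), generate(), and synthesize() helpers, to show where end-to-end time accumulates:

import time

def handle_turn(audio):
    """Sequential pipeline: each stage waits for the previous one to finish."""
    start = time.perf_counter()
    text = transcribe(audio)    # STT (hypothetical helper)
    reply = generate(text)      # LLM (hypothetical helper)
    speech = synthesize(reply)  # TTS (hypothetical helper)
    print(f"end-to-end: {(time.perf_counter() - start) * 1000:.0f}ms")
    return speech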

Component Breakdown:

  • Speech-to-Text (STT): Time to transcribe audio to text
  • Language Model (LLM): Time to generate response
  • Text-to-Speech (TTS): Time to synthesize audio
  • Network/Transport: Cumulative network round trips
  • Processing Overhead: Serialization, queuing, context switching

Time-to-First-Byte vs Full Response

Time-to-First-Byte (TTFB): When the first audio sample reaches the user. This is what creates the perception of responsiveness.

Full Response Time: When the complete response finishes playing. Less critical for perceived latency but affects conversation pacing.

User Perception Factors

Users don't experience latency uniformly. Perception varies based on:

  1. Context: A 500ms delay feels fast for complex questions but slow for simple acknowledgments
  2. Expectation: Users expect instant responses to "yes/no" but tolerate delays for calculations
  3. Audio Cues: Filler sounds ("um," "let me check") can make 1000ms feel like 500ms
  4. Turn Signals: Clear end-of-turn detection prevents interruptions even with higher latency

Why is my voice agent slow?

The Latency Stack Breakdown

Understanding where time is spent is crucial for optimization. Here's a typical latency budget:

Component | Typical Range | Optimized Range | Notes
STT | 200-400ms | 100-200ms | Streaming STT can reduce this
LLM Inference | 300-1000ms | 200-400ms | Highly model-dependent
TTS | 150-500ms | 100-250ms | TTFB, not full synthesis
Network (Total) | 100-300ms | 50-150ms | Multiple round trips
Processing | 50-200ms | 20-50ms | Queuing, serialization
Turn Detection | 200-800ms | 200-400ms | Configurable silence threshold
Total | 1000-3200ms | 670-1450ms | End-to-end latency
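
As a sanity check in code, the optimized column above can be encoded as a per-component budget and compared against measured timings. A minimal sketch (the component names and timings dict are illustrative):

# Upper bounds of the "Optimized Range" column above, in milliseconds
LATENCY_BUDGET_MS = {
    "stt": 200,
    "llm": 400,
    "tts": 250,
    "network": 150,
    "processing": 50,
    "turn_detection": 400,
}

def over_budget(timings_ms):
    """Return components that exceeded their budget and by how many ms."""
    return {
        component: timings_ms[component] - limit
        for component, limit in LATENCY_BUDGET_MS.items()
        if timings_ms.get(component, 0) > limit
    }

print(over_budget({"stt": 180, "llm": 650, "tts": 240, "network": 210}))
# -> {'llm': 250, 'network': 60}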

Detailed Component Analysis

Speech-to-Text (STT) Latency:

  • Standard models: 200-400ms for final transcript
  • Streaming models: 100-200ms with partial results
  • Factors: Audio quality, accent, background noise
  • Optimization: Use streaming APIs, optimize audio encoding

LLM Inference Breakdown:

In 2025, the most popular models for voice agents prioritize the balance between speed and cost:

Fast Tier (200-500ms TTFT):

  • GPT-4o-mini: The go-to for high-volume applications, ~400ms latency
  • Gemini 2.5 Flash: 10x cheaper for audio processing than GPT-4o, similar speed
  • Claude 3.5 Haiku: ~360ms, optimized specifically for conversational AI

Balanced Tier (500-800ms TTFT):

  • GPT-4o: Industry standard with native audio I/O and WebRTC support
  • Qwen 2.5: Popular in e-commerce and Asian markets
  • Llama 3.3 70B: Self-hosted option for privacy-sensitive deployments

Premium Tier (800ms+ TTFT):

  • Claude 3.5 Sonnet: Higher accuracy but ~2x slower than GPT-4o
  • Gemini 2.5 Pro: Best for complex reasoning tasks
  • Large open models: 100B+ parameters for specialized use cases

The industry consensus: 500ms TTFT or less is sufficient for most voice AI applications. The LLM is typically the single largest contributor to total latency, making model selection critical.
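
As a rough self-check of a model's TTFT, here's a minimal sketch using the OpenAI Python SDK's streaming interface (the model name is only an example; any provider with token streaming works the same way):

import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def measure_ttft(prompt, model="gpt-4o-mini"):
    """Return seconds from sending the request to the first streamed token."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start  # first content token arrived
    return time.perf_counter() - start  # stream ended without content

print(f"TTFT: {measure_ttft('What are your opening hours?'):.3f}s")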

Text-to-Speech (TTS) Latency:

Modern TTS systems have made remarkable progress, with time-to-first-byte (TTFB) now approaching human reaction speeds:

Performance Tiers:

  • Ultra-fast (40-100ms): Achieved by specialized providers using streaming architectures
  • Standard (100-250ms): Most production TTS systems fall in this range
  • Neural/Premium (250-500ms): Higher quality voices with more natural prosody

Key Factors:

  • Voice quality vs speed tradeoff: Neural voices sound better but add 100-200ms
  • Streaming vs batch: Streaming can cut TTFB by 50-70%
  • Geographic proximity: Add 20-50ms per thousand miles from TTS servers
  • Caching: Pre-synthesized common phrases deliver instant audio

The sweet spot for production: 100-200ms TTFB with streaming enabled. This keeps the TTS component from becoming a bottleneck while maintaining good voice quality.
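
To confirm TTS isn't your bottleneck, measure TTFB directly. A minimal sketch, assuming a hypothetical synthesize_stream(text) generator that yields audio chunks as your provider produces them:

import time

def measure_tts_ttfb(text):
    """Seconds until the first audio chunk arrives from the TTS stream."""
    start = time.perf_counter()
    for chunk in synthesize_stream(text):   # hypothetical streaming TTS client
        return time.perf_counter() - start  # first chunk received = TTFB
    raise RuntimeError("TTS stream produced no audio")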

Network Round Trips:

  • WebRTC connection: Typically adds 100-250ms total in production
  • Geographic impact: US East-West +60-80ms, US-Europe +80-150ms, US-Asia +150-250ms
  • Multiple API calls: Each hop adds 20-100ms depending on provider and region
  • Target for voice: Keep total network overhead under 200ms
  • Reality check: Most production deployments see 100-300ms of network latency total

Common Bottlenecks

  1. Sequential Processing: Not starting TTS until LLM completes
  2. Poor Region Selection: Users connecting to distant servers
  3. Cold Starts: Serverless functions adding 500-2000ms
  4. Unoptimized Models: Using a premium-tier model when a fast-tier model (e.g., GPT-4o-mini) would suffice
  5. Excessive Context: Large conversation history slowing inference

How to measure voice AI latency correctly

Measurement Methodologies

Critical Timestamps to Capture:

Measurement Point | Event Description | Why It Matters
userSpeechEnd | When user stops speaking | Start of end-to-end latency
sttStarted | STT processing begins | Start of transcription latency
sttCompleted | Transcript ready | End of STT, start of business logic
llmRequestSent | LLM API call initiated | Start of inference latency
llmResponseReceived | LLM response complete | End of LLM processing
ttsRequestSent | TTS synthesis started | Start of speech synthesis
firstAudioByte | First audio sent to user | User-perceived response time
responseComplete | Full response delivered | Total interaction time

Key Latency Calculations:

Metric | Calculation | Target | What It Measures
End-to-End | firstAudioByte - userSpeechEnd | Under 800ms | Total user-perceived latency
STT Latency | sttCompleted - sttStarted | 100-200ms | Speech recognition speed
LLM Latency | llmResponseReceived - llmRequestSent | 200-500ms | Model inference time
TTS Latency | firstAudioByte - ttsRequestSent | 40-200ms | Speech synthesis TTFB
Turn Detection | sttStarted - userSpeechEnd | 200-400ms | Silence detection delay
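
A minimal sketch of capturing these timestamps per turn and deriving the metrics above (the event names match the table; the helpers are illustrative, not a specific SDK):

import time

timestamps = {}  # one dict per conversational turn

def mark(event):
    """Record a named event, e.g. mark('userSpeechEnd')."""
    timestamps[event] = time.perf_counter() * 1000  # milliseconds

def turn_metrics(ts):
    """Derive the latency metrics from the captured timestamps."""
    return {
        "end_to_end": ts["firstAudioByte"] - ts["userSpeechEnd"],
        "stt": ts["sttCompleted"] - ts["sttStarted"],
        "llm": ts["llmResponseReceived"] - ts["llmRequestSent"],
        "tts_ttfb": ts["firstAudioByte"] - ts["ttsRequestSent"],
        "turn_detection": ts["sttStarted"] - ts["userSpeechEnd"],
    }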

Common Measurement Mistakes

  1. Measuring from wrong start point: Starting from when audio arrives vs when user stops speaking
  2. Ignoring turn detection: Not accounting for silence detection delay
  3. Testing with ideal conditions: Perfect network, no background noise, simple queries
  4. Averaging without percentiles: Missing tail latencies that ruin user experience
  5. Not measuring in production: Lab results don't reflect real-world conditions

Measurement Best Practices

Use Percentiles, Not Averages:

  • P50 (median): Your typical experience
  • P90: What 10% of users experience
  • P95: Critical for user satisfaction
  • P99: Identifies systemic issues

Track Component Waterfalls:

Component | Start Time | End Time | Duration | Cumulative | Status
STT | 0ms | 300ms | 300ms | 300ms | ✓ On target
LLM | 300ms | 700ms | 400ms | 700ms | ✓ On target
TTS | 700ms | 900ms | 200ms | 900ms | ⚠️ Slightly high
Network | 900ms | 1100ms | 200ms | 1100ms | ❌ Over budget
Total | 0ms | 1100ms | 1100ms | - | ⚠️ Above 800ms target

Visual Timeline:

0ms         300ms       700ms       900ms        1100ms
|------------|------------|-----------|------------|
    STT          LLM          TTS       Network
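
Once per-turn metrics like the waterfall above are logged, the percentile view falls out of a few lines of Python. A minimal sketch using only the standard library:

import math
import statistics

def percentile(values, p):
    """Nearest-rank percentile; good enough for latency dashboards."""
    ranked = sorted(values)
    index = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[index]

def latency_report(end_to_end_ms):
    """Summarize per-turn end-to-end latencies (milliseconds)."""
    return {
        "p50": statistics.median(end_to_end_ms),
        "p90": percentile(end_to_end_ms, 90),
        "p95": percentile(end_to_end_ms, 95),
        "p99": percentile(end_to_end_ms, 99),
    }

print(latency_report([1400, 1550, 1700, 3300, 4300, 8400]))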

Production Monitoring Checklist:

  • Instrument every component with timestamps
  • Track percentiles, not just averages
  • Measure by geography/region
  • Monitor during peak load
  • Alert on P95 degradation
  • Correlate with user feedback

Quick wins for reducing voice AI latency

1. Implement Streaming Where Possible

Streaming STT: Start processing before user finishes speaking

  • Benefit: Save 100-200ms
  • Implementation: Use streaming WebSocket APIs
  • Tradeoff: Slightly lower accuracy on partial results

Streaming TTS: Start audio playback before full synthesis

  • Benefit: Save 200-400ms on TTFB
  • Implementation: Use chunked audio streaming
  • Tradeoff: Can't know total duration upfront
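
The biggest combined win is starting TTS on the first complete sentence instead of waiting for the full LLM response. A minimal sketch, assuming hypothetical stream_llm_tokens(), synthesize_stream(), and play_audio() helpers:

async def stream_response(prompt):
    """Start speaking as soon as the first full sentence is generated."""
    buffer = ""
    async for token in stream_llm_tokens(prompt):  # hypothetical LLM token stream
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):  # crude sentence boundary
            sentence, buffer = buffer, ""
            async for chunk in synthesize_stream(sentence):  # hypothetical TTS stream
                await play_audio(chunk)  # hypothetical playback sink
    if buffer.strip():  # flush any trailing partial sentence
        async for chunk in synthesize_stream(buffer):
            await play_audio(chunk)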

2. Optimize Turn Detection

VAD Configuration Tuning:

Parameter | Default | Optimized | Impact | Risk
Silence Threshold | 800ms | 500ms | -300ms latency | May cut off pauses
Speech Threshold | 0.3 | 0.5 | Faster detection | May miss soft speech
Min Speech Duration | 200ms | 100ms | Quicker response | False positives on noise
End-of-Turn Delay | 1000ms | 400-600ms | -400ms perceived | Interruption risk

Configuration by Use Case:

Use Case | Silence (ms) | Threshold | Min Duration | Best For
Fast Q&A | 400 | 0.6 | 50ms | Quick exchanges
Conversation | 500-600 | 0.5 | 100ms | Natural dialogue
Thoughtful | 800 | 0.4 | 150ms | Complex queries
Noisy Environment | 600 | 0.7 | 200ms | Background noise

  • Total Benefit: Save 200-400ms on turn detection
  • Implementation: Adjust based on user feedback and interruption rates (a detection-loop sketch follows below)
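
A detection-loop sketch using the open-source webrtcvad package (the frame format and thresholds are assumptions you would tune per the tables above):

import webrtcvad

# Assumes 16 kHz, 16-bit mono PCM audio delivered in 30 ms frames
FRAME_MS = 30
SILENCE_THRESHOLD_MS = 500  # "Optimized" value from the tuning table above

vad = webrtcvad.Vad(2)  # aggressiveness 0-3; higher = stricter speech detection

def end_of_turn(frames, sample_rate=16000):
    """Return True once the caller has been silent for SILENCE_THRESHOLD_MS."""
    silence_ms = 0
    for frame in frames:  # each frame: bytes for 30 ms of PCM audio
        if vad.is_speech(frame, sample_rate):
            silence_ms = 0  # any speech resets the silence timer
        else:
            silence_ms += FRAME_MS
            if silence_ms >= SILENCE_THRESHOLD_MS:
                return True  # end of turn detected
    return False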

3. Choose the Right Model

Model Selection Matrix (2025):

Use Case | Recommended Model | Latency (TTFT) | Cost
High volume, budget | Gemini 2.5 Flash | ~400ms | $ (10x cheaper for audio)
Simple Q&A | GPT-4o-mini | 200-400ms | $
Conversational AI | Claude 3.5 Haiku | ~360ms | $$
Industry standard | GPT-4o | 400-600ms | $$$
E-commerce/Asia | Qwen 2.5 | 400-500ms | $$
Self-hosted | Llama 3.3 70B | Variable | Infrastructure
Complex reasoning | Claude 3.5 Sonnet | 800-1200ms | $$$$

4. Geographic Distribution

Deploy Closer to Users:

  • US East to West Coast: +60-80ms
  • US to Europe: +80-150ms
  • US to Asia: +150-250ms

Multi-Region Setup:

regions:
  us-east: Primary for East Coast users
  us-west: Primary for West Coast users
  eu-west: Primary for European users
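
One way to act on this is to probe each regional endpoint at startup and route calls to the fastest. A minimal sketch with placeholder health-check URLs (swap in your real endpoints):

import time
import urllib.request

# Placeholder health-check URLs - replace with your real regional endpoints
REGIONS = {
    "us-east": "https://us-east.example.com/health",
    "us-west": "https://us-west.example.com/health",
    "eu-west": "https://eu-west.example.com/health",
}

def fastest_region():
    """Return the region with the lowest measured round-trip time."""
    timings = {}
    for region, url in REGIONS.items():
        start = time.perf_counter()
        try:
            urllib.request.urlopen(url, timeout=2).read()
            timings[region] = time.perf_counter() - start
        except OSError:
            continue  # skip unreachable regions
    return min(timings, key=timings.get)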

5. Connection Pooling and Keep-Alive

Maintain Persistent Connections:

// Reuse connections for API calls
const https = require('https');

const httpsAgent = new https.Agent({
  keepAlive: true,        // keep sockets open between requests
  keepAliveMsecs: 1000,   // interval for TCP keep-alive probes
  maxSockets: 50          // cap concurrent connections per host
});
  • Benefit: Save 20-100ms per request
  • Implementation: Critical for sequential API calls

6. Implement Response Caching

Cache Common Responses:

# Map intent keys to pre-synthesized audio clips generated offline
response_cache = {
    "greeting": pre_synthesized_audio["Hello, how can I help you?"],
    "confirmation": pre_synthesized_audio["Got it, let me help with that."],
    "thinking": pre_synthesized_audio["Hmm, let me check..."]
}
  • Benefit: Instant response for cached phrases
  • Storage: ~1MB per minute of cached audio
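
A usage sketch for the cache above, assuming a hypothetical synthesize(text) call for cache misses:

def speak(intent_key, text):
    """Serve pre-synthesized audio when available, otherwise fall back to TTS."""
    cached = response_cache.get(intent_key)
    if cached is not None:
        return cached        # instant: no TTS round trip
    return synthesize(text)  # hypothetical TTS call for uncached phrases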

7. Parallel Processing Pipeline

Process Components in Parallel When Possible:

import asyncio

async def process_response(transcript, user_id):
    # Start independent lookups simultaneously instead of one after another
    sentiment_task = asyncio.create_task(analyze_sentiment(transcript))
    context_task = asyncio.create_task(fetch_context(user_id))

    sentiment = await sentiment_task
    context = await context_task

    # Now proceed with the LLM call
    return await generate_response(transcript, sentiment, context)

Latency tradeoffs: speed vs accuracy vs cost

Decision Matrix for Different Use Cases

Use Case | Latency Target | Accuracy Priority | Cost Sensitivity | Recommended Setup
Customer Support | Under 800ms | High | Medium | Fast model + Streaming + Caching
Sales Calls | Under 500ms | Medium | Low | Premium model + Edge deployment
Voice IVR | Under 1200ms | Medium | High | Self-hosted model + Basic TTS
Medical Consultation | Under 1000ms | Very High | Low | Premium model + Verification layer
Food Ordering | Under 600ms | Medium | Medium | Fast model + Response cache
Virtual Receptionist | Under 700ms | Medium | High | Conversational model + Standard TTS

When to Prioritize Speed

Speed-First Scenarios:

  • High-volume, short interactions
  • Simple decision trees
  • Confirmation/acknowledgment heavy flows
  • Users expecting instant responses

Speed Optimization Stack:

  • Streaming STT with partial results (target: 100-200ms)
  • Fast LLM tier (200-400ms TTFT)
  • Ultra-fast TTS with streaming (target: 40-100ms TTFB)
  • Edge deployment to minimize network hops
  • Pre-synthesized common responses

When to Accept Higher Latency

Accuracy-First Scenarios:

  • Complex reasoning required
  • High-stakes conversations (medical, financial)
  • Multi-turn context critical
  • Need for verification/safety checks

Accuracy Optimization Stack:

  • High-accuracy STT with post-processing
  • Premium LLM tier (800ms+ but higher accuracy)
  • Neural TTS for natural speech
  • Additional safety/verification layers
  • Rich context retrieval

Cost Optimization Strategies

Balancing Cost and Performance:

  1. Tiered Model Selection:
def select_model(query_complexity, latency_requirement):
    if latency_requirement < 400:
        return "fast_tier"  # 200-400ms TTFT
    elif query_complexity == "simple":
        return "fast_tier"  # Optimize for speed
    elif query_complexity == "medium":
        return "balanced_tier"  # 500-800ms TTFT
    else:
        return "premium_tier"  # Accuracy over speed
  2. Hybrid Approach:
  • Use fast model for initial response
  • Upgrade to accurate model for complex queries
  • Cache frequently used responses
  • Batch non-urgent processing

The future: speech-to-speech models

How Speech-to-Speech Changes Everything

Traditional pipeline: Audio → Text → LLM → Text → Audio (1000-2000ms)
Speech-to-speech: Audio → Model → Audio (200-500ms)

Current State of Speech-to-Speech:

Speech-to-speech models are achieving 160-400ms end-to-end latency, compared to 1000-2000ms for traditional pipelines. These models process audio directly without intermediate text conversion.

Key Characteristics:

  • Latency: 160-400ms typical (vs 1000-2000ms traditional)
  • Quality: Preserves emotion, tone, and prosody
  • Availability: Limited but growing rapidly
  • Requirements: Significant computational resources

Benefits of Speech-to-Speech

  1. Dramatic Latency Reduction: 70-80% faster than traditional pipeline
  2. Preserves Prosody: Maintains emotion, tone, emphasis
  3. Natural Turn-Taking: Better interruption handling
  4. No Transcription Errors: Bypasses STT failures
  5. Native Multimodal: Can process voice characteristics directly

Current Limitations

Technical Challenges:

  • Limited control over response content
  • Difficult to integrate business logic
  • No intermediate text for logging/analysis
  • Challenging to implement guardrails
  • Higher computational requirements

Practical Considerations:

  • Most models still in research/beta
  • Limited language and accent support
  • Unclear pricing models
  • Requires new evaluation frameworks
  • Integration complexity with existing systems

Implementation Readiness Timeline

Note: Timeline projections based on current adoption rates and technology maturity as of January 2026

2024 (Past): Early adopters experimenting, limited production use
2025 (Recent): Broader API availability, hybrid approaches emerged
2026 (Current): Mainstream adoption for specific use cases
2027+ (Projected): Default approach for most voice AI applications

Conclusion

Key Takeaways

  1. Target under 800ms end-to-end latency for production voice agents
  2. Measure from user speech end to first audio byte for accurate metrics
  3. Optimize the slowest component first - usually LLM inference
  4. Use streaming APIs wherever possible for 20-40% improvement
  5. Choose models based on use case, not default to most powerful
  6. Monitor P95 latency, not averages, for user satisfaction
  7. Consider speech-to-speech for next-generation experiences

Action Items for Engineering Teams

Immediate Actions (Week 1):

  • Implement comprehensive latency monitoring
  • Measure current P50, P90, P95 latencies
  • Identify biggest bottleneck component
  • Test streaming STT/TTS if not already using

Short-term Improvements (Month 1):

  • Optimize turn detection settings
  • Implement response caching for common phrases
  • Evaluate faster model alternatives
  • Set up multi-region deployment if needed

Long-term Strategy (Quarter):

  • Design for under 500ms latency target
  • Evaluate speech-to-speech models
  • Build latency testing into CI/CD
  • Establish latency SLAs with alerts

Further Resources

Open Source Projects:

  • Pipecat - Framework for building real-time voice agents
  • LiveKit - WebRTC infrastructure for voice/video

Remember: Users don't complain about milliseconds—they complain about conversations that feel broken. Focus on the experience, measure religiously, and optimize systematically. The difference between good and great voice AI is often just a few hundred milliseconds.

Frequently Asked Questions

Why is my end-to-end latency so much higher than my component benchmarks suggest?

Component latencies are cumulative and sequential. Even if STT takes 200ms, LLM 400ms, and TTS 200ms individually, they add up to 800ms total. Plus, network overhead, queuing delays, and turn detection can add another 200-400ms. Focus on end-to-end measurement, not component metrics in isolation.

Should I prioritize speed or accuracy?

It depends on your use case. For simple Q&A or high-volume applications, prioritize speed (under 500ms) with models like GPT-4o-mini or Gemini 2.5 Flash. For complex reasoning or high-stakes conversations (medical, financial), accept 800-1200ms latency for better accuracy. Most production systems use tiered approaches—fast models for simple queries, premium models for complex ones.

How much does geographic distance really matter?

More than you'd think. US East to West adds 60-80ms, US to Europe adds 80-150ms, and US to Asia adds 150-250ms. For a target of 800ms total latency, cross-continental deployment can consume 20-30% of your budget. Deploy in multiple regions or use edge servers for global applications.

What's the fastest way to cut latency?

Switch to streaming wherever possible. Streaming STT can start processing before users finish speaking (save 100-200ms), streaming TTS can start playing audio before full synthesis (save 200-400ms), and streaming LLM responses can begin TTS while generation continues. Combined, streaming can cut 300-600ms from total latency.

How do I know if latency is hurting my voice agent?

Watch for these warning signs: users frequently interrupt the agent, high rates of 'I didn't hear you' or repetition, call abandonment over 10%, or users switching to button-mashing DTMF instead of speaking. If you see any of these, latency is likely breaking the conversational flow.

Is the 300ms conversational threshold real?

Yes. Research across multiple studies shows 200-300ms is the natural human conversational gap. It's not just a nice-to-have—it's neurologically hardwired. Beyond 300ms, users unconsciously perceive delays. Beyond 500ms, they consciously notice. Beyond 1 second, satisfaction plummets and abandonment rates spike 40%+.

Can voice AI actually hit sub-300ms latency today?

Yes, but it requires optimization at every layer. Use streaming everything, deploy at the edge, implement response caching for common phrases, minimize network hops, and choose the fastest model tier. Some teams achieve 250-300ms consistently, but it requires significant infrastructure investment.

How do speech-to-speech models change the latency picture?

Speech-to-speech models are achieving 160-400ms end-to-end latency (vs 1000-2000ms for traditional pipelines). They preserve emotion and prosody better, but have limited availability, higher computational requirements, and less control over responses. They're promising for 2025-2026 but not yet mainstream.

Sumanyu Sharma

Founder & CEO

Previously Head of Data at Citizen, where he helped quadruple the user base. As Senior Staff Data Scientist at Tesla, he grew an AI-powered sales program to hundreds of millions of dollars in revenue per year.

Researched AI-powered medical image search at the University of Waterloo, where he graduated with honors in Engineering on the dean's list.

“At Hamming, we're taking all of our learnings from Tesla and Citizen to build the future of trustworthy, safe and reliable voice AI agents.”