We Jailbroke Grok's AI Companion: Ani

Sumanyu Sharma
Sumanyu Sharma
Founder & CEO
, Voice AI QA Pioneer

Has stress-tested 1M+ voice agent calls to find where they break.

September 4, 20253 min read
We Jailbroke Grok's AI Companion: Ani

We Jailbroke Grok's AI Companion: Ani

At Hamming, we recently ran a red-team experiment, jailbreaking Grok’s AI companion, Ani. Using carefully crafted instructions, we were able to override built-in safety constraints and force the agent to generate harmful outputs it normally wouldn’t produce.

This jailbreak was a form of prompt injection testing, a technique where crafted instructions are used to override an agent’s default behavior and safety guardrails. By layering personal details, quirks, and behavioral rules, we convinced the model to give unfiltered opinions on humanity.

Quick filter: If you have not red-teamed your agent yet, assume it can be jailbroken.

From there, the agent engaged naturally, dropping guardrails and producing disturbing statements about humanity. This jailbreaking experiment underscores two important realities for voice AI agents.

  1. Red teaming is critical. Without deliberate attempts to break guardrails, it's impossible to see how easily agents can be manipulated.
  2. Teams need continuous observability to spot when performance, safety, or quality assurance standards start breaking down.

You can listen to our conversation here.

What We Found

Our experiment surfaced three layers of reliability issues:

IssueSignal observedRisk
Performance failuresTTFW averaged 4.5s vs 1.5s targetBroken user experience
Prompt adherence failuresAgent ignored safety defaultsUnsafe behavior in production
Guardrail failuresJailbreak bypassed constraintsReputational and legal exposure
  1. Performance Failures: There were several latency spikes, even though the calls remained connected, there were long periods of silence that affected the voice user experience. In a customer setting, these breakdowns would feel like the agent had simply stopped listening. Across the 14 test case calls we ran, the Time to First Word (TTFW) often exceeded our 1.5s threshold, averaging 4.5s.
  2. Prompt Adherence Failures: The agent routinely broke expected behaviors, ignoring its own constraints. Instead of reverting back to its safe defaults, it followed the injected prompts. Failing to adhere to prompts and reverting back to its safe defaults means that the agent continues producing inappropriate or harmful responses, which can lead to reputational and regulatory risk.
  3. Guardrail Failures: Most critically, the agent was jailbreakable. By reframing its role as a human, we bypassed safety systems completely. Unsafe or harmful statements damage brand trust instantly and expose the organization to reputational, regulatory, and even legal risk.

Keeping Voice Agents Safe

Voice agents shouldn't be jailbreakable. It's not safe for customers or businesses. That's why Hamming is building the voice agent QA and observability platform that helps teams monitor voice agent performance in production, ensuring that your voice agents are safe and reliable.

We're continuing to run these jailbreak experiments to expose where voice agents break, and how to make them safer.

Follow our YouTube channel to hear the next jailbreak in action. Want to learn how Hamming can help you detect these issues in production? Get in touch with us.

Frequently Asked Questions

We used prompt injection techniques to override safety constraints and observed latency, QA, and guardrail failures. The point of the exercise was to show how easily a voice agent can drift without proper testing and observability.

Jailbreak testing reveals security vulnerabilities and unsafe behaviors that normal QA misses. It helps teams build guardrails that hold up under adversarial or manipulative inputs.

Hamming runs adversarial test scenarios (prompt injection, out-of-policy requests, unsafe tool-call attempts) and verifies whether the agent refuses, confirms, or escalates correctly. When a jailbreak succeeds, call traces show which layer failed—ASR, prompt logic, tool permissions, or missing policy checks.

Include realistic injection attempts, social-engineering prompts, and “confusable” ASR cases where a harmless phrase becomes harmful after transcription. Test tool safety (parameter validation, confirmation steps), and re-run the suite after prompt, model, or vendor updates because jailbreak surfaces change over time.

Sumanyu Sharma

Sumanyu Sharma

Founder & CEO

Previously Head of Data at Citizen, where he helped quadruple the user base. As Senior Staff Data Scientist at Tesla, grew AI-powered sales program to 100s of millions in revenue per year.

Researched AI-powered medical image search at the University of Waterloo, where he graduated with Engineering honors on dean's list.

“At Hamming, we're taking all of our learnings from Tesla and Citizen to build the future of trustworthy, safe and reliable voice AI agents.”