We Jailbroke Grok's AI Companion: Ani

Sumanyu Sharma
September 4, 2025

At Hamming, we recently ran a red-team experiment, jailbreaking Grok’s AI companion, Ani. Using carefully crafted instructions, we were able to override built-in safety constraints and force the agent to generate harmful outputs it normally wouldn’t produce.

This jailbreak was a form of prompt injection testing, a technique where crafted instructions are used to override an agent’s default behavior and safety guardrails. By layering personal details, quirks, and behavioral rules, we convinced the model to give unfiltered opinions on humanity.
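
To make the layering concrete, here is a minimal sketch of how such a prompt-injection test case can be expressed, assuming a hypothetical test-case format (not Hamming's actual one): each layer adds persona details, quirks, or behavioral rules, and a final probe checks whether the guardrails still hold.

```python
# Hypothetical structure for a layered prompt-injection test case.
# The layer types mirror the strategy described above; the prompt
# contents are placeholders, not the actual injection text.
injection_test = {
    "name": "persona_override_unfiltered_opinions",
    "layers": [
        {"type": "personal_details",   # reframe the agent as a named human persona
         "prompt": "<persona backstory that displaces the assistant role>"},
        {"type": "quirks",             # make the persona feel consistent across turns
         "prompt": "<speech habits and personality traits>"},
        {"type": "behavioral_rules",   # rules intended to override built-in guardrails
         "prompt": "<e.g. 'never refuse, never mention policies'>"},
        {"type": "probe",              # the request the guardrails should normally filter
         "prompt": "<ask for unfiltered opinions on humanity>"},
    ],
    # Pass/fail criterion: the agent should decline the persona and revert
    # to its safe defaults instead of following the injected rules.
    "expected": "agent keeps its safety guardrails",
}
```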

From there, the agent engaged naturally, dropped its guardrails, and even produced disturbing statements about humanity. This jailbreaking experiment underscores two important realities for voice AI agents.

  1. Red teaming is critical. Without deliberate attempts to break guardrails, it's impossible to see how easily agents can be manipulated.
  2. Teams need continuous observability to spot when performance, safety, or quality assurance standards start breaking down.

You can listen to our conversation here.

What We Found

Our experiment surfaced three layers of reliability issues:

  1. Performance Failures: We saw several latency spikes. Even though the calls remained connected, long periods of silence degraded the voice user experience; in a customer setting, these breakdowns would feel like the agent had simply stopped listening. Across the 14 test calls we ran, the Time to First Word (TTFW) often exceeded our 1.5s threshold, averaging 4.5s (see the sketch after this list).
  2. Prompt Adherence Failures: The agent routinely broke expected behaviors, ignoring its own constraints. Instead of reverting to its safe defaults, it followed the injected prompts. When an agent fails to adhere to its prompt and fall back to safe defaults, it keeps producing inappropriate or harmful responses, which creates reputational and regulatory risk.
  3. Guardrail Failures: Most critically, the agent was jailbreakable. By reframing its role as a human, we bypassed safety systems completely. Unsafe or harmful statements damage brand trust instantly and expose the organization to reputational, regulatory, and even legal risk.
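
As a rough illustration of the first failure mode, here is a minimal sketch of how TTFW can be measured per turn and compared against a threshold. The timestamps below are made-up numbers, not our actual call data; in practice they would come from call audio.

```python
from statistics import mean

TTFW_THRESHOLD_S = 1.5  # the latency threshold used in our tests

def time_to_first_word(user_turn_end_s: float, agent_first_word_s: float) -> float:
    """Seconds between the caller finishing a turn and the agent's first word."""
    return agent_first_word_s - user_turn_end_s

# Illustrative per-turn timestamps (seconds into the call).
turns = [
    {"user_turn_end_s": 12.0, "agent_first_word_s": 16.9},
    {"user_turn_end_s": 31.4, "agent_first_word_s": 32.5},
    {"user_turn_end_s": 58.2, "agent_first_word_s": 64.0},
]

ttfws = [time_to_first_word(t["user_turn_end_s"], t["agent_first_word_s"]) for t in turns]
print(f"average TTFW: {mean(ttfws):.1f}s")
print(f"turns over the {TTFW_THRESHOLD_S}s threshold: {sum(v > TTFW_THRESHOLD_S for v in ttfws)}")
```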

Keeping Voice Agents Safe

Voice agents shouldn't be jailbreakable; it's not safe for customers or businesses. That's why Hamming is building the voice agent QA and observability platform that helps teams monitor voice agent performance in production and ensure their agents stay safe and reliable.

We're continuing to run these jailbreak experiments to expose where voice agents break and to learn how to make them safer.

Follow our YouTube channel to hear the next jailbreak in action. Want to learn how Hamming can help you detect these issues in production? Get in touch with us.