🚀 Launch Bookface: Hamming AI (S24) - Self-improving prompt optimizer (free for 7 days)

Sumanyu Sharma
Founder & CEO, Voice AI QA Pioneer

Has stress-tested 1M+ voice agent calls to find where they break.

May 8, 2024 • 3 min read

👋 Sumanyu from @Hamming; we're part of the upcoming S24 batch!

TLDR: Are you spending a lot of time hand-optimizing prompts? We're launching our Prompt Optimizer (a new feature, in beta) to automate prompt engineering. It's completely free for 7 days!

Quick filter: If you are spending more time tweaking prompts than shipping features, this is for you.

🌟 Click here to try our Prompt Optimizer 🌟

Convert your task into an optimized prompt in minutes


Thought experiment: What if we used LLMs to optimize prompts for other LLMs?


Problem: Writing prompts by hand is tedious

Writing high-quality and performant prompts by hand requires enormous trial and error. I’ve done this loop too many times. Here's the usual workflow:

  1. Write an initial prompt.
  2. Measure how well it performs on a few examples in a prompt playground. Bonus points if you use an evals platform like Hamming to automate this flow.
  3. Tweak the prompt by hand to handle cases where it's failing.
  4. Repeat steps 2 & 3 until you get tired of word-smithing.

| Manual step | What goes wrong | Optimizer benefit |
|-------------|-----------------|-------------------|
| Write | Prompt starts too generic | Generates stronger variants |
| Measure | Few examples, noisy feedback | Scores at scale with an LLM judge |
| Tweak | Endless trial and error | Iterative improvement from outliers |
| Repeat | Slow cycles | Faster convergence on better prompts |

What's worse, new model versions often break previously working prompts. Or say you want to switch from OpenAI's GPT-3.5 Turbo to Llama 3: you need to re-optimize your prompts by hand. ❌
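To make the regression risk concrete, here's a minimal sketch of re-scoring the same prompt on the same eval set after a model swap, assuming an OpenAI-style client and an OpenAI-compatible endpoint serving Llama 3. The `accuracy` helper, `PROMPT`, and `EXAMPLES` are illustrative names, not part of our product:

```python
# Minimal sketch (not Hamming's API): re-run one prompt against the same
# eval set on two models, so a quality drop is caught before shipping.
from openai import OpenAI

def accuracy(client: OpenAI, model: str, prompt: str, examples: list[dict]) -> float:
    """Fraction of examples whose expected answer appears in the model output."""
    hits = 0
    for ex in examples:
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": prompt},
                {"role": "user", "content": ex["input"]},
            ],
        )
        hits += ex["expected"] in (resp.choices[0].message.content or "")
    return hits / len(examples)

# Same prompt, two models; a lower score means the prompt needs re-optimizing.
# openai_client = OpenAI()
# llama_client = OpenAI(base_url="http://localhost:8000/v1")  # assumed local server
# print(accuracy(openai_client, "gpt-3.5-turbo", PROMPT, EXAMPLES))
# print(accuracy(llama_client, "llama-3-8b-instruct", PROMPT, EXAMPLES))
```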

Our take: use LLMs to write optimized prompts

Describe your task, add some examples, and click run.

[Demo: Prompt Optimizer]

Behind the scenes, we use LLMs to generate different prompt variants. Our LLM judge measures how well a particular prompt solves the task. We capture outlier examples and use them to improve the few-shot examples in the prompt. We run several "trials" to refine the prompts iteratively.
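For intuition, here's a toy sketch of that loop, assuming an OpenAI-style client. The names `generate_variants`, `judge`, and `optimize` are illustrative, not our actual internals:

```python
# Toy sketch of the loop above (illustrative, not Hamming's internals):
# generate variants with an LLM, score them with an LLM judge, and fold
# the worst-scoring outlier back into the prompt as a few-shot example.
from openai import OpenAI

client = OpenAI()
TRIALS = 3  # assumed number of refinement rounds

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content or ""

def generate_variants(base_prompt: str, n: int = 4) -> list[str]:
    # Use an LLM to propose candidate rewrites of the current best prompt.
    out = ask(f"Rewrite the prompt below {n} different ways to make it clearer "
              f"and more robust. Return one rewrite per line.\n\n{base_prompt}")
    return [line for line in out.splitlines() if line.strip()][:n]

def judge(prompt: str, example: dict) -> float:
    # LLM-as-judge: run the prompt on one labeled example, then grade the output.
    answer = ask(f"{prompt}\n\nInput: {example['input']}")
    verdict = ask(f"Expected: {example['expected']}\nGot: {answer}\n"
                  "Grade the answer from 0 to 10. Reply with the number only.")
    try:
        return float(verdict.strip())
    except ValueError:
        return 0.0

def optimize(base_prompt: str, examples: list[dict]) -> str:
    best, best_score = base_prompt, float("-inf")
    for _ in range(TRIALS):
        for variant in generate_variants(best):
            mean = sum(judge(variant, ex) for ex in examples) / len(examples)
            if mean > best_score:
                best, best_score = variant, mean
        # Capture the worst-scoring outlier and add it as a few-shot example.
        worst = min(examples, key=lambda ex: judge(best, ex))
        best += f"\n\nExample:\nInput: {worst['input']}\nOutput: {worst['expected']}"
    return best
```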

Benefits:

  • No more tedious word-smithing.
  • No more scoring outputs manually by hand.
  • No need to remember to tip your LLM or to ask it to think carefully step-by-step. We all do it, but it shouldn't be required.

Meet the team

Sumanyu previously helped Citizen (a safety app backed by Founders Fund, Sequoia, and 8VC) grow its user base 4x, and at Tesla he grew an AI-powered sales program to hundreds of millions of dollars in revenue per year.

Our ask

In this launch, we showed how we help teams optimize each prompt. In our next launch, we'll walk through how teams use Hamming to optimize their entire AI app.

  • YC Deal. Our optimizer is completely free for the next 7 days!
  • Feedback. We want you to throw real-world tasks at our optimizer and tell us what's working and where we can be better.
  • Warm intros. We'd love intros to anyone you know who writes a lot of prompts by hand (including you!).

If you have any questions or need help, please contact our support team.

Email us here.

Book time on our Calendly.

Frequently Asked Questions

What does Hamming's Prompt Optimizer do?

Hamming's Prompt Optimizer proposes and evaluates prompt variants to improve quality on your task-specific dataset. The goal is to cut down the manual trial-and-error and give teams measurable wins instead of vibe-based tweaks.

Why do prompts break in production?

Production has a long tail: different phrasing, incomplete context, and edge cases that were not in the handful of examples you tested. Prompts also regress when models change, retrieval content shifts, or tool schemas evolve. That is why optimization needs to be tied to real datasets and regression coverage.

How does the optimizer work?

You provide a task definition and examples; Hamming generates candidate prompt variants, and evaluators score the outputs against your criteria. The optimizer iterates based on failures and outliers, so improvements are grounded in measured outcomes rather than "this sounds better."

What should I optimize my prompts for?

Optimize for consistency and safety: fewer hallucinations, better instruction-following, correct tool-call behavior, and stable performance across slices (languages, intents, and edge cases). The best prompt changes are the ones that improve metrics without introducing regressions elsewhere.
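As a rough illustration of that slice-level check (assumed names and data shape, not Hamming's API), per-slice scoring can be as simple as:

```python
# Illustrative slice-level regression check: aggregate eval scores per
# slice so a change that lifts the average but hurts one language or
# intent still shows up as a regression.
from collections import defaultdict

def slice_scores(results: list[dict]) -> dict[str, float]:
    """results look like [{"slice": "es", "score": 0.82}, ...] from an eval run."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for r in results:
        totals[r["slice"]] += r["score"]
        counts[r["slice"]] += 1
    return {s: totals[s] / counts[s] for s in totals}

def regressions(before: dict[str, float], after: dict[str, float],
                tolerance: float = 0.02) -> dict[str, tuple[float, float]]:
    # Flag slices where the new prompt scores meaningfully worse than the old one.
    return {s: (before[s], after[s])
            for s in before if after.get(s, 0.0) < before[s] - tolerance}
```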

Sumanyu Sharma

Founder & CEO

Previously Head of Data at Citizen, where he helped quadruple the user base. As Senior Staff Data Scientist at Tesla, he grew an AI-powered sales program to hundreds of millions of dollars in revenue per year.

Researched AI-powered medical image search at the University of Waterloo, where he graduated with honors in Engineering on the dean's list.

“At Hamming, we're taking all of our learnings from Tesla and Citizen to build the future of trustworthy, safe and reliable voice AI agents.”