Agent Token Optimizer
Multiply your Claude Code usage for the same spend
Nerfguard auto-routes each request to the best model and reasoning depth for the job and optimizes your token usage, so you don’t waste tokens and time on excess intelligence.
curl -fsSL https://nerfguard.com/install.sh | bash
Works across every major coding agent
The current self-serve version of Nerfguard saves time and usage on your existing Codex/Claude Code subscription or API in three ways: it applies token-efficiency techniques, supports static base-model routing that is pinned per thread, and selects the right reasoning level for each request. For maximum savings, Nerfguard can also route dynamically across models from alternative inference providers. We're rolling this capability out gradually; join the waitlist for early access.
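To make "pinned per thread" concrete, here is a minimal Python sketch of a hypothetical router. It fixes the base model the first time it sees a thread ID and then varies only the reasoning level from request to request. The class name, the model names, the reasoning levels and the keyword heuristic are all illustrative assumptions. They are not Nerfguard's actual classifier or configuration.

```python
from dataclasses import dataclass, field

# Hypothetical reasoning levels, ordered from cheapest to most expensive.
REASONING_LEVELS = ("low", "medium", "high")

# Hypothetical phrases treated as signs that a request needs deeper reasoning.
HARD_HINTS = ("refactor", "architecture", "race condition", "debug")


@dataclass
class ThreadRouter:
    default_model: str
    # Maps thread ID -> base model pinned for that thread.
    pinned: dict = field(default_factory=dict)

    def base_model(self, thread_id: str) -> str:
        # Pin the base model the first time a thread is seen; reuse it afterwards.
        return self.pinned.setdefault(thread_id, self.default_model)

    def reasoning_level(self, prompt: str) -> str:
        # Toy heuristic standing in for a real per-request classifier.
        text = prompt.lower()
        if any(hint in text for hint in HARD_HINTS):
            return "high"
        if len(text) > 500:
            return "medium"
        return "low"

    def route(self, thread_id: str, prompt: str) -> tuple:
        # Base model is per thread; reasoning level is per request.
        return self.base_model(thread_id), self.reasoning_level(prompt)


router = ThreadRouter(default_model="model-a")
print(router.route("t1", "rename this variable"))       # ('model-a', 'low')
print(router.route("t1", "debug this race condition"))  # ('model-a', 'high')

# Changing the default later only affects threads that are not yet pinned.
router.default_model = "model-b"
print(router.route("t1", "add a docstring"))            # ('model-a', 'low')
print(router.route("t2", "add a docstring"))            # ('model-b', 'low')
```

Keeping the base model fixed within a thread keeps each conversation on one model. Only the reasoning depth, the cheaper knob, changes between requests.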
Questions, answered.
What does Nerfguard install?
A local gateway for coding-agent traffic, plus the shell configuration needed to route supported agent requests through it.
Which coding agents does it work with?
Codex (CLI and Desktop App) and Claude Code (CLI) are enabled automatically through the Nerfguard CLI. You can also set up Nerfguard manually with any coding agent that can point to a compatible model gateway, which covers most modern agents. Want to use Nerfguard with another tool? Get setup instructions.
Do I need to change how I prompt?
No. Keep using your agent normally. Nerfguard sits behind the client, chooses the right model and reasoning depth for each request, and applies other optimizations along the way.
How do I turn Nerfguard on/off?
Enable or disable it anytime with nerfguard enable and nerfguard disable. Nerfguard is completely reversible, though we doubt you’ll want to turn it off once you try it.
What happens when a task needs the strongest model?
Nerfguard routes up to a stronger model instead of forcing everything through a smaller one, so high-judgment work still gets the model it needs.
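One way to picture routing up is a tier ladder. Estimate how demanding a request is, pick the cheapest tier whose ceiling covers that estimate, and fall back to the strongest tier rather than capping work at a small model. The minimal Python sketch below assumes a difficulty score between 0 and 1 from some upstream classifier. The tier names, thresholds and `pick_tier` function are hypothetical and are not Nerfguard's routing rules.

```python
# Hypothetical tiers, cheapest first, each paired with the highest
# difficulty score (0.0 to 1.0) it is trusted to handle.
TIERS = [
    ("small-model", 0.3),
    ("mid-model", 0.7),
    ("frontier-model", 1.0),
]


def pick_tier(difficulty: float) -> str:
    """Return the cheapest tier whose ceiling covers the difficulty score."""
    if not 0.0 <= difficulty <= 1.0:
        raise ValueError("difficulty must be in [0, 1]")
    for name, ceiling in TIERS:
        if difficulty <= ceiling:
            return name
    # Unreachable while the last ceiling is 1.0, but if the table is edited,
    # route up to the strongest tier instead of failing.
    return TIERS[-1][0]


for score in (0.1, 0.5, 0.95):
    print(score, pick_tier(score))
```

The design choice here is that escalation is the default direction: a request scored above a tier's ceiling is never forced down onto that tier.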
I like my agent / model. Do I need to switch providers, pay a different vendor or evaluate new models?
No. Nerfguard can optimize usage on the plans and inference providers you already pay for. If you’d like to maximize the savings that Nerfguard can provide, you can also leverage alternative inference providers. If you’re interested in setting up a Nerfguard deployment this way, reach out.
How much does Nerfguard cost?
It's free.
How fast is Nerfguard? Will it slow me down?
No. Nerfguard is fast. We’ve tuned the Nerfguard classifier to run in around 250 ms, and the overhead of the end-to-end pipeline is negligible compared to the response times of most coding-agent queries. In practice, our team has gained significant speed overall by spending fewer tokens at higher intelligence levels.
