Four Skills That Can Cut Your Claude Code Bill

Tokens are the meter. Every word Claude reads and writes costs money, and the default settings leave that meter running hard. The model explains what it is about to do, dumps whole files into the chat, explores by trial and error, and carries a bloated conversation from the first message to the last. None of that is required to get good work done.

I have been using four skills that attack the bill from four different angles: the words Claude writes, the code it writes, the work it wastes, and the context it drags around. Here is what each one does and why it saves money.

1. Caveman: stop paying for filler

Output tokens are the expensive side of the meter, and a lot of them are filler. “Sure, I’d be happy to help you with that. The issue you’re experiencing is likely caused by…” is seventeen words before a single fact.

Caveman strips it. Articles, pleasantries, hedging, and conjunctions all go. Technical terms, code, and error strings stay exact. The claim is roughly a 75 percent cut in output tokens with no loss of substance.

Before:

Sure! The reason your React component keeps re-rendering is that you are passing an inline object as a prop, which creates a new reference on every render.

After:

Inline obj prop -> new ref -> re-render. useMemo.

Same information. A fraction of the tokens. It reads terse, but for working sessions where you want the answer and not the bedside manner, that is the point.
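
To sanity-check a before/after pair like this one, here is a rough sketch in Python. It counts whitespace-separated words, which is only a crude stand-in for real tokens (that is an assumption; an actual tokenizer will give different numbers), and the function name is hypothetical.

```python
def word_reduction(before: str, after: str) -> float:
    """Fraction of whitespace-separated words removed (crude token proxy)."""
    before_words = len(before.split())
    after_words = len(after.split())
    return 1 - after_words / before_words


before = ("Sure! The reason your React component keeps re-rendering is that "
          "you are passing an inline object as a prop, which creates a new "
          "reference on every render.")
after = "Inline obj prop -> new ref -> re-render. useMemo."

print(len(before.split()), len(after.split()))   # 27 9
print(f"{word_reduction(before, after):.0%}")    # 67%
```

On this pair the word count drops from 27 to 9, about two thirds. Real token counts will differ, so treat it as a ballpark check and not a measurement.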

2. Ponytail: the cheapest code is the code never written

Ponytail is a lazy senior dev. Lazy means efficient, not careless. Before writing anything it climbs a ladder and stops at the first rung that holds: Does this need to exist at all? Is it already in the codebase? Does the standard library do it? A native platform feature? An already-installed dependency? Can it be one line?

The token savings are a side effect of the smaller solution. A hand-rolled cache class is a hundred lines Claude has to write, that you have to read, and that every future session has to load back into context. The ponytail answer is often one line:

@lru_cache(maxsize=1000) on the fetch function. Skipped custom cache class, add when lru_cache measurably falls short.

Less code out now, less code re-read forever. It compounds. There is also a companion command, ponytail-gain, that reports the measured impact as a scoreboard if you want real numbers rather than my word for it.
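
For reference, a minimal sketch of what that one-line answer looks like in Python. The fetch function and the call counter are hypothetical stand-ins for whatever slow lookup you are caching; functools.lru_cache is the standard library decorator.

```python
from functools import lru_cache

calls = 0  # counts how often the slow path actually runs


@lru_cache(maxsize=1000)
def fetch(key: str) -> str:
    """Hypothetical slow lookup; stands in for a network or database call."""
    global calls
    calls += 1
    return f"value-for-{key}"


fetch("a")
fetch("a")   # served from the cache, the function body does not run again
fetch("b")

print(calls)               # 2
print(fetch.cache_info())  # reports hits=1, misses=2
```

The decorator only works when the arguments are hashable and the result is safe to reuse, which is the kind of limit the "add when lru_cache measurably falls short" note is leaving room for.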

3. Antigravity: stop paying to guess

This is the one people miss. A lot of tokens get burned not on the answer but on the flailing around it. Claude writes a throwaway test script to figure out what you meant. It dumps 120 lines of a file into the chat to “show you” a change. It narrates a preamble before every tool call.

Antigravity bans all of it. The rules are blunt: no exploratory test scripts, if a request is ambiguous stop and ask instead of guessing, never output full file contents, use exact chunk-based edits, and route substantial output to a local file instead of the chat window. It also sorts work into modes so a typo fix does not trigger a full planning ritual, and a large refactor gets a plan on disk instead of a wall of chat text.

The savings here come from waste you never notice in any single message but that adds up across a session: the guesses, the re-dumps, the preambles.
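
To make the "exact chunk-based edits" rule concrete, here is a small Python sketch of the idea. It is not Antigravity's implementation; the function name and the fake 120-line file are assumptions for illustration. The edit spells out only the old and new chunk, and refuses to apply if the old chunk is missing or ambiguous.

```python
def apply_exact_edit(source: str, old: str, new: str) -> str:
    """Replace one exact chunk; refuse if the match is missing or ambiguous."""
    count = source.count(old)
    if count != 1:
        raise ValueError(f"expected exactly one match for the chunk, found {count}")
    return source.replace(old, new, 1)


# A stand-in 120-line file with a single line that needs changing.
lines = [f"line_{i} = {i}" for i in range(120)]
source = "\n".join(lines)

old_chunk = "line_57 = 57"
new_chunk = "line_57 = 570"
patched = apply_exact_edit(source, old_chunk, new_chunk)

edit_chars = len(old_chunk) + len(new_chunk)  # what the edit has to spell out
full_chars = len(patched)                     # what a full re-dump would emit
print(edit_chars, full_chars)
```

The gap between the two numbers is the part of a full-file dump that carries no new information.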

4. Handoff: stop paying compounding interest on context

Every message in a conversation gets re-read by the model on every following turn. The cost of a long session does not grow linearly; it is closer to quadratic. Turn fifty pays to re-read turns one through forty-nine. That old debugging tangent from two hours ago is still on the meter.

Handoff breaks the cycle. It compacts the current conversation into a single handoff document, a fresh agent picks up from that summary, and the expensive back-history drops away. You keep the state that matters and stop paying rent on the state that does not. The natural move is to hand off at each real milestone rather than letting one session grow without bound.

This one is especially handy when you want to come back to a task days later.
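
The near-quadratic growth, and what a handoff does to it, can be sketched with a toy model in Python. Every number here is an assumption picked for round arithmetic (1,000 tokens added per turn, a 2,000-token handoff summary, a handoff every 10 turns), and the function name is hypothetical. It counts only the input tokens the model reads each turn.

```python
def session_input_tokens(turns, tokens_per_turn, handoff_every=None, summary_tokens=0):
    """Total tokens read across a session when each turn re-reads the whole context."""
    total = 0
    context = 0
    for turn in range(1, turns + 1):
        context += tokens_per_turn  # the new message joins the context
        total += context            # the model reads the whole context this turn
        if handoff_every and turn % handoff_every == 0 and turn < turns:
            context = summary_tokens  # a fresh agent starts from the summary only
    return total


one_long_session = session_input_tokens(50, 1_000)
with_handoffs = session_input_tokens(50, 1_000, handoff_every=10, summary_tokens=2_000)

print(one_long_session)  # 1275000
print(with_handoffs)     # 355000
```

In this toy model the single 50-turn session reads 1,275,000 tokens, while the same 50 turns with a handoff every 10 read 355,000. Doubling the session to 100 turns without a handoff roughly quadruples the total, which is the quadratic part.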

How they stack

These are four different levers, so their savings stack rather than overlap:

Caveman shrinks what Claude writes.
Ponytail shrinks what Claude builds.
Antigravity removes the wasted motion between the ask and the answer.
Handoff stops old context from compounding.

You do not have to run all four. Caveman alone is a fast win on output, and handoff alone rescues a session that has gotten heavy. But run together they hit the token meter from the word, the code, the waste, and the history all at once, and that is where the bill actually drops.

Get the skills

Caveman: github.com/juliusbrussee/caveman
Ponytail: github.com/DietrichGebert/ponytail
Antigravity Protocol: github.com/KINGSTAR-OMEGA/claude-token-optimizer
Handoff: github.com/mattpocock/skills