Claude Code Memory: How a 50KB Index Burned Half My Quota

I'd kicked off a routine PR review in Claude Code last week. Nothing exotic. It fans out a handful of sub-agents to verify findings, runs for a few minutes, returns a structured report. Standard work. When it finished, my session meter said I'd burned 48% of the 5-hour quota. The work itself reported ~328k tokens used.
Those numbers didn't add up. A PR review shouldn't eat half a quota. So I went looking.
The Investigation
My first guess was a runaway sub-agent pulling in giant files. I checked the agent transcripts. Nothing weird. Each agent did its job, returned a normal-sized result, and exited.
My second guess was that I'd accumulated a long conversation and the context window had grown. Also no. The session was only fifteen turns deep.
Then I remembered the auto-memory system. Anything Claude Code decides is worth remembering across sessions gets written into a directory of small Markdown files, with a top-level MEMORY.md acting as the index. That index is re-loaded into the system prompt at the start of every turn.
I checked the size of mine.
$ wc -c -l ~/.claude/projects/.../memory/MEMORY.md
391 51924
391 lines, 51.9KB. The system was already auto-warning that anything past line 200 would be truncated. I had nearly twice that. And every one of those bytes was being shipped to the model on every single turn.
The Math
Here's the cost equation that mattered.
session cost ≈ (per-turn context size × number of turns) + sub-agent tokens
For a long-running session the per-turn context dominates. PR reviews with sub-agent fan-out are long sessions. Sub-agents are billed once each. The system prompt is billed every turn.
Rule of thumb: 1KB of text ≈ 250 input tokens. So:
- 51.9KB × 250 tokens/KB ≈ 13,000 tokens per turn just for the memory index.
- × 15 turns ≈ 195,000 tokens billed before any actual work.
- Plus the same content getting cached, re-cached, and partially invalidated as I edited unrelated files in the same session.
That accounted for most of the missing budget.
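The arithmetic above is simple enough to sketch directly. This is just the rule of thumb from this post turned into a function, using the session's own numbers; the ~250 tokens/KB figure is an estimate, not an API guarantee.

```python
# Back-of-the-envelope memory-index cost, per the ~250-tokens-per-KB
# rule of thumb. Numbers are the ones from this session.
TOKENS_PER_KB = 250

def memory_tax(index_kb: float, turns: int) -> tuple[int, int]:
    """Return (tokens per turn, tokens per session) spent re-shipping the index."""
    per_turn = round(index_kb * TOKENS_PER_KB)
    return per_turn, per_turn * turns

per_turn, per_session = memory_tax(51.9, 15)
print(per_turn, per_session)  # ~13k/turn, ~195k/session before any work
```

Note this counts only the index; sub-agent tokens and the rest of the system prompt come on top.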
How MEMORY.md Got Fat
The auto-memory system is genuinely useful. It builds institutional memory across sessions. Project context, preferences, references to investigations I've done before, behavioural feedback ("don't propose disabling that cache"). When I come back to the codebase a week later, the model already knows the lay of the land.
But the index had drifted. Looking at what was actually in there:
- Multi-line summaries of project state that duplicated content already in linked detail files. The system loads linked files on demand. There's no need to inline a five-line summary in the index.
- A running log of merged PRs. Once a PR is merged, it's in git log. Keeping a tracker in the always-loaded index added bytes without adding signal.
- Verbose migration histories from months ago, where a one-line pointer to the detail file would have done the same job.
- Duplicated entries because subsequent sessions wrote a new memory rather than updating an old one.
None of this was anyone's fault. The model writes memories optimistically because that's safer than forgetting. Trimming is the half that doesn't happen automatically.
The Fix
I rewrote MEMORY.md as a pure index. One line per entry, ~150 characters max, formatted as a bullet with title, file link, and a one-line hook. Verbose context went into the linked files. Anything no longer actionable got archived (file kept, index entry dropped). Anything duplicating git log got deleted outright.
The result:
51.9KB / 391 lines → 18.7KB / 178 lines
About 33KB cut per turn. Across a typical fifteen-turn session that's ~125k tokens saved before any work is done. The behavioural-guidance entries all survived. Those are the ones that actually change how the model works on this codebase. Small in bytes, large in effect.
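The one-line-per-entry budget is easy to check mechanically. Here's a small, hypothetical audit helper (not part of Claude Code) that flags index bullets over the ~150-character cap; the path and function name are illustrative.

```python
# Hypothetical audit helper: flag index bullets that exceed the
# one-line (~150 char) budget. Not a Claude Code feature.
from pathlib import Path

MAX_ENTRY_CHARS = 150

def oversized_entries(index_path: str) -> list[tuple[int, int]]:
    """Return (line number, length) for each bullet over the budget."""
    flagged = []
    for n, line in enumerate(Path(index_path).read_text().splitlines(), 1):
        if line.lstrip().startswith("-") and len(line) > MAX_ENTRY_CHARS:
            flagged.append((n, len(line)))
    return flagged
```

Run it against your MEMORY.md after each trim; anything it flags is a candidate for moving into a linked detail file.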
I also wrote the lesson back into memory as a feedback entry, so future-me gets the rule applied automatically: whenever MEMORY.md exceeds ~25KB or ~200 lines, audit and trim.
Where to Start
If you've got a long-lived Claude Code project, the steps are short.
- Check the size of your MEMORY.md. Run wc -c ~/.claude/projects/<your-project>/memory/MEMORY.md. If it's over ~25KB, you're paying for it every turn.
- Treat the index like an index, not a notebook. One-line entries. Verbose content goes in the linked files, which load on demand.
- Trim, don't just append. The model writes memories more eagerly than it consolidates them. The trim step is on you.
- The per-turn context dominates long sessions. If you're optimising token cost, tighten the system prompt before you tighten agent prompts. Bigger ROI per byte cut.
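The first step above can be wired into a trigger. A minimal sketch of the size check, assuming the ~25KB / ~200-line thresholds from this post; the function is illustrative, not an official tool.

```python
# Sketch of the audit trigger: returns True when the memory index
# crosses the ~25KB or ~200-line thresholds and deserves a trim.
import os

KB_LIMIT = 25
LINE_LIMIT = 200

def needs_trim(path: str) -> bool:
    size_kb = os.path.getsize(path) / 1024
    with open(path) as f:
        lines = sum(1 for _ in f)
    return size_kb > KB_LIMIT or lines > LINE_LIMIT
```

Dropping this into a shell alias or a pre-session check makes the trim step harder to forget.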
It's a small bit of housekeeping with a surprisingly large effect. My next PR review came in at the expected cost, and I haven't seen the 48% quota spike since.