---
title: "Claude Code Memory: How a 50KB Index Burned Half My Quota"
slug: memory-md-half-my-quota
date: 2026-05-15
updated: 2026-05-15
url: https://iammattl.com/blog/memory-md-half-my-quota
excerpt: "A routine Claude Code PR review burned 48% of my 5-hour quota. Sub-agents and conversation length were red herrings. The real cause was a 51.9KB MEMORY.md being re-injected into the system prompt every turn. The cost equation, what got fat, and the index pattern that fixed it."
tags: ["claude-code", "ai", "memory", "token-cost", "developer-tools"]
cover_image: "https://iammattl.com/images/blog/memory-md-half-my-quota/1778877365161.webp"
---
# Claude Code Memory: How a 50KB Index Burned Half My Quota

Last week I kicked off a routine PR review in Claude Code. Nothing exotic: it fans out a handful of sub-agents to verify findings, runs for a few minutes, and returns a structured report. Standard work. When it finished, my session meter said I'd burned **48% of the 5-hour quota**. The work itself reported ~328k tokens used.

Those numbers didn't add up. A PR review shouldn't eat half a quota. So I went looking.

## The Investigation

My first guess was a runaway sub-agent pulling in giant files. I checked the agent transcripts. Nothing weird. Each agent did its job, returned a normal-sized result, and exited.

My second guess was that I'd accumulated a long conversation and the context window had grown. Also no. The session was only fifteen turns deep.

Then I remembered the auto-memory system. Anything Claude Code decides is worth remembering across sessions gets written into a directory of small Markdown files, with a top-level `MEMORY.md` acting as the index. **That index is re-loaded into the system prompt at the start of every turn.**
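
On disk it's a plain directory, roughly like this (the detail filenames here are invented for illustration; the path and `MEMORY.md` are the real bits):

```
~/.claude/projects/<project>/memory/
├── MEMORY.md               # the index, re-injected every turn
├── auth-refactor.md        # detail files, loaded only on demand
├── payments-migration.md
└── cache-feedback.md
```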

I checked the size of mine.

```
$ wc -l -c ~/.claude/projects/.../memory/MEMORY.md
   391  51924 ~/.claude/projects/.../memory/MEMORY.md
```

**391 lines, 51.9KB.** The system was already auto-warning that anything past line 200 would be truncated. I had nearly twice that. And every one of those bytes was being shipped to the model on every single turn.

## The Math

Here's the cost equation that mattered.

```
session cost ≈ (per-turn context size × number of turns) + sub-agent tokens
```

For a long-running session the per-turn context dominates. PR reviews with sub-agent fan-out are long sessions. Sub-agents are billed once each. The system prompt is billed every turn.

Rule of thumb: 1KB of text ≈ 250 input tokens. So:

- 51.9KB × 250 tokens/KB ≈ **13,000 tokens per turn** just for the memory index.
- × 15 turns ≈ **195,000 tokens** billed before any actual work.
- Plus the same content getting cached, re-cached, and partially invalidated as I edited unrelated files in the same session.
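
A quick back-of-envelope check of the same numbers (the 250 tokens/KB ratio is a rough heuristic, not a measured rate):

```
$ awk 'BEGIN {
    kb = 51.9; turns = 15
    per_turn = kb * 250
    printf "per turn: %.0f  session: %.0f\n", per_turn, per_turn * turns
  }'
per turn: 12975  session: 194625
```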

That accounted for most of the missing budget.

## How `MEMORY.md` Got Fat

The auto-memory system is genuinely useful. It builds institutional memory across sessions. Project context, preferences, references to investigations I've done before, behavioural feedback ("don't propose disabling that cache"). When I come back to the codebase a week later, the model already knows the lay of the land.

But the index had drifted. Looking at what was actually in there:

- **Multi-line summaries of project state** that duplicated content already in linked detail files. The system loads linked files on demand. There's no need to inline a five-line summary in the index.
- **A running log of merged PRs.** Once a PR is merged, it's in `git log`. Keeping a tracker in the always-loaded index added bytes without adding signal.
- **Verbose migration histories** from months ago, where a one-line pointer to the detail file would have done the same job.
- **Duplicated entries** because subsequent sessions wrote a new memory rather than updating an old one.

None of this was anyone's fault. The model writes memories optimistically because that's safer than forgetting. Trimming is the half that doesn't happen automatically.

## The Fix

I rewrote `MEMORY.md` as a pure index. One line per entry, ~150 characters max, formatted as a bullet with title, file link, and a one-line hook. Verbose context went into the linked files. Anything no longer actionable got archived (file kept, index entry dropped). Anything duplicating `git log` got deleted outright.
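
Concretely, a trimmed entry looks like this. The titles and filenames are invented, but the shape is the one just described: bullet, title, file link, one-line hook.

```
- Auth refactor: [auth-refactor.md](auth-refactor.md) - session middleware owns token renewal now; edge cases in file.
- Cache feedback: [cache-feedback.md](cache-feedback.md) - don't propose disabling the read-through cache.
```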

The result:

```
51.9KB / 391 lines  →  18.7KB / 178 lines
```

About 33KB cut per turn. Across a typical fifteen-turn session that's ~125k tokens saved before any work is done. The behavioural-guidance entries all survived. Those are the ones that actually change how the model works on this codebase. Small in bytes, large in effect.

I also wrote the lesson back into memory as a feedback entry, so future-me gets the rule applied automatically. **Whenever `MEMORY.md` exceeds ~25KB or ~200 lines, audit and trim.**
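
A minimal shell check for that rule, using the thresholds and path pattern from above; wire it into a prompt, cron job, or git hook however suits you:

```
#!/usr/bin/env bash
# Warn once the memory index crosses the audit thresholds (~25KB / ~200 lines).
memory_md="$HOME/.claude/projects/<your-project>/memory/MEMORY.md"
read -r lines bytes _ < <(wc -l -c "$memory_md")
if (( bytes > 25000 || lines > 200 )); then
  echo "MEMORY.md at ${bytes} bytes / ${lines} lines: time to audit and trim."
fi
```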

## Where to Start

If you've got a long-lived Claude Code project, the steps are short.

1. **Check the size of your `MEMORY.md`.** Run `wc -c ~/.claude/projects/<your-project>/memory/MEMORY.md`. If it's over ~25KB, you're paying for it every turn.
2. **Treat the index like an index, not a notebook.** One-line entries. Verbose content goes in the linked files, which load on demand.
3. **Trim, don't just append.** The model writes memories more eagerly than it consolidates them. The trim step is on you; the one-liner after this list helps you see where the bytes are.
4. **The per-turn context dominates long sessions.** If you're optimising token cost, tighten the system prompt before you tighten agent prompts. Bigger ROI per byte cut.
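
To find the heaviest files before trimming (same path placeholder as step 1; the per-file byte counts sort biggest-first, with the total on top):

```
$ wc -c ~/.claude/projects/<your-project>/memory/*.md | sort -rn | head
```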

It's a small bit of housekeeping with a surprisingly large effect. My next PR review came in at the expected cost, and I haven't seen the 48% quota spike since.

