// anthropic engineering ยท feb 2026

Lessons from Building
Claude Code

Prompt Caching Is Everything

Thariq Shafi ยท Claude Code Team @ Anthropic

01 / 12

01 โ€” the problem

"Cache Rules Everything Around Me"

We run alerts on cache hit rate and declare SEVs if they're too low
02 / 12

02 โ€” mechanics

Prefix Matching

# The golden rule
cached_prefix = hash(request[0 : breakpoint])
if cached_prefix == previous_hash:
    reuse_computation()  # 10x cheaper
else:
    full_reprocess()     # full price ๐Ÿ’ธ
03 / 12

03 โ€” architecture

Order Is Everything

Static first, dynamic last โ€” maximize shared prefix

1. Static System Prompt & Tools globally cached
2. CLAUDE.MD per project
3. Session Context per session
4. Conversation Messages variable โ€” never cached
"Surprisingly fragile" โ€” a timestamp in the system prompt broke caching across the entire product
04 / 12

04 โ€” lesson

System Messages,
Not System Prompt

โœ— Don't

Update system prompt with new info (time, file changes)

โ†’ Invalidates entire cache prefix

โœ“ Do

Send system message in next conversation turn

โ†’ Cache prefix survives intact

# Instead of editing system prompt:
messages.append({
  "role": "system",
  "content": "It is now Wednesday"
})
05 / 12

05 โ€” lesson

Don't Switch Models
Mid-Session

Solution: Use subagents with focused handoff messages โ€” Claude Code's Explore agents use Haiku this way
# Opus prepares handoff for Haiku subagent
opus.handoff("Summarize this function",
  model="haiku",
  context="focused_subset")
06 / 12

06 โ€” design pattern

Plan Mode โ€” A Tool,
Not a Config Switch

โœ— Obvious approach

Swap to read-only toolset

โ†’ Cache break! ๐Ÿ’ฅ

โœ“ Claude Code's approach

EnterPlanMode / ExitPlanMode are callable tools

โ†’ Tool definitions never change

07 / 12

07 โ€” design pattern

Tool Search โ€”
Defer, Don't Remove

# Stub in prefix (always present, never changes)
{"name": "mcp_database_query",
 "defer_loading": true}

# Full schema loaded on-demand via ToolSearch
ToolSearch("database_query") โ†’ full schema
08 / 12

08 โ€” design pattern

Compaction โ€”
Cache-Safe Forking

โœ— Naive implementation

Different system prompt, no tools โ†’ pays full price for entire history

โœ“ Cache-safe

Same prefix (prompt + tools + history) โ†’ append compaction as final message

09 / 12

09 โ€” economics

The Cost Math

Claude Sonnet โ€” per million tokens

Uncached input $3.00
Cache write (1.25ร—) $3.75
Cached read (0.1ร—) $0.30

Without cache

$0.60/turn ร— 50 turns = $30.00

With cache

$0.06/turn ร— 50 turns = $3.60

10ร— cost reduction โ€” this is why cache hit rate determines whether the product can exist at its price
10 / 12

10 โ€” independent validation

Manus Confirms
Independently

"KV-cache hit rate is the single most important metric for a production-stage AI agent" โ€” Peak Ji, Manus founder

Leave errors in context โ€” they carry signal that prevents repeating mistakes
11 / 12

// conclusion

Key Takeaways

# Design your agent around these rules

1. Design your entire system around prefix matching
2. Static first, dynamic last โ€” order is everything
3. System messages > system prompt changes
4. Never change tools or models mid-session
5. Monitor cache hit rate like you monitor uptime
6. Fork operations must share the parent's prefix
"Claude Code is built around prompt caching from day one โ€” you should do the same."
12 / 12