// anthropic engineering · feb 2026
Prompt Caching Is Everything
Thariq Shafi · Claude Code Team @ Anthropic
01 — the problem
02 — mechanics
start → cache_control breakpoint
03 — architecture
Static first, dynamic last → maximize shared prefix
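The mechanics above can be sketched against the Anthropic Messages API, where a `cache_control` breakpoint marks the end of the reusable prefix. Static content (tools, system prompt) goes before the breakpoint; per-turn content goes after it. A minimal sketch, assuming placeholder model and tool names:

```python
# Sketch of a cache-friendly request: static prefix first, dynamic content last.
# The cache_control breakpoint tells the API where the reusable prefix ends.

def build_request(system_prompt: str, tools: list, history: list) -> dict:
    return {
        "model": "claude-sonnet-4-5",  # placeholder model name
        "max_tokens": 1024,
        # Static and byte-identical on every turn -> part of the cached prefix.
        "tools": tools,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # Breakpoint: everything up to and including this block
                # is cached and reused on the next request.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Dynamic and growing every turn -> after the breakpoint.
        "messages": history,
    }

req = build_request(
    system_prompt="You are a coding agent.",
    tools=[{"name": "read_file", "description": "Read a file",
            "input_schema": {"type": "object", "properties": {}}}],
    history=[{"role": "user", "content": "List the repo files."}],
)
```

Because the prefix is matched byte-for-byte, any reordering or edit before the breakpoint forfeits the cache, which is what the lessons below are about.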
04 — lesson
Update system prompt with new info (time, file changes)
→ Invalidates entire cache prefix
Send system message in next conversation turn
→ Cache prefix survives intact
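The lesson above amounts to: never mutate the cached prefix, append instead. A hedged sketch of both approaches; the message shapes and the `<system-note>` wrapper are illustrative, not the actual Claude Code format:

```python
# Anti-pattern: editing the system prompt each turn rewrites the prefix,
# so every cached token after the edit point is invalidated.
def update_via_system_prompt(request: dict, new_info: str) -> dict:
    bad = dict(request)
    bad["system"] = request["system"] + f"\nCurrent time: {new_info}"  # prefix changed -> cache miss
    return bad

# Pattern: leave the prefix byte-identical and deliver the new info
# as a message in the next turn; the cached prefix still matches.
def update_via_new_turn(request: dict, new_info: str) -> dict:
    good = dict(request)
    good["messages"] = request["messages"] + [
        {"role": "user",
         "content": f"<system-note>Current time: {new_info}</system-note>"}  # illustrative tag
    ]
    return good

base = {"system": "You are a coding agent.", "messages": []}
assert update_via_new_turn(base, "14:02")["system"] == base["system"]  # prefix untouched
```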
05 — lesson
06 — design pattern
Swap to read-only toolset
→ Cache break! 💥
EnterPlanMode / ExitPlanMode are callable tools
→ Tool definitions never change
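One way to read this pattern: instead of swapping the tool list (which rewrites the cached prefix), keep every tool defined on every turn and gate behavior with agent state. The tool names come from the slide; the schemas and the `edit_file` tool are an illustrative sketch:

```python
# Tool definitions stay byte-identical across turns, so the cached prefix
# that contains them is never invalidated. Mode changes happen via tool
# *calls*, not via changing the tool list.
TOOLS = [
    {"name": "EnterPlanMode", "description": "Switch the agent to read-only planning.",
     "input_schema": {"type": "object", "properties": {}}},
    {"name": "ExitPlanMode", "description": "Leave planning and allow edits.",
     "input_schema": {"type": "object", "properties": {}}},
    {"name": "edit_file", "description": "Edit a file (rejected while planning).",
     "input_schema": {"type": "object", "properties": {}}},
]

class Agent:
    def __init__(self) -> None:
        self.plan_mode = False

    def handle_tool_call(self, name: str) -> str:
        if name == "EnterPlanMode":
            self.plan_mode = True
            return "ok"
        if name == "ExitPlanMode":
            self.plan_mode = False
            return "ok"
        if name == "edit_file" and self.plan_mode:
            # Gate by state, not by removing the tool from TOOLS.
            return "error: read-only while in plan mode"
        return "ok"

agent = Agent()
agent.handle_tool_call("EnterPlanMode")
```

The design choice is that the model sees an unchanging world of tools and the runtime enforces the mode, so plan-mode transitions cost nothing in cache terms.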
07 — design pattern
defer_loading: true
08 — design pattern
Different system prompt, no tools → pays full price for entire history
Same prefix (prompt + tools + history) → append compaction as final message
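A sketch of the compaction pattern above, assuming a compaction request is just one more turn on the exact same prefix (the summarization prompt text is illustrative):

```python
# Compaction reuses the exact cached prefix (system prompt + tools + history)
# and appends the summarization request as a final message, so the long
# history is read at cached-token prices instead of full input prices.
def compaction_request(request: dict) -> dict:
    compact = dict(request)
    compact["messages"] = request["messages"] + [
        {"role": "user",
         "content": "Summarize this conversation so far for a fresh context window."}
    ]
    return compact  # same system/tools/history -> same prefix -> cache hit

base = {
    "system": "You are a coding agent.",
    "tools": [{"name": "read_file"}],
    "messages": [{"role": "user", "content": "turn 1"},
                 {"role": "assistant", "content": "done"}],
}
req = compaction_request(base)
```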
09 — economics
Claude Sonnet — per million tokens
$0.60/turn × 50 turns = $30.00
$0.06/turn × 50 turns = $3.00
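The arithmetic above, spelled out. The per-turn figures come from the slide; the 10× gap reflects cached reads costing roughly a tenth of normal input tokens:

```python
# Cost of a 50-turn session with and without prompt caching,
# using the per-turn figures from the slide.
turns = 50
uncached_per_turn = 0.60   # full input price paid on every turn
cached_per_turn = 0.06     # cached reads at ~10% of the input price

uncached_total = turns * uncached_per_turn   # $30.00
cached_total = turns * cached_per_turn       # $3.00
savings = 1 - cached_total / uncached_total  # 90% cheaper
```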
10 — independent validation
"KV-cache hit rate is the single most important metric for a production-stage AI agent" — Peak Ji, Manus founder
// conclusion