Performance & Token Optimization

Overview

Guidelines for optimizing Claude Code performance through model selection, context window management, extended thinking configuration, and build troubleshooting. The core principle: match model capability to task complexity for optimal cost-performance balance.

Model Selection Strategy

Model	Strength	Use Cases
Haiku 4.5	90% of Sonnet capability, 3x cheaper	Lightweight agents, pair programming, worker agents
Sonnet 4.6	Best coding model	Main development, multi-agent orchestration, complex coding
Opus 4.5	Deepest reasoning	Complex architecture, maximum reasoning, research/analysis

Context Window Management

Avoid in the last 20% of context window:

Large-scale refactoring
Multi-file feature implementation
Debugging complex interactions

Lower context sensitivity (safe in late context):

Single-file edits
Independent utility creation
Documentation updates
Simple bug fixes

Extended Thinking + Plan Mode

Extended thinking is enabled by default, reserving up to 31,999 tokens for internal reasoning.

Controls:

Toggle: Option+T (macOS) / Alt+T (Windows/Linux)
Config: alwaysThinkingEnabled in ~/.claude/settings.json
Budget cap: export MAX_THINKING_TOKENS=10000
Verbose mode: Ctrl+O to see thinking output

For complex tasks requiring deep reasoning:

Ensure extended thinking is enabled (default: on)
Enable Plan Mode for structured approach
Use multiple critique rounds for thorough analysis
Use split role sub-agents for diverse perspectives

Build Troubleshooting

If build fails:

Use the build-error-resolver agent
Analyze error messages
Fix incrementally
Verify after each fix

Performance & Token Optimization

Performance & Token Optimization

Overview

Model Selection Strategy

Context Window Management

Extended Thinking + Plan Mode

Build Troubleshooting

相关技能 Related Skills

MCP Configuration Guide

Regex vs LLM Decision Framework for Structured Text

Claude Code Longform Guide