Why people cared
Claude 3.7 Sonnet was the first "hybrid reasoning" frontier model: standard and extended-thinking modes selectable per request, billed at the same input/output rates but with extended thinking consuming additional output tokens for the visible reasoning trace. The release in February 2025 paired the model with Claude Code, Anthropic's official agent harness, which made 3.7 Sonnet the default backend for serious coding-agent work through 2025. The SWE-Bench Verified score (70.3% at release) was the headline number: a closed reasoning model that could be asked to fix a real bug in a real repository and succeed at a rate that materially exceeded prior frontier models. The agentic story matters because it established "the model that can act in your codebase" as a distinct product category from "the model that can answer questions about your codebase", and Anthropic was the first frontier lab to commercialize that distinction. Open-weights catch-up arrived with Kimi K2 and DeepSeek's later releases, but Claude 3.7 Sonnet held the agentic-coding leadership position long enough to define what the category looked like.
Architecture
data/models.yaml. Every label is auditable
against the model's sources.
Specs
- Architecture
- unknown
- Total params
- not disclosed
- Active params
- not disclosed
- Context window
- not verified
- Attention
- unknown
- Position encoding
- unknown
- Post-training
- rlhf, constitutional
- OSI-approved
- no
- Data released
- no
- Training code
- not released
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
General reasoning
| MMLU-Pro | 80.3 | as of 2026-05-21 | source ↗ |
Recommended use cases
- code agent backend
- extended-thinking math/code
- SWE-Bench-style tasks
- tool use
Available quantizations
None. The weights are not distributed, so there are no public quantizations.
Notable innovations
- · Hybrid extended-thinking mode
- · Claude Code agent harness
Known limitations
- · Extended-thinking mode bills the reasoning trace as output tokens; long-thinking requests can be 5-10x more expensive than standard mode. source ↗
Lineage
Derived from
Claude 3.5 Sonnet 2024-06-20