The Open-Source AI Stack
RSS
All models

Models · claude

Claude 3.7 Sonnet

Proprietary Anthropic · 2025-02-24 · Proprietary

The first "hybrid reasoning" model from Anthropic: standard and extended-thinking modes selectable per request. Strong SWE-Bench score made it the default coding-agent backend through 2025.

Cost

$3.00 / Mtok input
$15.00 / Mtok output

Anthropic API · as of 2026-05-21

source ↗

Speed

0 tok/sec output
0 ms TTFT

Anthropic API · as of 2026-05-21

via Artificial Analysis ↗

Why people cared

Claude 3.7 Sonnet was the first "hybrid reasoning" frontier model: standard and extended-thinking modes selectable per request, billed at the same input/output rates but with extended thinking consuming additional output tokens for the visible reasoning trace. The release in February 2025 paired the model with Claude Code, Anthropic's official agent harness, which made 3.7 Sonnet the default backend for serious coding-agent work through 2025. The SWE-Bench Verified score (70.3% at release) was the headline number: a closed reasoning model that could be asked to fix a real bug in a real repository and succeed at a rate that materially exceeded prior frontier models. The agentic story matters because it established "the model that can act in your codebase" as a distinct product category from "the model that can answer questions about your codebase", and Anthropic was the first frontier lab to commercialize that distinction. Open-weights catch-up arrived with Kimi K2 and DeepSeek's later releases, but Claude 3.7 Sonnet held the agentic-coding leadership position long enough to define what the category looked like.

Architecture

tokens in Embedding vocab not disclosed × N layers Architecture not disclosed (proprietary or undocumented) Output projection tokens out
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture
unknown
Total params
not disclosed
Active params
not disclosed
Context window
not verified
Attention
unknown
Position encoding
unknown
Post-training
rlhf, constitutional
OSI-approved
no
Data released
no
Training code
not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU-Pro 80.3 as of 2026-05-21 source ↗

Code

SWE-Bench Verified 70.3 as of 2025-02-24 source ↗
LiveCodeBench 39.4 as of 2026-05-21 source ↗

Math

MATH 85.0 as of 2026-05-21 source ↗
AIME 2024 22.3 as of 2026-05-21 source ↗
AIME 2025 21.0 as of 2026-05-21 source ↗

Recommended use cases

  • code agent backend
  • extended-thinking math/code
  • SWE-Bench-style tasks
  • tool use

Available quantizations

None. The weights are not distributed, so there are no public quantizations.

Notable innovations

  • · Hybrid extended-thinking mode
  • · Claude Code agent harness

Known limitations

  • · Extended-thinking mode bills the reasoning trace as output tokens; long-thinking requests can be 5-10x more expensive than standard mode. source ↗

Lineage

Sources