The Open-Source AI Stack

Claude Sonnet 4 vs DeepSeek-R1 (May 2025 refresh)

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

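Field | A: Claude Sonnet 4 | B: DeepSeek-R1 (May 2025 refresh)
Released | 2025-05-22 | 2025-05-28
Developer | Anthropic | DeepSeek
Openness | Proprietary | Open
License | Proprietary | MIT
OSI-approved | no | yes
Data released | no | no
Training code | no | no
Architecture | unknown | moe
Total params | not reported | not reported
Active params | not reported | not reported
Experts | not reported | not reported
Context window | not reported | not reported
Attention | unknown | mla
Position enc. | unknown | rope-yarn
Pretraining tokens | not reported | not reported
Post-training | rlhf, constitutional | sft, grpo, rejection-sampling
Training hardware | not reported | H800
$/M input | $3.00 | not reported
$/M output | $15.00 | not reported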
Output tok/sec 48.4
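
To make the per-million-token prices concrete, here is a minimal sketch of the cost arithmetic. It assumes the $3.00 and $15.00 figures are Claude Sonnet 4's list prices (the Context · A note supports this) and that billing is a flat per-token rate with no caching or batch discounts. The function name and the workload sizes are hypothetical.

```python
# List prices taken from the Specs table (USD per million tokens).
# Assumption: both figures apply to Claude Sonnet 4 (model A).
INPUT_USD_PER_M = 3.00
OUTPUT_USD_PER_M = 15.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at flat list prices (no caching or batch discounts)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical workload: 20,000 prompt tokens and 2,000 completion tokens.
cost = request_cost_usd(20_000, 2_000)
print(f"${cost:.4f}")  # prints $0.0900
```

Output tokens cost five times as much as input tokens at these rates, so long completions dominate the bill even when prompts are large.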

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro | A: 83.7 (2026-05-21) | B: 85.0 (2025-05-28)
GPQA-Diamond | A: 70.0 (2025-05-22) | B: 81.0 (2025-05-28)

Code

SWE-Bench Verified | A: 72.7 (2025-05-22) | B: not reported
LiveCodeBench | A: 44.9 (2026-05-21) | B: 73.3 (2025-05-28)

Math

MATH | A: 93.4 (2026-05-21) | B: not reported
AIME 2024 | A: 40.7 (2026-05-21) | B: 91.4 (2025-05-28)
AIME 2025 | A: 33.1 (2025-05-22) | B: not reported
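
The leader rule stated at the top of this section can be sketched as follows. This is an illustration, not the site's actual rendering code: the `leader` function is hypothetical, and it assumes that a benchmark with a missing score gets no leader at all, since missing scores are never inferred. Attributing the lone SWE-Bench Verified score to model A is also an assumption, based on its as-of date matching A's release date.

```python
from typing import Optional


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return "A" or "B" for the higher score, or None when no leader can be named."""
    if a is None or b is None:
        return None  # a missing score is never inferred, so nothing is compared
    if a == b:
        return None  # a tie has no leader
    return "A" if a > b else "B"


# (A, B) scores copied from the tables above; None marks "not reported".
scores = {
    "MMLU-Pro": (83.7, 85.0),
    "GPQA-Diamond": (70.0, 81.0),
    "SWE-Bench Verified": (72.7, None),  # assumed to be model A's score
    "LiveCodeBench": (44.9, 73.3),
    "AIME 2024": (40.7, 91.4),
}

leaders = {name: leader(a, b) for name, (a, b) in scores.items()}
for name, who in leaders.items():
    print(f"{name}: {who or 'no leader'}")
```

Under these assumptions, model B leads every benchmark where both scores are present, and SWE-Bench Verified is left without a leader.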

Context · A

Mid-tier model of the May 22, 2025 Claude 4 launch. Inherited the hybrid-reasoning approach from Claude 3.7 Sonnet, with near-instant and extended-thinking modes, and added parallel tool execution and an extended-thinking-with-tool-use beta. Held the SWE-Bench Verified lead for closed mid-tier coding through summer 2025 at the same price point as 3.5 and 3.7 Sonnet: $3 per million input tokens and $15 per million output tokens.

Context · B

An RL-only refresh of R1 that gained substantial ground on reasoning benchmarks without any new pretraining, notably AIME 2024 (91.4 as of 2025-05-28). Tightened the open-vs-closed reasoning gap.
