Claude Sonnet 4 vs DeepSeek-R1 (May 2025 refresh)
Numbers carry their as-of date and primary source.
Specs
| Field | A: Claude Sonnet 4 | B: DeepSeek-R1 (May 2025 refresh) |
|---|---|---|
| Released | 2025-05-22 | 2025-05-28 |
| Developer | Anthropic | DeepSeek |
| Openness | Proprietary | Open weights |
| License | Proprietary | MIT |
| OSI-approved | no | yes |
| Data released | no | no |
| Training code | no | no |
| Architecture | unknown | MoE |
| Total params | — | — |
| Active params | — | — |
| Experts | — | — |
| Context window | — | — |
| Attention | unknown | MLA |
| Position enc. | unknown | RoPE with YaRN scaling |
| Pretraining tokens | — | — |
| Post-training | RLHF, Constitutional AI | SFT, GRPO, rejection sampling |
| Training hardware | — | H800 |
| Input price ($/M tokens) | $3.00 | — |
| Output price ($/M tokens) | $15.00 | — |
| Output speed (tok/s) | 48.4 | — |
Benchmarks
Missing scores are shown as a dash (—), meaning not reported, and are never inferred. Bold marks the leader on each benchmark where both models report a score.
General reasoning
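| Benchmark | A: Claude Sonnet 4 | B: DeepSeek-R1 (May 2025 refresh) |
|---|---|---|
| MMLU-Pro | 83.7 (2026-05-21) | **85.0** (2025-05-28) |
| GPQA-Diamond | 70.0 (2025-05-22) | **81.0** (2025-05-28) |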
Code
| Benchmark | A: Claude Sonnet 4 | B: DeepSeek-R1 (May 2025 refresh) |
|---|---|---|
| SWE-Bench Verified | 72.7 (2025-05-22) | — |
| LiveCodeBench | 44.9 (2026-05-21) | **73.3** (2025-05-28) |
Math
| Benchmark | A: Claude Sonnet 4 | B: DeepSeek-R1 (May 2025 refresh) |
|---|---|---|
| MATH | 93.4 (2026-05-21) | — |
| AIME 2024 | 40.7 (2026-05-21) | **91.4** (2025-05-28) |
| AIME 2025 | 33.1 (2025-05-22) | — |
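The leader rule in the Benchmarks legend (compare only reported scores, never fill in a missing one) fits in a small function. This is a sketch: the name `leader`, the `"tie"` result, and the choice to report no leader when either side is missing are illustrative assumptions. The comparison also ignores that the two models' scores carry different as-of dates and may come from different evaluation settings.

```python
from typing import Optional


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return "A" or "B" for the higher score, "tie" if equal,
    or None when either score is missing (missing scores are never inferred)."""
    if a is None or b is None:
        return None
    if a > b:
        return "A"
    if b > a:
        return "B"
    return "tie"


# Scores copied from the Benchmarks tables as (A, B); None = not reported.
SCORES = {
    "MMLU-Pro": (83.7, 85.0),
    "GPQA-Diamond": (70.0, 81.0),
    "SWE-Bench Verified": (72.7, None),
    "LiveCodeBench": (44.9, 73.3),
    "MATH": (93.4, None),
    "AIME 2024": (40.7, 91.4),
    "AIME 2025": (33.1, None),
}

if __name__ == "__main__":
    for name, (a, b) in SCORES.items():
        # Fall back to a label instead of guessing when a score is missing.
        print(f"{name}: {leader(a, b) or 'not compared'}")
```

Run as a script, it names DeepSeek-R1 (B) the leader on MMLU-Pro, GPQA-Diamond, LiveCodeBench, and AIME 2024, and reports the three benchmarks with only a Claude Sonnet 4 score as not compared.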
Context · A
Mid-tier model of the May 22, 2025 Claude 4 launch. It inherited the hybrid-reasoning approach of Claude 3.7 Sonnet, offering both near-instant responses and an extended-thinking mode, and supports parallel tool execution plus a beta of extended thinking with tool use. It held the SWE-Bench Verified lead among closed mid-tier models through summer 2025, priced at $3 / $15 per million input / output tokens, the same as Claude 3.5 Sonnet and Claude 3.7 Sonnet.
Context · B
An RL-only refresh of R1 that made substantial gains on reasoning benchmarks (notably AIME 2024) without any new pretraining, narrowing the gap between open and closed reasoning models.