DeepSeek-R1 (May 2025 refresh) vs Claude Sonnet 4
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: DeepSeek-R1 (May 2025 refresh) | B: Claude Sonnet 4 |
|---|---|---|
| Released | 2025-05-28 | 2025-05-22 |
| Developer | DeepSeek | Anthropic |
| Openness | Open | Proprietary |
| License | MIT | Proprietary |
| OSI-approved | yes | no |
| Data released | no | no |
| Training code | no | no |
| Architecture | MoE | unknown |
| Total params | — | — |
| Active params | — | — |
| Experts | — | — |
| Context window | — | — |
| Attention | MLA | unknown |
| Position enc. | RoPE (YaRN) | unknown |
| Pretraining tokens | — | — |
| Post-training | SFT, GRPO, rejection sampling | RLHF, Constitutional AI |
| Training hardware | H800 | — |
| $/M input | — | $3.00 |
| $/M output | — | $15.00 |
| Output tok/sec | — | 48.4 |
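As a rough illustration of what the listed Claude Sonnet 4 rates imply per request, the sketch below applies the $3.00 / $15.00 per-million-token prices and the 48.4 output tok/sec figure from the table. The function names and the example token counts are hypothetical, and the estimate assumes flat list pricing with no caching or batch discounts and a constant output rate.

```python
# Listed figures for Claude Sonnet 4, copied from the Specs table.
INPUT_USD_PER_M = 3.00     # USD per million input tokens
OUTPUT_USD_PER_M = 15.00   # USD per million output tokens
OUTPUT_TOK_PER_SEC = 48.4  # reported output speed


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at flat list prices (assumption: no discounts)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Estimate generation time, assuming the reported rate holds for the whole response."""
    return output_tokens / OUTPUT_TOK_PER_SEC


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
print(f"cost: ${request_cost_usd(10_000, 2_000):.4f}")       # cost: $0.0600
print(f"time: {generation_seconds(2_000):.1f} s (approx.)")
```

At these rates output tokens cost five times as much as input tokens, so long responses dominate the bill even when prompts are large.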
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
General reasoning
| Benchmark | A: DeepSeek-R1 (May 2025 refresh) | B: Claude Sonnet 4 |
|---|---|---|
| MMLU-Pro | 85.0 (2025-05-28) | 83.7 (2026-05-21) |
| GPQA-Diamond | 81.0 (2025-05-28) | 70.0 (2025-05-22) |
Code
| Benchmark | A: DeepSeek-R1 (May 2025 refresh) | B: Claude Sonnet 4 |
|---|---|---|
| SWE-Bench Verified | — | 72.7 (2025-05-22) |
| LiveCodeBench | 73.3 (2025-05-28) | 44.9 (2026-05-21) |
Math
| Benchmark | A: DeepSeek-R1 (May 2025 refresh) | B: Claude Sonnet 4 |
|---|---|---|
| MATH | — | 93.4 (2026-05-21) |
| AIME 2024 | 91.4 (2025-05-28) | 40.7 (2026-05-21) |
| AIME 2025 | — | 33.1 (2025-05-22) |
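The "never inferred" rule for missing scores can be sketched as follows. This is a minimal illustration, not the page's actual logic: the `leader` helper is hypothetical, and it assumes that no leader is declared when either score is not reported or when the two scores tie. The scores are copied from the tables above.

```python
from typing import Optional

# (A, B) scores from the Benchmarks tables; None means "not reported".
SCORES = {
    "MMLU-Pro": (85.0, 83.7),
    "GPQA-Diamond": (81.0, 70.0),
    "SWE-Bench Verified": (None, 72.7),
    "LiveCodeBench": (73.3, 44.9),
    "MATH": (None, 93.4),
    "AIME 2024": (91.4, 40.7),
    "AIME 2025": (None, 33.1),
}


def leader(score_a: Optional[float], score_b: Optional[float]) -> Optional[str]:
    """Return "A" or "B" for the higher score, or None if no comparison is possible."""
    # Assumption: a missing score is never treated as a loss, so no leader is named.
    if score_a is None or score_b is None:
        return None
    if score_a == score_b:
        return None
    return "A" if score_a > score_b else "B"


for name, (a, b) in SCORES.items():
    print(f"{name}: {leader(a, b) or 'no leader (score missing or tied)'}")
```

Under this assumption, A leads the four benchmarks where both models report a score, and the three rows with a missing A score are left without a leader rather than awarded to B.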
Context · A
An RL-only refresh of R1 that gained substantial ground on reasoning benchmarks (notably AIME 2024) without any new pretraining. Narrowed the gap between open and closed reasoning models.
Context · B
Mid-tier model of the May 22, 2025 Claude 4 launch. Inherited the hybrid-reasoning approach from Claude 3.7 Sonnet, with near-instant and extended-thinking modes, and added parallel tool execution and a beta for extended thinking with tool use. Held the SWE-Bench Verified lead among closed mid-tier coding models through summer 2025, at the same $3 / $15 price point as 3.5 and 3.7 Sonnet.