Claude Opus 4 vs DeepSeek-R1 (May 2025 refresh)
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: Claude Opus 4 | B: DeepSeek-R1 (May 2025 refresh) |
|---|---|---|
| Released | 2025-05-22 | 2025-05-28 |
| Developer | Anthropic | DeepSeek |
| Openness | Proprietary | Open |
| License | Proprietary | MIT |
| OSI-approved | no | yes |
| Data released | no | no |
| Training code | no | no |
| Architecture | unknown | MoE |
| Total params | — | — |
| Active params | — | — |
| Experts | — | — |
| Context window | — | — |
| Attention | unknown | MLA |
| Position enc. | unknown | RoPE (YaRN) |
| Pretraining tokens | — | — |
| Post-training | RLHF, constitutional | SFT, GRPO, rejection sampling |
| Training hardware | — | H800 |
| $/M input | $15.00 | — |
| $/M output | $75.00 | — |
| Output tok/sec | 38.1 | — |
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
General reasoning
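| Benchmark | A: Claude Opus 4 | B: DeepSeek-R1 (May 2025 refresh) |
|---|---|---|
| MMLU-Pro | 86.0 (as of 2026-05-21) | 85.0 (as of 2025-05-28) |
| GPQA-Diamond | 74.9 (as of 2025-05-22) | 81.0 (as of 2025-05-28) |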
Code
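| Benchmark | A: Claude Opus 4 | B: DeepSeek-R1 (May 2025 refresh) |
|---|---|---|
| SWE-Bench Verified | 72.5 (as of 2025-05-22) | — |
| LiveCodeBench | 54.2 (as of 2026-05-21) | 73.3 (as of 2025-05-28) |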
Math
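| Benchmark | A: Claude Opus 4 | B: DeepSeek-R1 (May 2025 refresh) |
|---|---|---|
| MATH | 94.1 (as of 2026-05-21) | — |
| AIME 2024 | 56.3 (as of 2026-05-21) | 91.4 (as of 2025-05-28) |
| AIME 2025 | 33.9 (as of 2025-05-22) | — |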
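The leader rule stated at the top of the Benchmarks section can be sketched roughly as follows. This is an illustration only, not the page's actual rendering logic: the `SCORES` mapping and the `leader` helper are hypothetical names, and the choice to declare no leader when either score is missing (or when the scores tie) is an assumption drawn from the "never inferred" note.

```python
from typing import Optional

# Scores copied from the benchmark tables as (A, B).
# None marks a score that is not reported; it is never filled in.
SCORES: dict[str, tuple[Optional[float], Optional[float]]] = {
    "MMLU-Pro": (86.0, 85.0),
    "GPQA-Diamond": (74.9, 81.0),
    "SWE-Bench Verified": (72.5, None),
    "LiveCodeBench": (54.2, 73.3),
    "MATH": (94.1, None),
    "AIME 2024": (56.3, 91.4),
    "AIME 2025": (33.9, None),
}


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return "A" or "B" for the higher score.

    Assumption: no leader is declared when either score is missing
    or when the two scores are equal.
    """
    if a is None or b is None or a == b:
        return None
    return "A" if a > b else "B"


# Map each benchmark to its leader ("A", "B", or None).
LEADERS = {name: leader(a, b) for name, (a, b) in SCORES.items()}

for name, who in LEADERS.items():
    print(f"{name}: {who if who is not None else 'no leader (score not reported)'}")
```

Under these assumptions, only the four benchmarks with scores for both models get a leader; the other three are left undecided rather than awarded to the only model with a reported score.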
Context · A
Flagship of the May 22, 2025 Claude 4 launch. Returned to the Opus name after the Claude 3.5 and 3.7 lines skipped Opus entirely. Held the same $15 / $75 per-million-token input/output price ceiling as Claude 3 Opus and, like Sonnet 4, carried hybrid reasoning with extended thinking and parallel tool execution.
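To make the listed prices concrete, here is a minimal sketch of the per-request arithmetic, assuming simple linear billing at the $/M input and $/M output rates from the Specs table. The function name `request_cost_usd` and the token counts are hypothetical, and real bills may differ (for example through caching or batch discounts, which this page does not cover).

```python
# Prices taken from the Specs table, in USD per million tokens.
INPUT_USD_PER_M = 15.00
OUTPUT_USD_PER_M = 75.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost, assuming plain linear per-token billing."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
cost = request_cost_usd(input_tokens=10_000, output_tokens=2_000)
print(f"${cost:.2f}")  # prints "$0.30"
```

Because output is priced at five times input, the 2,000 output tokens in this example cost as much as the 10,000 input tokens ($0.15 each).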
Context · B
An RL-only refresh of R1 that gained substantial ground on reasoning benchmarks (notably AIME 2024) without any new pretraining. Tightened the open-vs-closed reasoning gap.