Claude Opus 4 vs DeepSeek-R1 (May 2025 refresh)

Rows highlighted in warm gray mark the fields where the two models differ. Each number carries its as-of date and primary source.

Specs

Field	A: Claude Opus 4	B: DeepSeek-R1 (May 2025 refresh)
Released	2025-05-22	2025-05-28
Developer	Anthropic	DeepSeek
Openness	Proprietary	Open
License	Proprietary	MIT
OSI-approved	no	yes
Data released	no	no
Training code	no	no
Architecture	unknown	MoE
Total params	—	—
Active params	—	—
Experts	—	—
Context window	—	—
Attention	unknown	MLA
Position enc.	unknown	RoPE + YaRN
Pretraining tokens	—	—
Post-training	RLHF, Constitutional AI	SFT, GRPO, rejection sampling
Training hardware	—	H800
$/M input	$15.00	—
$/M output	$75.00	—
Output tok/sec	38.1	—
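
The pricing and throughput rows for Claude Opus 4 can be turned into a rough per-request estimate. The following is a minimal sketch, not an official calculator: the function name `estimate_request` and the example token counts are hypothetical, and it assumes flat list pricing with no discounts and ignores time to first token.

```python
# Values copied from the Specs table (column A: Claude Opus 4).
INPUT_USD_PER_M = 15.00    # $/M input
OUTPUT_USD_PER_M = 75.00   # $/M output
OUTPUT_TOK_PER_SEC = 38.1  # Output tok/sec


def estimate_request(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, seconds spent generating output).

    Assumption: flat per-token pricing and a constant output rate.
    """
    cost = (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
    seconds = output_tokens / OUTPUT_TOK_PER_SEC
    return cost, seconds


# Hypothetical request: 10,000 input tokens, 2,000 output tokens.
cost, seconds = estimate_request(10_000, 2_000)
print(f"${cost:.2f}, about {seconds:.0f} s of generation")
# prints "$0.30, about 52 s of generation"
```

No equivalent estimate is possible for DeepSeek-R1 from this page, since its price and throughput cells are not reported.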

Benchmarks

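Columns follow the Specs table: A is Claude Opus 4, B is DeepSeek-R1 (May 2025 refresh). Each score is followed by its as-of date. Missing scores are shown as a dash, meaning not reported; they are never inferred. Bold highlights the leader per benchmark.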

General reasoning

MMLU-Pro	86.0 (as of 2026-05-21)	85.0 (as of 2025-05-28)
GPQA-Diamond	74.9 (as of 2025-05-22)	81.0 (as of 2025-05-28)

Code

SWE-Bench Verified	72.5 (as of 2025-05-22)	—
LiveCodeBench	54.2 (as of 2026-05-21)	73.3 (as of 2025-05-28)

Math

MATH	94.1 (as of 2026-05-21)	—
AIME 2024	56.3 (as of 2026-05-21)	91.4 (as of 2025-05-28)
AIME 2025	33.9 (as of 2025-05-22)	—
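
The leader rule stated under Benchmarks can be sketched in a few lines. This is an illustration under assumptions, not the page's actual rendering logic: the name `pick_leader` is hypothetical, and the choice to name no leader when either score is missing is an assumption drawn from the "never inferred" rule. Scores are copied from the tables above and carry different as-of dates, so the comparison is only approximate.

```python
from typing import Optional

# (A: Claude Opus 4, B: DeepSeek-R1 May 2025 refresh); None = not reported.
SCORES: dict[str, tuple[Optional[float], Optional[float]]] = {
    "MMLU-Pro": (86.0, 85.0),
    "GPQA-Diamond": (74.9, 81.0),
    "SWE-Bench Verified": (72.5, None),
    "LiveCodeBench": (54.2, 73.3),
    "MATH": (94.1, None),
    "AIME 2024": (56.3, 91.4),
    "AIME 2025": (33.9, None),
}


def pick_leader(a: Optional[float], b: Optional[float]) -> str:
    """Name the higher-scoring column, or decline if a score is missing."""
    if a is None or b is None:
        # Assumption: a missing score is never treated as a loss.
        return "not compared"
    if a == b:
        return "tie"
    return "A" if a > b else "B"


leaders = {name: pick_leader(a, b) for name, (a, b) in SCORES.items()}
for name, who in leaders.items():
    print(f"{name}: {who}")
```

With the values above, A leads on MMLU-Pro, B leads on GPQA-Diamond, LiveCodeBench, and AIME 2024, and the remaining three benchmarks are not compared.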

Context · A

Flagship of the May 22, 2025 Claude 4 launch. It returned to the Opus name after the Claude 3.5 and 3.7 lines skipped Opus entirely. It held the same price ceiling as Claude 3 Opus, $15 / $75 per million input/output tokens, and like Sonnet 4 it offered hybrid reasoning with extended thinking and parallel tool execution.

Context · B

An RL-only refresh of R1 that gained substantial ground on reasoning benchmarks without any new pretraining, most notably on AIME 2024, where it scores 91.4 in the Math table. It narrowed the gap between open and closed models on reasoning.