
DeepSeek-R1 (May 2025 refresh) vs Claude Sonnet 4

Rows where the models differ are highlighted in warm gray. Each number carries its as-of date and primary source.

Specs

Field	A: DeepSeek-R1 (May 2025 refresh)	B: Claude Sonnet 4
Released	2025-05-28	2025-05-22
Developer	DeepSeek	Anthropic
Openness	Open	Proprietary
License	MIT	Proprietary
OSI-approved	yes	no
Data released	no	no
Training code	no	no
Architecture	MoE	unknown
Total params	—	—
Active params	—	—
Experts	—	—
Context window	—	—
Attention	MLA	unknown
Position enc.	RoPE with YaRN	unknown
Pretraining tokens	—	—
Post-training	SFT, GRPO, rejection sampling	RLHF, Constitutional AI
Training hardware	H800	—
$/M input	—	$3.00
$/M output	—	$15.00
Output tok/sec	—	48.4
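
The price and throughput rows for model B can be turned into a rough per-request estimate. The sketch below is illustrative only: the constants are copied from the table above, while the function names and the 20,000-token / 2,000-token workload are hypothetical. It also assumes flat per-token pricing and a constant output rate, and it ignores time to first token. Model A has no price or throughput listed, so no estimate is attempted for it.

```python
# Values copied from the Specs table for model B (Claude Sonnet 4).
INPUT_USD_PER_M = 3.00      # $/M input tokens
OUTPUT_USD_PER_M = 15.00    # $/M output tokens
OUTPUT_TOK_PER_SEC = 48.4   # output tokens per second


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD, assuming flat per-token pricing (an assumption)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def estimate_generation_seconds(output_tokens: int) -> float:
    """Time to stream the output, assuming a constant rate (an assumption)."""
    return output_tokens / OUTPUT_TOK_PER_SEC


if __name__ == "__main__":
    # Hypothetical workload: 20,000 input tokens, 2,000 output tokens.
    cost = estimate_cost_usd(20_000, 2_000)
    seconds = estimate_generation_seconds(2_000)
    print(f"cost: ${cost:.2f}, generation time: {seconds:.1f} s")
```

For this hypothetical workload the estimate comes to about $0.09 and roughly 41 seconds of generation time.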

Benchmarks

A dash marks a score that was not reported; missing scores are never inferred. Bold marks the leader on each benchmark.

General reasoning

Benchmark	A: DeepSeek-R1 (May 2025 refresh)	B: Claude Sonnet 4
MMLU-Pro	85.0 (as of 2025-05-28)	83.7 (as of 2026-05-21)
GPQA-Diamond	81.0 (as of 2025-05-28)	70.0 (as of 2025-05-22)

Code

Benchmark	A: DeepSeek-R1 (May 2025 refresh)	B: Claude Sonnet 4
SWE-Bench Verified	—	72.7 (as of 2025-05-22)
LiveCodeBench	73.3 (as of 2025-05-28)	44.9 (as of 2026-05-21)

Math

Benchmark	A: DeepSeek-R1 (May 2025 refresh)	B: Claude Sonnet 4
MATH	—	93.4 (as of 2026-05-21)
AIME 2024	91.4 (as of 2025-05-28)	40.7 (as of 2026-05-21)
AIME 2025	—	33.1 (as of 2025-05-22)
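
The leader rule for the benchmark tables can be sketched as follows. This is a minimal illustration, not the page's actual logic: the scores are copied from the tables above, the names `SCORES` and `leader` are hypothetical, and the sketch assumes that no leader is declared when either score is missing, in keeping with the rule that missing scores are never inferred. It also ignores the fact that the two scores in a row can carry different as-of dates.

```python
from typing import Optional

# (A, B) scores copied from the Benchmarks tables; None means "not reported".
SCORES: dict[str, tuple[Optional[float], Optional[float]]] = {
    "MMLU-Pro": (85.0, 83.7),
    "GPQA-Diamond": (81.0, 70.0),
    "SWE-Bench Verified": (None, 72.7),
    "LiveCodeBench": (73.3, 44.9),
    "MATH": (None, 93.4),
    "AIME 2024": (91.4, 40.7),
    "AIME 2025": (None, 33.1),
}


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return "A", "B", or "tie"; None when either score is not reported.

    Assumption: a missing score is never treated as zero, so a row with
    only one reported score has no leader.
    """
    if a is None or b is None:
        return None
    if a == b:
        return "tie"
    return "A" if a > b else "B"


if __name__ == "__main__":
    for name, (a, b) in SCORES.items():
        result = leader(a, b)
        print(f"{name}: {result if result is not None else 'not comparable'}")
```

Under these assumptions, model A leads the four benchmarks where both models report a score, and the remaining three rows are not comparable.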

Context · A

An RL-only refresh of R1 that gained substantial ground on reasoning benchmarks (notably AIME 2024) without any new pretraining. Tightened the open-vs-closed reasoning gap.

Context · B

Mid-tier model of the May 22, 2025 Claude 4 launch. Inherited the hybrid-reasoning approach from Claude 3.7 Sonnet, with near-instant and extended-thinking modes, plus parallel tool execution and an extended-thinking-with-tool-use beta. Held the SWE-Bench Verified lead for closed mid-tier coding through summer 2025 at the same price as Claude 3.5 Sonnet and Claude 3.7 Sonnet: $3 per million input tokens and $15 per million output tokens.
