Cost
DeepSeek API · as of 2026-05-21
Why people cared
DeepSeek-R1 was the first openly released reasoning model competitive with OpenAI o1, and the accompanying paper (published Jan 22, 2025) became one of the most-read AI papers of the year. The technical story was that pure reinforcement learning post-training, without any supervised fine-tuning on human reasoning traces, could elicit chain-of-thought reasoning that generalized to held-out problems. The R1-Zero variant showed this most cleanly: starting from the V3 base, the team applied Group Relative Policy Optimization (GRPO) with a verifiable reward function and watched the model spontaneously produce longer reasoning traces as training progressed. The full R1 added a small SFT cold-start before the RL phase to improve readability. MIT-licensed weights and distillations into smaller dense bases (1.5B, 7B, 8B, 14B, 32B, and 70B, built on Qwen and Llama) shipped in the same release. The distillations at 32B and below put genuinely capable reasoning models within reach of local deployment for the first time, and the GRPO recipe became the template that Qwen 3, Llama 4, and several Western labs followed in their own reasoning post-training.
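The "group relative" part of GRPO can be illustrated with a minimal sketch: several completions are sampled for one prompt, each is scored by a verifiable reward, and each reward is normalized against the mean and standard deviation of its own group. The exact-match reward, the sample answers, and the use of the population standard deviation are assumptions made for illustration, not details taken from the R1 training code (which is not released). The full GRPO objective also includes a clipped policy ratio and a KL penalty, both omitted here.

```python
from statistics import mean, pstdev


def exact_match_reward(answer: str, reference: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the final answer matches, else 0.0."""
    return 1.0 if answer.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt whose reference answer is "42" (made-up data).
answers = ["42", "41", "42 ", "7"]
rewards = [exact_match_reward(a, "42") for a in answers]
advantages = group_relative_advantages(rewards)
```

Correct answers end up with positive advantages and incorrect ones with negative advantages, with no learned value model involved. When every completion in a group gets the same reward, all advantages are zero and that group contributes no learning signal.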
Architecture
data/models.yaml. Every label is auditable against the model's sources.
Specs
- Architecture: moe
- Total params: 671B
- Active params: 37B
- Experts: 256 total · 8 active
- Context window: 128K tokens
- Attention: mla
- Position encoding: rope-yarn
- Training hardware: H800
- Post-training: sft, grpo, rejection-sampling
- OSI-approved: yes
- Data released: no
- Training code: not released
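As a quick worked example of what the sparse-activation figures above imply, the sketch below computes the share of parameters and of experts used per token. The variable names are illustrative; the numbers are the ones listed in the specs.

```python
# Figures copied from the spec list above (parameters in billions).
TOTAL_PARAMS_B = 671
ACTIVE_PARAMS_B = 37
TOTAL_EXPERTS = 256
ACTIVE_EXPERTS = 8

# Share of all weights that participate in computing a single token.
active_param_share = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Share of the listed experts selected for a single token.
active_expert_share = ACTIVE_EXPERTS / TOTAL_EXPERTS

print(f"active parameters per token: {active_param_share:.1%}")
print(f"active experts per token:    {active_expert_share:.1%}")
```

Roughly 5.5% of the weights are active per token while only about 3.1% of the listed experts are selected. The gap is presumably because attention, embeddings, and other non-expert weights run for every token regardless of routing.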
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
General reasoning
| Benchmark | Score | Date | Source |
| --- | --- | --- | --- |
| MMLU | 90.8 | as of 2025-01-22 | source ↗ |
| MMLU-Pro | 84.9 | as of 2026-05-21 | source ↗ |
| GPQA-Diamond | 71.5 | as of 2025-01-22 | source ↗ |
Code
| Benchmark | Score | Date | Source |
| --- | --- | --- | --- |
| LiveCodeBench | 77.0 | as of 2026-05-21 | source ↗ |
Recommended use cases
- math reasoning
- code reasoning
- step-by-step problem solving
Available quantizations
Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
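To get a rough sense of why quantization matters at this scale, the sketch below estimates the memory needed for the weights alone at a few bit widths. The bit widths are illustrative and are not a list of the quantizations actually published. The estimate ignores the KV cache, activations, and per-block quantization metadata, so real footprints will be higher.

```python
TOTAL_PARAMS = 671e9  # total parameter count from the specs


def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return total_params * bits_per_weight / 8 / 1e9


# Illustrative bit widths; not tied to any specific published quantization.
estimates = {bits: weight_memory_gb(TOTAL_PARAMS, bits) for bits in (16, 8, 4)}

for bits, gigabytes in estimates.items():
    print(f"{bits:>2}-bit weights: ~{gigabytes:,.0f} GB")
```

Because all 671B parameters must be resident even though only 37B are active per token, the total parameter count drives the memory requirement, not the active count.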
Notable innovations
- Pure-RL reasoning (R1-Zero)
- MIT license
- Open reasoning traces
Known limitations
- Long reasoning traces dominate output cost; expect 3-10x token output vs. non-reasoning chat models. source ↗
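The 3-10x multiplier above translates directly into a cost range. The sketch below applies it to a baseline output length. The price per million output tokens and the baseline token count are placeholders chosen for illustration, not actual DeepSeek API pricing; substitute current figures before relying on the result.

```python
def output_cost_usd(output_tokens: float, usd_per_million_tokens: float) -> float:
    """Cost of generated tokens at a flat per-million-token price."""
    return output_tokens / 1_000_000 * usd_per_million_tokens


def reasoning_cost_range(
    baseline_tokens: float,
    usd_per_million_tokens: float,
    low_multiplier: float = 3.0,
    high_multiplier: float = 10.0,
) -> tuple[float, float]:
    """Low/high output cost if a reasoning model emits 3-10x the baseline tokens."""
    low = output_cost_usd(baseline_tokens * low_multiplier, usd_per_million_tokens)
    high = output_cost_usd(baseline_tokens * high_multiplier, usd_per_million_tokens)
    return low, high


# Placeholder inputs: a 500-token chat-style answer at a hypothetical 2.00 USD/M.
PLACEHOLDER_PRICE = 2.00
baseline = output_cost_usd(500, PLACEHOLDER_PRICE)
low, high = reasoning_cost_range(500, PLACEHOLDER_PRICE)
```

With these placeholder inputs the per-request output cost moves from the baseline to between three and ten times that amount, so budgeting by request count alone will underestimate spend on reasoning workloads.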
Lineage
First MIT-licensed open reasoning model.
Derived from
DeepSeek-V3 · 2024-12-26
Derivatives
Reception
- "DeepSeek-R1 is the first open weights reasoning model that's truly competitive with the best closed models."
— Andrej Karpathy · 2025-01-21