The Open-Source AI Stack

DeepSeek-R1

Open · DeepSeek · MIT

The first openly released reasoning model competitive with OpenAI o1. The R1-Zero variant demonstrated that pure-RL post-training without SFT could elicit chain-of-thought reasoning. The weights are MIT-licensed, with distillations at sizes from 1.5B to 70B.

Cost

$1.35 / Mtok input
$4.20 / Mtok output

DeepSeek API · as of 2026-05-21

via Artificial Analysis ↗
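
The listed prices translate into a per-request estimate with simple arithmetic. The sketch below assumes billing is strictly linear in token counts; the function name and the example token counts are hypothetical, and only the two prices come from the listing above.

```python
# Prices copied from the listing above (USD per million tokens).
INPUT_USD_PER_MTOK = 1.35
OUTPUT_USD_PER_MTOK = 4.20


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request, assuming purely linear per-token billing."""
    return (
        input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Hypothetical request: 2,000 prompt tokens and 8,000 generated tokens.
cost = request_cost_usd(2_000, 8_000)
print(f"${cost:.4f}")
```

For a reasoning model the output term usually dominates, since the reasoning trace is billed as output.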

Speed

0 tok/sec output
0 ms TTFT

DeepSeek API · as of 2026-05-21

via Artificial Analysis ↗

Why people cared

DeepSeek-R1 was the first openly released reasoning model competitive with OpenAI o1, and the paper that accompanied it (published January 22, 2025) became one of the most-read AI papers of the year. The technical story was that pure reinforcement learning post-training, without any supervised fine-tuning on human reasoning traces, could elicit chain-of-thought reasoning that generalized to held-out problems.

The R1-Zero variant showed this most cleanly: starting from the V3 base, the team applied Group Relative Policy Optimization (GRPO) with a verifiable reward function and watched the model spontaneously develop longer reasoning traces over training. The full R1 added a small SFT cold-start before the RL phase to clean up readability.

MIT-licensed weights shipped in the same release, along with distillations into smaller dense Llama and Qwen bases (1.5B, 7B, 8B, 14B, 32B, and 70B). The distillations at 32B and below put genuinely capable reasoning models within reach of local deployment for the first time, and the GRPO recipe became a template for reasoning post-training in Qwen 3, Llama 4, and models from several other labs.
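
The two ingredients named above, a verifiable reward and a group-relative advantage, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not DeepSeek's training code: the `\boxed{...}` answer format, the exact-match check, the use of the population standard deviation, and both function names are assumptions made for the example. It covers only the advantage computation and omits the policy-gradient update, clipping, and KL term.

```python
import re
import statistics


def verifiable_reward(completion: str, reference_answer: str) -> float:
    """Return 1.0 if the boxed final answer matches the reference, else 0.0.

    Assumption: answers are wrapped in \\boxed{...}; this format is illustrative.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    return 1.0 if match and match.group(1).strip() == reference_answer else 0.0


def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Score each sample against the mean and spread of its own group.

    No learned value model is involved: the group of samples for one prompt
    serves as the baseline.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    if std == 0.0:
        # Every sample scored the same, so the group carries no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Four hypothetical completions sampled for the same prompt.
completions = [
    "... so the answer is \\boxed{42}",
    "... therefore \\boxed{41}",
    "... giving \\boxed{42}",
    "I am not sure.",
]
rewards = [verifiable_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)
print(advantages)
```

Correct completions receive a positive advantage and incorrect ones a negative advantage, so the update pushes probability toward whatever reasoning led to a checkable right answer.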

Architecture

Diagram: tokens in → Embedding (vocab not disclosed · deepseek tokenizer) → × N layers of [Multi-head Latent Attention (RoPE + YaRN, context 128,000 tokens) → MoE Router (256 experts total · 8 active per token; 32 of 256 shown)] → Output projection → tokens out
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.
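
To make the MoE Router stage concrete, here is a generic top-k gating sketch using the 256-expert, 8-active figures above. It is an assumption-laden illustration rather than DeepSeek's router: the softmax over the selected logits, the random logits, and the `route_token` name are all hypothetical, and the listing does not describe the scoring function or load balancing the model actually uses.

```python
import math
import random

NUM_EXPERTS = 256  # total experts, from the listing
TOP_K = 8          # experts active per token, from the listing


def route_token(router_logits: list[float], k: int = TOP_K) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalize their gate weights."""
    ranked = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exp_scores = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exp_scores.values())
    return {i: score / total for i, score in exp_scores.items()}


# Stand-in router logits for one token; a real router computes these
# from the token's hidden state.
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
gates = route_token(logits)
print(len(gates), "experts selected out of", NUM_EXPERTS)
```

Only the selected experts run for that token, and their outputs are combined using the gate weights; the remaining experts contribute no compute.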

Specs

Architecture: MoE
Total params: 671B
Active params: 37B
Experts: 256 total · 8 active
Context window: 128K tokens
Attention: MLA
Position encoding: RoPE + YaRN
Training hardware: H800
Post-training: SFT, GRPO, rejection sampling
OSI-approved: yes
Data released: no
Training code: not released
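
The sparsity implied by these specs is easy to work out. The snippet below uses only the figures listed above; the variable names are illustrative.

```python
# Figures from the Specs list above (parameter counts in billions).
TOTAL_PARAMS_B = 671
ACTIVE_PARAMS_B = 37
EXPERTS_TOTAL = 256
EXPERTS_ACTIVE = 8

active_param_share = ACTIVE_PARAMS_B / TOTAL_PARAMS_B
active_expert_share = EXPERTS_ACTIVE / EXPERTS_TOTAL

print(f"{active_param_share:.2%} of parameters are used per token")
print(f"{active_expert_share:.2%} of experts are used per token")
```

The parameter share comes out higher than the expert share. A plausible reading is that non-expert components such as attention and embeddings run for every token, though the listing does not break the 37B figure down.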

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU 90.8 as of 2025-01-22 source ↗
MMLU-Pro 84.9 as of 2026-05-21 source ↗
GPQA-Diamond 71.5 as of 2025-01-22 source ↗

Code

LiveCodeBench 77.0 as of 2026-05-21 source ↗

Math

MATH-500 97.3 as of 2025-01-22 source ↗
AIME 2024 79.8 as of 2025-01-22 source ↗
AIME 2025 76.0 as of 2026-05-21 source ↗

Recommended use cases

  • math reasoning
  • code reasoning
  • step-by-step problem solving

Available quantizations

GGUF: llama.cpp's container format and the common local format, with k-quants from Q2 to Q8. Runs on llama.cpp, Ollama.
AWQ: activation-aware 4-bit weight quantization for GPU serving. Runs on vLLM, SGLang.
GPTQ: post-training 4-bit weight quantization for GPU serving. Runs on vLLM, SGLang, Transformers.
MLX: Apple MLX 4/8-bit layout for Apple silicon. Runs on Apple MLX.
FP8: 8-bit float, frequently a native release on Hopper / Blackwell GPUs. Runs on vLLM, SGLang, TensorRT-LLM.

Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
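
A rough way to compare these formats is weights-only memory: parameter count times bits per weight. The sketch below is a back-of-the-envelope estimate for the full 671B model, assuming a uniform bit width. It ignores quantization scales and zero points, the KV cache, and activations, so real footprints will be higher; the function name is hypothetical.

```python
TOTAL_PARAMS = 671e9  # total parameter count from the Specs section


def weight_memory_gb(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Weights-only footprint in decimal gigabytes, assuming a uniform bit width."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("FP8", 8), ("4-bit (AWQ / GPTQ)", 4)]:
    print(f"{label}: about {weight_memory_gb(bits):.1f} GB of weights")
```

GGUF k-quants mix bit widths across tensors, so their effective bits per weight are not whole numbers and this uniform estimate only brackets them.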

Notable innovations

  • Pure-RL reasoning (R1-Zero)
  • MIT license
  • Open reasoning traces

Known limitations

  • Long reasoning traces dominate output cost; expect 3-10x token output vs. non-reasoning chat models. source ↗
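
The effect of that multiplier on a bill can be estimated directly. The sketch below applies the 3-10x range to the output price from the Cost section; the 500-token baseline answer and the function name are hypothetical.

```python
OUTPUT_USD_PER_MTOK = 4.20  # output price from the Cost section


def output_cost_usd(answer_tokens: int, trace_multiplier: float) -> float:
    """Output cost when reasoning inflates generated tokens by trace_multiplier."""
    return answer_tokens * trace_multiplier * OUTPUT_USD_PER_MTOK / 1_000_000


baseline_tokens = 500  # hypothetical length of a non-reasoning answer
for multiplier in (1, 3, 10):
    cost = output_cost_usd(baseline_tokens, multiplier)
    print(f"{multiplier:>2}x output tokens: ${cost:.4f} per response")
```

Because the cost scales linearly with the multiplier, capping the maximum generation length is the most direct lever on spend.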

Lineage

First MIT-licensed open reasoning model.

Reception

  • "DeepSeek-R1 is the first open weights reasoning model that's truly competitive with the best closed models."

    Andrej Karpathy · 2025-01-21
