DeepSeek-V2 Chat

Cost

Input price per Mtok: not listed

Output price per Mtok: not listed

DeepSeek API · as of 2026-05-19

via Artificial Analysis

Speed

Output speed (tok/sec): not listed

DeepSeek API · as of 2026-05-19

via Artificial Analysis

Architecture

Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture: MoE (mixture of experts)
Total params: 236B
Active params: 21B per token
Experts: 160 total · 6 active per token
Context window: 128K tokens
Attention: MLA (Multi-head Latent Attention)
Position encoding: RoPE with YaRN context extension
Pretraining tokens: 8.1T
Post-training: SFT, RLHF
OSI-approved: no
Data released: no
Training code: not released
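
As a rough illustration of what the expert counts in the specs mean in practice, the sketch below routes one token through a top-k gate over 160 experts, keeps 6, and computes the share of parameters active per token from the listed totals. The gate scores are random placeholders and `route_top_k` is a hypothetical helper for illustration; it is not DeepSeek's router, and the softmax-over-selected normalization is an assumption.

```python
import math
import random

TOTAL_PARAMS_B = 236   # total params, from the specs
ACTIVE_PARAMS_B = 21   # active params per token, from the specs
NUM_EXPERTS = 160
TOP_K = 6


def route_top_k(scores, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected scores only. This is an
    illustrative choice; real routers differ in how they normalize.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    max_score = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - max_score) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


rng = random.Random(0)
# Placeholder gate logits; a trained model would compute these from the token.
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
selected = route_top_k(scores, TOP_K)
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

print(f"experts used for this token: {[i for i, _ in selected]}")
print(f"active parameter share: {active_fraction:.1%}")
```

With the listed totals, roughly 9% of the parameters do work for any single token, which is where the MoE cost advantage comes from.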

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

Recommended use cases

cost-efficient chat (only 21B of 236B parameters are active per token)
long-context retrieval

Available quantizations

GGUF: llama.cpp's container format and the common local format, with k-quants from Q2 to Q8. Runs on llama.cpp and Ollama.

MLX: Apple MLX 4-bit and 8-bit layout for Apple silicon. Runs on Apple MLX.

Verified via the Hugging Face model tree. Community quantizations change over time; the families shown are those with published weights at audit time.
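
For sizing a local deployment, a back-of-the-envelope estimate of weight storage is total parameters times bits per weight. The sketch below applies that to the 236B total from the specs. The bit widths are nominal assumptions: real GGUF or MLX files come out somewhat larger, because quantization formats store scales and often keep some tensors at higher precision.

```python
TOTAL_PARAMS = 236e9  # total parameter count, from the specs


def weight_gb(params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (ignores format overhead)."""
    return params * bits_per_weight / 8 / 1e9


# Nominal bit widths only; actual quantized files use slightly more bits per weight.
for bits in (2, 4, 8, 16):
    print(f"{bits:>2}-bit: ~{weight_gb(TOTAL_PARAMS, bits):.0f} GB")
```

The estimate uses the total parameter count rather than the active count because every expert has to be resident in memory, even though only a few run for each token.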

Notable innovations

· Multi-head Latent Attention (MLA)
· KV-cache compression
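
To show why KV-cache compression matters at a 128K context, the sketch below compares per-token cache size for standard multi-head attention, which stores full keys and values for every head, against a latent-attention layout that stores one compressed vector plus a small positional key per layer. The layer count, head count, and dimensions are assumed values for illustration and are not listed on this page; check them against the DeepSeek-V2 paper before relying on the numbers. The baseline is a hypothetical MHA model with the same head layout, not a comparison reported by the authors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    # All values are illustrative assumptions, not taken from this page.
    n_layers: int = 60
    n_heads: int = 128
    head_dim: int = 128
    latent_dim: int = 512    # compressed KV vector per token per layer
    rope_key_dim: int = 64   # separate positional key per token per layer


def mha_cache_elems_per_token(cfg):
    # Standard MHA caches a key and a value for every head in every layer.
    return 2 * cfg.n_heads * cfg.head_dim * cfg.n_layers


def mla_cache_elems_per_token(cfg):
    # Latent attention caches one compressed vector plus one positional key per layer.
    return (cfg.latent_dim + cfg.rope_key_dim) * cfg.n_layers


def cache_gib(elems_per_token, n_tokens, bytes_per_elem=2):
    # bytes_per_elem=2 assumes a 16-bit cache (FP16 or BF16).
    return elems_per_token * n_tokens * bytes_per_elem / 2**30


cfg = AttnConfig()
context = 128 * 1024  # 128K-token context window, from the specs
mha = mha_cache_elems_per_token(cfg)
mla = mla_cache_elems_per_token(cfg)
print(f"MHA: {mha} elems/token, {cache_gib(mha, context):.1f} GiB at 128K")
print(f"MLA: {mla} elems/token, {cache_gib(mla, context):.1f} GiB at 128K")
print(f"reduction vs. same-shape MHA: {1 - mla / mha:.1%}")
```

Under these assumptions the cache for a single full-length sequence drops from hundreds of GiB to single digits, which is what makes the long context window practical to serve.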

Lineage

MLA debuted in this model; the architecture was later refined into DeepSeek-V3.

Derivatives

DeepSeek-V3 (2024-12-26)

Sources

DeepSeek-V2 paper (May 7, 2024)