DeepSeek-V2.5

Cost

$0.00 / Mtok input

$0.00 / Mtok output

· as of 2026-05-21

Speed

0 tok/sec output

0 ms TTFT

· as of 2026-05-21

Architecture

Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture: moe
Total params: not disclosed
Active params: 21B
Experts: 160 total · 6 active
Context window: not verified
Attention: mla
Position encoding: rope-yarn
Post-training: sft, rlhf
OSI-approved: no
Data released: no
Training code: not released
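
As a rough illustration of what "160 total · 6 active" means in a mixture-of-experts layer, the sketch below picks the 6 highest-scoring experts for one token and normalizes their gate weights. It is a generic top-k gating example, not DeepSeek's actual router: the function name `route_token`, the random logits, and the choice to apply softmax over only the selected experts are all assumptions made for illustration.

```python
import math
import random

NUM_EXPERTS = 160  # "Experts: 160 total" from the spec list
TOP_K = 6          # "6 active" from the spec list


def route_token(router_logits, top_k=TOP_K):
    """Return [(expert_index, gate_weight), ...] for the top_k highest-scoring experts.

    Hypothetical helper: generic top-k gating, not the model's real routing code.
    """
    # Indices of the top_k largest logits, highest first.
    chosen = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )[:top_k]
    # Softmax over the chosen logits only (shifted by the max for numerical
    # stability), so the gate weights sum to 1.
    peak = max(router_logits[i] for i in chosen)
    exps = [math.exp(router_logits[i] - peak) for i in chosen]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(chosen, exps)]


# Made-up router scores for a single token; a real router computes these
# from the token's hidden state.
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
routing = route_token(logits)

for expert, weight in routing:
    print(f"expert {expert:3d}  gate weight {weight:.3f}")
```

Only the selected experts run for a given token, which is why the active parameter count (21B) is what drives per-token compute rather than the total.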

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

Code

HumanEval	89.0	as of 2024-09-05	source ↗
LiveCodeBench	41.8	as of 2024-09-05	source ↗
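
One way to picture the "dated score, no interpolation" rule is a record type that always carries a publication date and a lookup that returns nothing when a benchmark is absent. This is a minimal sketch under assumptions: the `BenchmarkScore` class and `lookup` function are hypothetical names, not the site's actual schema, and only the two rows shown above are included.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BenchmarkScore:
    """Hypothetical record: a score is never stored without its publication date."""
    benchmark: str
    score: float
    published: date


# The two rows from the table above.
SCORES = [
    BenchmarkScore("HumanEval", 89.0, date(2024, 9, 5)),
    BenchmarkScore("LiveCodeBench", 41.8, date(2024, 9, 5)),
]


def lookup(benchmark: str) -> Optional[BenchmarkScore]:
    """Return the published score, or None; missing scores are never estimated."""
    for entry in SCORES:
        if entry.benchmark == benchmark:
            return entry
    return None


print(lookup("HumanEval"))
print(lookup("MMLU"))  # not in the table, so the lookup yields None
```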

Recommended use cases

cost-efficient chat with code competence
long-context retrieval
function-calling agents

Available quantizations

GGUF: llama.cpp's container format and the common choice for local inference, with k-quants from Q2 to Q8. Runs on llama.cpp and Ollama.

MLX: Apple MLX 4-bit and 8-bit layouts for Apple silicon. Runs on Apple MLX.

Verified via the Hugging Face model tree. Community quantizations change over time; the families shown are those with published weights at audit time.

Notable innovations

· Chat + Coder track merger
· Function calling and FIM in a single checkpoint
· Safety score lifted to 82.6% (vs 74.4% for V2-0628)

Lineage

DeepSeek-V2.5 merged the Chat and Coder tracks into a single checkpoint on the V2 architecture, a step on the path to V3.

Derived from

DeepSeek-V2 Chat 2024-05-07

Derivatives

DeepSeek-V3 2024-12-26