Architecture
data/models.yaml. Every label is auditable
against the model's sources.
Specs
- Architecture: moe
- Total params: 236B
- Active params: 21B
- Experts: 160 total · 6 active
- Context window: 128K tokens
- Attention: mla
- Position encoding: rope-yarn
- Pretraining tokens: 8.1T
- Post-training: sft, rlhf
- OSI-approved: no
- Data released: no
- Training code: not released
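The ratio between active and total capacity can be worked out directly from the figures listed above. The following is a minimal sketch of that arithmetic; the constant and function names are illustrative and are not taken from data/models.yaml.

```python
# Figures copied from the spec list above; all names here are illustrative.
TOTAL_PARAMS_B = 236    # total parameters, in billions
ACTIVE_PARAMS_B = 21    # parameters active per token, in billions
TOTAL_EXPERTS = 160
ACTIVE_EXPERTS = 6


def share(part: float, whole: float) -> float:
    """Return part/whole as a fraction between 0 and 1."""
    return part / whole


active_param_share = share(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
active_expert_share = share(ACTIVE_EXPERTS, TOTAL_EXPERTS)

print(f"Active parameter share: {active_param_share:.2%}")   # 8.90%
print(f"Active expert share:    {active_expert_share:.2%}")  # 3.75%
```

The parameter share comes out higher than the expert share, presumably because the active-parameter count also includes components every token uses, such as attention and embedding weights; the spec list itself does not break this down.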
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
Recommended use cases
- cost-efficient chat at MoE economics
- long-context retrieval
Available quantizations
GGUF: llama.cpp's container; the common local format, with k-quants from Q2 to Q8.
Runs on llama.cpp and Ollama.
Verified via the Hugging Face model tree. Community quantizations change over time; the families shown are those with published weights at audit time.
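As a rough guide to what the Q2 to Q8 range means for a model of this size, weight storage can be approximated as parameters × bits per weight ÷ 8. The sketch below assumes a uniform nominal bit width, which is a simplification: real k-quant files mix precisions and store per-block scales, so actual file sizes differ, and the estimate ignores the KV cache and runtime overhead. The function name is illustrative.

```python
# Total parameter count from the spec list (236B). The uniform bit widths
# below are a simplifying assumption, not measured GGUF file sizes.
TOTAL_PARAMS = 236e9


def approx_weight_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for bits in (2, 4, 8):
    size_gb = approx_weight_gb(TOTAL_PARAMS, bits)
    print(f"~{bits} bits/weight: about {size_gb:.0f} GB of weights")
```

Under these assumptions the weights alone range from about 59 GB at 2 bits to about 236 GB at 8 bits. All experts must be resident even though only a fraction is active per token, so the total parameter count, not the active count, drives this estimate.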
Notable innovations
- Multi-head Latent Attention (MLA)
- KV-cache compression
Lineage
MLA debut; architecture refined into V3.
Derivatives