Architecture
data/models.yaml. Every label is auditable
against the model's sources.
Specs
- Architecture: dense
- Total params: 13.7B
- Active params: 13.7B
- Context window: 4K tokens
- Attention: mha
- Position encoding: rope
- Pretraining tokens: 5.0T
- Training hardware: H100
- Post-training: sft, dpo, rlvr
- OSI-approved: yes
- Data released: yes
- Training code: released
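As a rough illustration, the specs above can be held in a small record and sanity-checked. The `ModelSpecs` class and its field names are hypothetical (the actual schema of data/models.yaml is not shown here), and reading "4K" as 4096 tokens is an assumption; the numeric values are copied from the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpecs:
    """Hypothetical record; field names are illustrative, not the real schema."""

    architecture: str
    total_params: float        # total parameter count
    active_params: float       # parameters used per token
    context_tokens: int
    pretraining_tokens: float

    def active_fraction(self) -> float:
        # Share of parameters that participate in each forward pass.
        return self.active_params / self.total_params

    def tokens_per_param(self) -> float:
        # Pretraining tokens seen per model parameter.
        return self.pretraining_tokens / self.total_params


specs = ModelSpecs(
    architecture="dense",
    total_params=13.7e9,
    active_params=13.7e9,
    context_tokens=4096,       # assumption: "4K" means 4096
    pretraining_tokens=5.0e12,
)

# A dense model activates every parameter for every token,
# so active and total counts should match.
if specs.architecture == "dense":
    assert specs.active_fraction() == 1.0

print(f"{specs.tokens_per_param():.0f} pretraining tokens per parameter")
```

With the listed figures this works out to roughly 365 pretraining tokens per parameter.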
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
General reasoning
| MMLU | 68.5 | as of 2024-11-26 | source ↗ |
Math
| MATH | 39.2 | as of 2024-11-26 | source ↗ |
Held-out / arena
| IFEval | 82.6 | as of 2024-11-26 | source ↗ |
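One way to honor the "never infer or interpolate" rule in code is to keep each score together with its publication date and return nothing at all for a benchmark that has no published score. The sketch below parses the three rows above; `Score`, `parse_row`, and `lookup` are hypothetical names chosen for illustration, not part of any published tooling.

```python
from datetime import date
from typing import NamedTuple, Optional


class Score(NamedTuple):
    benchmark: str
    value: float
    as_of: date


ROWS = """\
| MMLU | 68.5 | as of 2024-11-26 | source ↗ |
| MATH | 39.2 | as of 2024-11-26 | source ↗ |
| IFEval | 82.6 | as of 2024-11-26 | source ↗ |
"""


def parse_row(line: str) -> Score:
    # Drop the outer pipes, then split the remaining cells.
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    name, value, as_of = cells[0], cells[1], cells[2]
    published = date.fromisoformat(as_of.replace("as of", "").strip())
    return Score(name, float(value), published)


SCORES = {s.benchmark: s for s in map(parse_row, ROWS.splitlines())}


def lookup(benchmark: str) -> Optional[Score]:
    # A missing score stays missing: no default value, no interpolation.
    return SCORES.get(benchmark)
```

Calling `lookup("MMLU")` returns the dated score, while a benchmark that is not listed (for example `lookup("GSM8K")`) returns `None` rather than an estimate.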
Recommended use cases
- research reproducibility
- OSAID-aligned deployment
- fine-tuning base
Available quantizations
GGUF: llama.cpp's container format and the common format for local inference, with k-quants from Q2 to Q8. Runs on llama.cpp and Ollama.
Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
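For a back-of-the-envelope sense of what the Q2 to Q8 range means for a 13.7B-parameter model, weight storage can be estimated as parameters times bits per weight. This is a sketch under simplifying assumptions: it uses nominal bit widths, whereas real k-quant files carry block scales and mix tensor types, so actual file sizes are somewhat larger; it also ignores the KV cache and activations. The `weight_gib` helper is hypothetical.

```python
TOTAL_PARAMS = 13.7e9  # total parameter count from the specs


def weight_gib(bits_per_weight: float, params: float = TOTAL_PARAMS) -> float:
    """Approximate weight storage in GiB at a nominal bits-per-weight."""
    return params * bits_per_weight / 8 / 2**30


# Nominal widths only; 16 bits is included as an unquantized reference point.
for bits in (2, 4, 8, 16):
    print(f"{bits:>2} bits/weight: ~{weight_gib(bits):.1f} GiB")
```

The estimate scales linearly, so halving the bits per weight halves the approximate footprint.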
Notable innovations
- Full training stack open (data + code + logs)
- OSAID-aligned at 13B
- RLVR post-training
Known limitations
- 5T-token pretrain is below the 15-36T used by 2025-class open-weights releases; benchmark scores reflect this. source ↗
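To put the gap in numbers, the quoted range can be divided by this model's token budget. The variable names are illustrative; the figures are the ones stated above.

```python
PRETRAIN_TOKENS = 5.0e12           # this model's pretraining budget
REFERENCE_RANGE = (15e12, 36e12)   # range quoted for 2025-class releases

# How many times larger the reference budgets are than this model's.
low, high = (ref / PRETRAIN_TOKENS for ref in REFERENCE_RANGE)
print(f"Reference budgets are {low:.1f}x to {high:.1f}x larger")
```

That is, the comparison releases trained on 3.0x to 7.2x as many tokens.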
Lineage
Fully open: weights + Dolma pretraining data + training code + WandB logs. Post-training shares the Tülu 3 recipe.