Architecture
data/models.yaml. Every label is auditable
against the model's sources.
Specs
- Architecture: dense
- Total params: 7B
- Active params: 7.3B
- Context window: not verified
- Attention: mha
- Position encoding: rope
- Pretraining tokens: 4.0T
- Training hardware: H100
- Post-training: sft, dpo
- OSI-approved: yes
- Data released: yes
- Training code: released
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
General reasoning
| Benchmark | Score | Published | Source |
| --- | --- | --- | --- |
| MMLU | 61.3 | as of 2024-12-02 | source ↗ |
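
The no-interpolation rule can be illustrated with a small record type. This is a minimal sketch, not the site's actual code: the `BenchmarkScore` class, the `render` helper, and the `ExampleBench` entry are hypothetical names introduced for illustration only. A missing score stays `None` and is rendered as "not verified" rather than being estimated.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BenchmarkScore:
    """One benchmark entry (hypothetical schema, for illustration)."""
    benchmark: str
    score: Optional[float]      # None means "no published score", never an estimate
    published: Optional[date]   # date the score was published, if any


def render(row: BenchmarkScore) -> str:
    """Format one row; a missing score is reported as such, not filled in."""
    if row.score is None or row.published is None:
        return f"| {row.benchmark} | not verified |"
    return f"| {row.benchmark} | {row.score} | as of {row.published.isoformat()} |"


# The MMLU values are taken from the table above.
mmlu = BenchmarkScore("MMLU", 61.3, date(2024, 12, 2))
# Hypothetical benchmark with no published score.
missing = BenchmarkScore("ExampleBench", None, None)

print(render(mmlu))
print(render(missing))
```

Keeping the score optional, instead of defaulting to zero or an average, makes it hard to display a number that was never published.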
Recommended use cases
- research reproducibility
- OSAID-compliant deployment
- fine-tuning base
Available quantizations
GGUF: llama.cpp's container format; the common local format, with k-quants from Q2 to Q8. Runs on llama.cpp and Ollama.
Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
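
As a rough guide to what the Q2 to Q8 range means for disk and memory use, the sketch below estimates a lower bound on file size from a nominal bit width. It assumes the 7.3B parameter figure from Specs and treats each quantization level as exactly N bits per weight; real k-quant files are somewhat larger because they also store per-block scales and keep some tensors at higher precision. The `approx_size_gb` helper is a hypothetical name, not part of llama.cpp.

```python
PARAMS = 7.3e9  # parameter count, taken from the Specs list


def approx_size_gb(params: float, bits_per_weight: float) -> float:
    """Lower-bound size in GB (10^9 bytes) at a nominal bit width.

    Assumption: every weight uses exactly `bits_per_weight` bits.
    Actual GGUF files carry extra metadata and block scales.
    """
    return params * bits_per_weight / 8 / 1e9


for bits in (2, 4, 8):
    print(f"Q{bits}: at least ~{approx_size_gb(PARAMS, bits):.1f} GB")
```

These figures are a floor for planning purposes; the published file sizes in the model tree are the authoritative numbers.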
Notable innovations
- Full training stack open (data + code + logs)
- OSAID-compliant
Known limitations
- The 4T-token pretrain (see Pretraining tokens under Specs) is well below the 15-36T used by 2025-class open-weights releases; benchmark scores reflect this. source ↗
Lineage
Fully open: weights + Dolma pretraining data + training code + training logs.