Architecture
data/models.yaml. Every label is auditable
against the model's sources.
Specs
- Architecture: dense
- Total params: 8.1B
- Active params: 8.1B
- Context window: 4K tokens
- Attention: GQA (grouped-query attention)
- Position encoding: RoPE (rotary position embeddings)
- Pretraining tokens: 12.0T
- Training hardware: H100
- Post-training: SFT, RLHF
- OSI-approved license: yes
- Data released: no
- Training code: not released
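The context window and attention type together determine how much memory the key/value cache needs during inference. The sketch below shows the standard estimate and how GQA shrinks it by caching only the shared key/value heads. The layer count, head counts, and head dimension are hypothetical illustration values, not this model's published configuration, and "4K" is taken to mean 4096 tokens.

```python
# Minimal sketch: KV-cache size for a decoder using grouped-query attention (GQA).
# ASSUMPTION: n_layers, n_q_heads, n_kv_heads and head_dim in the demo below are
# hypothetical illustration values, not this model's published configuration.


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Bytes needed to cache keys and values for `seq_len` tokens.

    The factor 2 covers storing both K and V. Under GQA only the n_kv_heads
    key/value heads are cached, not one per query head.
    bytes_per_elem=2 assumes a 16-bit (fp16/bf16) cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


if __name__ == "__main__":
    # Hypothetical config: 40 layers, 32 query heads sharing 8 KV heads, head_dim 128.
    n_layers, n_q_heads, n_kv_heads, head_dim = 40, 32, 8, 128
    ctx = 4096  # assumed value of the 4K-token context window
    gqa = kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx)
    mha = kv_cache_bytes(n_layers, n_q_heads, head_dim, ctx)  # one KV head per query head
    print(f"GQA: {gqa / 2**20:.0f} MiB, MHA: {mha / 2**20:.0f} MiB")
```

With these assumed numbers the cache is a quarter of the multi-head-attention size, because 32 query heads share 8 key/value heads.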
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
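The no-interpolation rule can be pictured as a lookup that returns nothing when a score was never recorded, rather than estimating one. This is a minimal sketch only: the `Score` type and `latest_score()` helper are hypothetical and are not taken from data/models.yaml or any published schema.

```python
# Minimal sketch of dated benchmark records with no interpolation.
# ASSUMPTION: Score and latest_score() are hypothetical illustrations,
# not the catalog's actual data model.
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Score:
    benchmark: str
    value: float
    as_of: date  # date the score was published


def latest_score(scores: List[Score], benchmark: str) -> Optional[Score]:
    """Return the most recently published score for `benchmark`, or None.

    A benchmark with no recorded score yields None; no value is derived
    from other benchmarks or other dates.
    """
    matches = [s for s in scores if s.benchmark == benchmark]
    return max(matches, key=lambda s: s.as_of, default=None)


if __name__ == "__main__":
    scores = [
        Score("MMLU", 65.8, date(2024, 10, 21)),
        Score("HumanEval", 64.6, date(2024, 10, 21)),
    ]
    print(latest_score(scores, "MMLU"))
    print(latest_score(scores, "GSM8K"))  # None: no GSM8K entry in this list
```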
General reasoning
| Benchmark | Score | As of | Source |
|---|---|---|---|
| MMLU | 65.8 | 2024-10-21 | source ↗ |
| MMLU-Pro | 34.5 | 2024-10-21 | source ↗ |
| GPQA-Diamond | 33.8 | 2024-10-21 | source ↗ |
Code
| Benchmark | Score | As of | Source |
|---|---|---|---|
| HumanEval | 64.6 | 2024-10-21 | source ↗ |
Held-out / arena
| Benchmark | Score | As of | Source |
|---|---|---|---|
| IFEval | 52.3 | 2024-10-21 | source ↗ |
Recommended use cases
- enterprise RAG
- function-calling agents
- multilingual chat
Available quantizations
GGUF: llama.cpp's file format and the most common format for local inference, with quantization levels from Q2 to Q8 (mostly k-quants).
Runs on: llama.cpp, Ollama
Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
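To gauge which quantization fits a given machine, weight storage can be estimated as parameters × bits per weight ÷ 8. The sketch below applies this to the 8.1B total parameters. The bits-per-weight figures are the nominal values for llama.cpp's block formats (Q8_0 is included as the usual 8-bit GGUF type). Published files mix tensor types across the _S/_M/_L variants and carry metadata, so real downloads will differ somewhat, and runtime memory also needs room for the KV cache.

```python
# Rough weight-storage estimate for GGUF quantizations of an 8.1B-parameter model.
# ASSUMPTION: nominal bits-per-weight per block format; actual files mix tensor
# types and add metadata, so real sizes differ somewhat.

NOMINAL_BPW = {
    "Q2_K": 2.5625,
    "Q3_K": 3.4375,
    "Q4_K": 4.5,
    "Q5_K": 5.5,
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
}


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate bytes for the weights alone: params * bits / 8."""
    return n_params * bits_per_weight / 8


if __name__ == "__main__":
    n_params = 8.1e9  # total params from the spec list
    for name, bpw in NOMINAL_BPW.items():
        print(f"{name}: ~{weight_bytes(n_params, bpw) / 1e9:.1f} GB")
```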
Notable innovations
- Apache-2.0 enterprise instruct release
- Function-calling and RAG focus
- 12-language multilingual coverage
Known limitations
- Pretraining data and training code are not released; only the weights are open. source ↗