Architecture
data/models.yaml. Every label is auditable against the model's sources.
Specs
- Architecture: dense
- Total params: 3.8B
- Active params: 3.8B
- Context window: 4K tokens
- Attention: MHA
- Position encoding: RoPE
- Pretraining tokens: 3.3T
- Training hardware: H100
- Post-training: SFT, DPO
- OSI-approved: yes
- Data released: no
- Training code: not released
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
Code
| HumanEval | 57.3 | as of 2024-04-23 | source ↗ |
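The "no inference, no interpolation" rule above can be sketched as a small record type. This is only an illustration: the names `BenchmarkScore`, `SCORES`, and `lookup` are hypothetical and are not taken from the site's data files. The one record shown is the HumanEval row from the table.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class BenchmarkScore:
    """One published score plus the date it was published (hypothetical schema)."""
    benchmark: str
    score: float
    as_of: date


# Only scores that were actually published are stored.
SCORES = {
    "HumanEval": BenchmarkScore("HumanEval", 57.3, date(2024, 4, 23)),
}


def lookup(name: str) -> Optional[BenchmarkScore]:
    """Return the published score, or None if none is recorded.

    A missing score stays missing: nothing is estimated or interpolated.
    """
    return SCORES.get(name)
```

Under this sketch a caller has to handle `None` explicitly, so a missing benchmark cannot be mistaken for a zero or an estimated value.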
Recommended use cases
- edge / on-device inference
- small-model fine-tuning base
- latency-sensitive serving
Available quantizations
GGUF: llama.cpp's container format and the common format for local inference, with k-quants from Q2 to Q8. Runs on llama.cpp and Ollama.
Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
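For a rough sense of what the Q2 to Q8 range means on disk, the following is a back-of-the-envelope sketch that multiplies the 3.8B parameter count by a nominal bit width. It is an assumption-laden lower bound: real GGUF k-quant files mix precisions across tensors and store scale metadata, so they come out larger, and the figure excludes the KV cache and activations. The function name `weight_gib` is illustrative.

```python
# Rough weight-memory estimate for a 3.8B-parameter dense model.
PARAMS = 3.8e9  # total parameters, from the specs on this page


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB.

    Uses a nominal bit width; actual quantized files are somewhat larger.
    """
    return params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    # 16 bits is included as an unquantized half-precision reference point.
    for bits in (2, 4, 8, 16):
        print(f"{bits:>2} bits/weight: ~{weight_gib(PARAMS, bits):.1f} GiB")
```

Treat the printed values as a floor when checking whether a given quantization fits on an edge device.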
Notable innovations
- Heavily filtered and synthetic pretraining data
- 3.8B parameters reaching 70% on MMLU
- LongRoPE variant with a 128K-token context window
Known limitations
- Pretraining data is not released; only the weights and inference code are open. source ↗