Why people cared
Llama 3.1 405B is the first openly released model trained at GPT-4-class scale. The release was paired with a detailed 92-page technical report covering the pretraining recipe, post-training, and infrastructure, including the failure-rate analysis on Meta's 16,384-H100 cluster, which became a reference point for anyone planning frontier training. It lands within reach of contemporaneous closed frontier models on MMLU and reasoning suites, and it gave researchers their first access to a 400B-class checkpoint's weights for ablation studies and distillation experiments.
Its practical deployment story is more constrained. At 2 bytes per parameter the fp16 weights take 810 GB, more than the 640 GB of GPU memory on a single 8xH100 node (at 80 GB per GPU), so fp16 inference has to be sharded across multiple nodes. fp8 quantization halves the weights to about 405 GB, which fits on one 8xH100 node but leaves limited headroom for the KV cache and activations. That cost-and-complexity ceiling is why the smaller 70B-class checkpoints (3.1 70B and the post-training-refreshed 3.3 70B) capture more production usage. The 405B's lasting impact is the published recipe and the synthetic data the larger checkpoint generated for post-training the 70B and 8B siblings, a pattern subsequent open-weights releases have copied.
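The memory figures above follow from simple arithmetic. The sketch below is a back-of-envelope check, not a sizing tool: it counts weights only, and it assumes the 80 GB H100 variant and decimal gigabytes (1 GB = 1e9 bytes). The names `weight_gb` and `BYTES_PER_PARAM` are illustrative.

```python
# Weights-only memory estimate; ignores KV cache, activations, and runtime overhead.
PARAMS = 405e9          # total parameters
H100_MEMORY_GB = 80     # assumption: 80 GB H100 variant
GPUS_PER_NODE = 8

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_gb(params: float, precision: str) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


node_capacity_gb = H100_MEMORY_GB * GPUS_PER_NODE

for precision in BYTES_PER_PARAM:
    size = weight_gb(PARAMS, precision)
    fits = size <= node_capacity_gb
    print(f"{precision}: {size:.0f} GB, fits on one 8xH100 node: {fits}")
```

Under these assumptions fp16 comes out at 810 GB against a 640 GB node, and fp8 at 405 GB, which is why fp8 fits on one node while fp16 does not.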
Architecture
data/models.yaml. Every label is auditable against the model's sources.
Specs
- Architecture: dense
- Total params: 405B
- Active params: 405B
- Context window: 131K tokens
- Attention: gqa
- Position encoding: rope-llama3
- Pretraining tokens: 15.6T
- Training hardware: H100
- Post-training: sft, dpo, rejection-sampling
- OSI-approved: no
- Data released: no
- Training code: not released
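To see what grouped-query attention and the 131K context window mean for serving memory, here is a rough KV-cache estimate. The layer count (126), KV-head count (8), and head dimension (128) are assumptions taken from the published Llama 3 report rather than from the spec list above, so check them against the checkpoint's own config before relying on the numbers. The fp16 cache precision and the reading of "131K" as 131,072 tokens are also assumptions.

```python
# Rough KV-cache size for one sequence at full context.
# All architecture constants below are assumptions; verify against the model config.
N_LAYERS = 126            # assumption: transformer layers
N_KV_HEADS = 8            # assumption: key/value heads under GQA
HEAD_DIM = 128            # assumption: per-head dimension
BYTES_PER_VALUE = 2       # assumption: fp16 cache entries
CONTEXT_TOKENS = 131_072  # assumption: "131K" means 128 * 1024 tokens


def kv_cache_bytes_per_token(n_layers: int, n_kv_heads: int,
                             head_dim: int, bytes_per_value: int) -> int:
    """Bytes of cache per token: one key and one value vector per KV head per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


per_token = kv_cache_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM, BYTES_PER_VALUE)
per_sequence = per_token * CONTEXT_TOKENS

print(f"KV cache per token: {per_token / 1e6:.3f} MB")
print(f"KV cache per full-context sequence: {per_sequence / 1e9:.1f} GB")
```

With these assumed values a single full-length sequence needs roughly 68 GB of cache on top of the weights. Because the cache scales with the number of KV heads, not query heads, GQA is what keeps this figure manageable at long context.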
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
Recommended use cases
- frontier-quality on-prem deployment
- synthetic-data generation
- teacher for distillation
Available quantizations
Verified via the Hugging Face model tree. Community quantizations change over time; the families shown are those with published weights at audit time.
Notable innovations
- First open-weights model at 400B+ scale
- Detailed published tech report
Known limitations
Lineage
Largest dense Llama; trained with 16K H100s on the same data mix as the 8B and 70B.