Cost
Together AI · as of 2026-05-21
Why people cared
Llama 4 Scout is Meta's first MoE release, and its mixed public reception became a story in itself. The headline features were a 10M-token context window (the longest in any frontier model at release) and native multimodal input, delivered alongside an architecture pivot that followed DeepSeek V3 in making MoE mainstream for open-weights frontier work. Independent evaluators raised three objections: the announced benchmark scores were not consistently reproducible at deployment; the 10M-token context was achievable in theory but degraded at lengths well below the stated maximum; and the LMArena ranking featured in the launch material belonged to a chat-tuned variant that was not available as weights. The release remains historically significant as the moment Llama abandoned dense scaling, but the immediate developer narrative was that DeepSeek V3, released about three months earlier, and Qwen 3, released a few weeks later, executed the open-weights MoE story more cleanly. Llama 4 Scout's lasting value depends on whether its Maverick and Behemoth siblings, built on the same architecture, deliver on the long-context and multimodal promises in production deployment.
Architecture
data/models.yaml. Every label is auditable against the model's sources.
Specs
- Architecture: moe
- Total params: 109B
- Active params: 17B
- Experts: 16 total · 1 active
- Context window: 10.5M tokens
- Attention: gqa
- Position encoding: rope-llama3
- Pretraining tokens: 40.0T
- Training hardware: H100
- Post-training: sft, dpo, rejection-sampling
- OSI-approved: no
- Data released: no
- Training code: not released
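The gap between total and active parameters is what the MoE label buys. A minimal sketch of that arithmetic, using only the 109B and 17B figures from the list above; the 2-FLOPs-per-active-parameter rule of thumb and the function names are assumptions for illustration, not published figures for this model.

```python
# Figures copied from the Specs list; everything else is an assumption.
TOTAL_PARAMS = 109e9   # "Total params: 109B"
ACTIVE_PARAMS = 17e9   # "Active params: 17B"


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters used for a single token."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """Rough forward-pass cost, using the common ~2 FLOPs per active
    parameter rule of thumb (an assumption, not a published figure)."""
    return 2.0 * active


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    flops = approx_forward_flops_per_token(ACTIVE_PARAMS)
    print(f"active share of parameters: {share:.1%}")
    print(f"approx. forward FLOPs per token: {flops:.2e}")
```

Per-token compute tracks the 17B active parameters, while memory still has to hold all 109B, because any expert may be selected for the next token.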
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
Code
| Benchmark | Score | Date | Source |
| --- | --- | --- | --- |
| LiveCodeBench | 29.9 | as of 2026-05-21 | source ↗ |
Recommended use cases
- long-context tasks
- multimodal input
- frontier-class inference at MoE economics
Available quantizations
Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
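For a rough sense of what quantization changes, weight storage can be estimated from the 109B total-parameter figure in Specs. This is a back-of-envelope sketch: the bit widths are illustrative and are not taken from the quantization families on this page, and the estimate ignores quantization scales, the KV cache, and activations.

```python
TOTAL_PARAMS = 109e9  # "Total params: 109B" from Specs


def weight_gigabytes(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Illustrative bit widths only; real formats add per-block overhead.
    for bits in (16, 8, 4):
        size = weight_gigabytes(TOTAL_PARAMS, bits)
        print(f"{bits:>2}-bit weights: ~{size:.1f} GB")
```

Note that the estimate uses total rather than active parameters: every expert has to be resident even though only one is used per token.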
Notable innovations
- 10M-token context window
- Native multimodal input
- First Llama MoE
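To make the "16 total · 1 active" expert figure from Specs concrete, here is a toy top-1 routing sketch in plain Python. It is not Meta's implementation: the router, the expert functions, and every name below are hypothetical, and details such as gate weighting, load balancing, and batching are omitted.

```python
NUM_EXPERTS = 16  # "Experts: 16 total" from Specs
TOP_K = 1         # "1 active" from Specs


def route(router_logits: list[float], top_k: int = TOP_K) -> list[int]:
    """Return the indices of the top_k highest-scoring experts."""
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    return ranked[:top_k]


def moe_layer(token, router_weights, experts):
    """Score every expert for one token, then run only the selected ones.

    token:          list of floats (a toy hidden state)
    router_weights: one weight row per expert, same width as token
    experts:        list of callables mapping a token to a token
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    chosen = route(logits)
    output = [0.0] * len(token)
    for index in chosen:
        expert_out = experts[index](token)
        # Simplification: outputs are summed without gate weighting.
        output = [acc + value for acc, value in zip(output, expert_out)]
    return output, chosen


if __name__ == "__main__":
    # Hypothetical toy setup: expert i scales its input by i.
    experts = [lambda x, s=i: [s * v for v in x] for i in range(NUM_EXPERTS)]
    router_weights = [[0.0, 0.0] for _ in range(NUM_EXPERTS)]
    router_weights[5] = [1.0, 0.0]  # make expert 5 win for this token
    print(moe_layer([2.0, 1.0], router_weights, experts))
```

Only the chosen expert's function is called, which is why per-token compute follows the active rather than the total parameter count.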
Known limitations
- 10M-token context window degrades at lengths well below the stated maximum in independent evaluations. source ↗
Lineage
First Llama MoE; same Community License lineage.
Derived from
Llama 3.3 70B Instruct (2024-12-06)