The Open-Source AI Stack


OLMo (AI2)

The only major model family meeting the strictest reading of OSAID: data (Dolma), training code, and weights all published.

Apache 2.0 · stable

OLMo is a family of open language models from the Allen Institute for AI (AI2). What makes OLMo distinctive is not its capability ceiling, which sits below the frontier, but its disclosure depth: Apache 2.0 weights, the full training data (the Dolma corpus), training code, the evaluation suite, intermediate checkpoints, and training logs are all published, so the training pipeline is reproducible end to end. No other major model family in 2026 meets this bar.

OLMo matters because it answers the question "what does 'open source AI' actually look like at the strictest reading of OSAID v1.0?" Llama, Mistral, Qwen, DeepSeek, and Gemma all publish weights, but the training data is closed for all of them. Under the Open Source Initiative's published definition, only OLMo among these families qualifies as fully open. This makes OLMo the existence proof that complete training-data disclosure is feasible at all, and the reference example anyone arguing for stricter open-AI definitions can point to.

OLMo is production-ready as a research and reference model. OLMo 2 (32B, Mar 2025) and the OLMo 3 family (Nov 2025) closed much of the capability gap with similar-size open-weights peers, though not with the frontier. It is strong for academic use, reproducibility research, and post-training experimentation, and less competitive with Llama 3, Qwen 3, and DeepSeek for raw application performance.

The project is stewarded by AI2; its long-term direction depends on AI2's continued funding for the program.
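The distinction drawn above between "fully open" and "open weights" can be sketched as a simple checklist over published artifacts. This is only an illustration: the artifact names mirror the list in this entry, while the tier labels, the `Release` type, and the `weights_only` example are hypothetical and are not taken from OSAID or from any AI2 tooling.

```python
from dataclasses import dataclass

# Artifacts this entry lists as published for OLMo (names are illustrative).
REQUIRED_ARTIFACTS = (
    "weights",
    "training_data",
    "training_code",
    "evaluation_suite",
    "intermediate_checkpoints",
    "training_logs",
)


@dataclass(frozen=True)
class Release:
    """Hypothetical record of what a model family has published."""
    name: str
    published: frozenset


def missing_artifacts(release: Release) -> list:
    """Return the checklist items the release has not published, in checklist order."""
    return [a for a in REQUIRED_ARTIFACTS if a not in release.published]


def disclosure_tier(release: Release) -> str:
    """Classify a release with informal labels (an assumption, not OSAID wording)."""
    if not missing_artifacts(release):
        return "fully open"
    if "weights" in release.published:
        return "open weights"
    return "closed"


# OLMo publishes every item on the checklist, per this entry.
olmo = Release("OLMo", frozenset(REQUIRED_ARTIFACTS))
# A hypothetical peer that publishes weights only.
weights_only = Release("weights-only peer", frozenset({"weights"}))

if __name__ == "__main__":
    for release in (olmo, weights_only):
        print(release.name, "->", disclosure_tier(release), missing_artifacts(release))
```

A checklist like this makes the gap explicit: a weights-only release fails on training data before any other item is considered, which is the point this entry makes about the other families.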

