OLMo 2 13B Instruct vs OpenAI o1
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: OLMo 2 13B Instruct | B: OpenAI o1 |
|---|---|---|
| Released | 2024-11-26 | — |
| Developer | AI2 | OpenAI |
| Openness | Open | Proprietary |
| License | Apache-2.0 | Proprietary |
| OSI-approved | yes | no |
| Data released | yes | no |
| Training code | yes | no |
| Architecture | dense | unknown |
| Total params | 13.7B | — |
| Active params | — | — |
| Experts | — | — |
| Context window | 4K | — |
| Attention | MHA | unknown |
| Position enc. | RoPE | unknown |
| Pretraining tokens | 5.0T | — |
| Post-training | SFT, DPO, RLVR | RLHF |
| Training hardware | H100 | — |
| $/M input | — | $15.00 |
| $/M output | — | $60.00 |
| Output tok/sec | — | 75.8 |
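
As a rough worked example of how the pricing and throughput rows for model B combine, the sketch below estimates the cost and streaming time of a single request. The request size and the function names are hypothetical, chosen only for illustration. The sketch also assumes that `output_tokens` counts every billed output token and that throughput stays constant at the listed rate.

```python
# Prices and throughput copied from the Specs table (column B: OpenAI o1).
PRICE_PER_M_INPUT_USD = 15.00
PRICE_PER_M_OUTPUT_USD = 60.00
OUTPUT_TOKENS_PER_SEC = 75.8


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-million-token prices."""
    total = (input_tokens * PRICE_PER_M_INPUT_USD
             + output_tokens * PRICE_PER_M_OUTPUT_USD)
    return total / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Time to stream the output at the listed rate (ignores time to first token)."""
    return output_tokens / OUTPUT_TOKENS_PER_SEC


if __name__ == "__main__":
    # Hypothetical request size, not taken from the page.
    cost = request_cost_usd(input_tokens=2_000, output_tokens=1_000)
    secs = generation_seconds(1_000)
    print(f"cost: ${cost:.2f}, generation time: {secs:.1f} s")
```

With these illustrative numbers the request costs about $0.09 and takes roughly 13 seconds to generate. Model A has no listed price or throughput, so the same calculation cannot be made for it from this page.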
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
General reasoning
| Benchmark | A: OLMo 2 13B Instruct | B: OpenAI o1 |
|---|---|---|
| MMLU | 68.5 (2024-11-26) | — |
| MMLU-Pro | — | 84.1 (2026-05-21) |
| GPQA-Diamond | — | 77.3 (2024-12-05) |
Code
| Benchmark | A: OLMo 2 13B Instruct | B: OpenAI o1 |
|---|---|---|
| LiveCodeBench | — | 67.9 (2026-05-21) |
Math
| Benchmark | A: OLMo 2 13B Instruct | B: OpenAI o1 |
|---|---|---|
| MATH | 39.2 (2024-11-26) | 97.0 (2026-05-21) |
| AIME 2024 | — | 83.3 (2024-12-05) |
Held-out / arena
| Benchmark | A: OLMo 2 13B Instruct | B: OpenAI o1 |
|---|---|---|
| IFEval | 82.6 (2024-11-26) | — |
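
The rendering rule stated at the top of the Benchmarks section (missing scores are shown as not reported and never inferred; the leader is bolded) can be sketched as follows. This is a minimal illustration, not the page's actual implementation. `render_row` and `NOT_REPORTED` are hypothetical names, and the sketch assumes that higher scores are better and that a leader is marked only when both models have a reported score.

```python
from typing import Optional

NOT_REPORTED = "not reported"  # placeholder text for a missing score


def render_row(name: str, a: Optional[float], b: Optional[float]) -> str:
    """Render one benchmark row as a Markdown table row.

    A missing score (None) is never filled in or estimated. The higher
    score is bolded only when both scores are present (an assumption).
    """
    def cell(score: Optional[float], is_leader: bool) -> str:
        if score is None:
            return NOT_REPORTED
        text = f"{score:.1f}"
        return f"**{text}**" if is_leader else text

    both_reported = a is not None and b is not None
    a_leads = both_reported and a > b
    b_leads = both_reported and b > a
    return f"| {name} | {cell(a, a_leads)} | {cell(b, b_leads)} |"


if __name__ == "__main__":
    print(render_row("MATH", 39.2, 97.0))
    print(render_row("MMLU", 68.5, None))
```

Of the rows above, only MATH has scores for both models, so under these assumptions it is the only row where a leader would be marked.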
Context · A
The 13B counterpart to OLMo 2 7B, sharing the same fully open stack: Dolma pretraining data, OLMo trainer code, WandB training logs, and Apache-2.0 weights. Post-trained with the Tülu 3 recipe of SFT, DPO, and RLVR.
Context · B
The first publicly available frontier reasoning model. Trained to spend extra inference compute on a "private chain of thought" before answering, setting the template that the open community would later chase with DeepSeek-R1.