The Open-Source AI Stack

OLMo 2 13B Instruct vs OpenAI o1

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

| Field | A: OLMo 2 13B Instruct | B: OpenAI o1 |
|---|---|---|
| Released | 2024-11-26 | |
| Developer | AI2 | OpenAI |
| Openness | Open | Proprietary |
| License | Apache-2.0 | Proprietary |
| OSI-approved | yes | no |
| Data released | yes | no |
| Training code | yes | no |
| Architecture | dense | unknown |
| Total params | 13.7B | |
| Active params | | |
| Experts | | |
| Context window | 4K | |
| Attention | MHA | unknown |
| Position enc. | RoPE | unknown |
| Pretraining tokens | 5.0T | |
| Post-training | SFT, DPO, RLVR | RLHF |
| Training hardware | H100 | |
| $/M input tokens | | $15.00 |
| $/M output tokens | | $60.00 |
| Output tok/sec | | 75.8 |
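
As a rough illustration of what the listed per-million-token prices mean for a single request, the sketch below multiplies token counts by the $15.00 input and $60.00 output rates from the table. The function name and the example token counts are hypothetical, and the calculation assumes flat per-token billing with no discounts or other adjustments.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float = 15.00,
                     output_price_per_m: float = 60.00) -> float:
    """Estimate the cost of one request in US dollars.

    Prices are per one million tokens; the defaults are the values
    listed in the Specs table. Flat billing is an assumption.
    """
    total = input_tokens * input_price_per_m + output_tokens * output_price_per_m
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 2,000 prompt tokens and 1,000 generated tokens.
    cost = request_cost_usd(2_000, 1_000)
    print(f"Estimated cost: ${cost:.4f}")
```

At these rates, output tokens cost four times as much as input tokens, so long generations dominate the bill.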

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

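| Benchmark | A: OLMo 2 13B Instruct | B: OpenAI o1 |
|---|---|---|
| MMLU | 68.5 (2024-11-26) | not reported |
| MMLU-Pro | not reported | 84.1 (2026-05-21) |
| GPQA-Diamond | not reported | 77.3 (2024-12-05) |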

Code

| Benchmark | A: OLMo 2 13B Instruct | B: OpenAI o1 |
|---|---|---|
| LiveCodeBench | not reported | 67.9 (2026-05-21) |

Math

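| Benchmark | A: OLMo 2 13B Instruct | B: OpenAI o1 |
|---|---|---|
| MATH | 39.2 (2024-11-26) | 97.0 (2026-05-21) |
| AIME 2024 | not reported | 83.3 (2024-12-05) |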

Held-out / arena

| Benchmark | A: OLMo 2 13B Instruct | B: OpenAI o1 |
|---|---|---|
| IFEval | 82.6 (2024-11-26) | not reported |

Context · A

The 13B counterpart to OLMo 2 7B, sharing the same full-open stack: Dolma pretraining data, OLMo trainer code, WandB training logs, and Apache-2.0 weights. Post-trained with the Tülu 3 recipe of SFT, DPO, and RLVR.

Context · B

The first publicly available frontier reasoning model. Trained to spend extra inference compute on a "private chain of thought" before answering, it set the template the open community would chase with DeepSeek-R1.
