OLMo 2 13B Instruct vs OpenAI o1

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field	A: OLMo 2 13B Instruct	B: OpenAI o1
Released	2024-11-26	—
Developer	AI2	OpenAI
Openness	Open	Proprietary
License	Apache-2.0	Proprietary
OSI-approved	yes	no
Data released	yes	no
Training code	yes	no
Architecture	dense	unknown
Total params	13.7B	—
Active params	—	—
Experts	—	—
Context window	4K	—
Attention	mha	unknown
Position enc.	rope	unknown
Pretraining tokens	5.0T	—
Post-training	sft, dpo, rlvr	rlhf
Training hardware	H100	—
$/M input	—	$15.00
$/M output	—	$60.00
Output tok/sec	—	75.8
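
As a rough illustration of how the pricing and throughput rows for model B translate into per-request figures, here is a minimal sketch. The prices and the tokens-per-second value are taken from the table above; the request size (10,000 input tokens, 2,000 output tokens) and the function names are hypothetical, chosen only for the example. It assumes flat per-token pricing and steady throughput, which real traffic may not match.

```python
# Values from the Specs table for OpenAI o1.
INPUT_USD_PER_M = 15.00    # USD per million input tokens
OUTPUT_USD_PER_M = 60.00   # USD per million output tokens
OUTPUT_TOK_PER_SEC = 75.8  # reported output throughput


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Time to stream the output, assuming the listed throughput holds."""
    return output_tokens / OUTPUT_TOK_PER_SEC


# Hypothetical request size, for illustration only.
cost = request_cost_usd(10_000, 2_000)
seconds = generation_seconds(2_000)
print(f"${cost:.2f}, about {seconds:.1f} s of generation")
```

At these rates output tokens cost four times as much as input tokens, so in this example the 2,000 output tokens account for $0.12 of the $0.27 total.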

Benchmarks

Missing scores render as not reported and are never inferred. Bold highlights the leader per benchmark.
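
The rule above can be sketched as follows. This is a minimal illustration, not the page's actual rendering code: the function names are hypothetical, and it assumes that a leader is marked only when both models report a score (a benchmark with a single reported score, or a tie, gets no leader).

```python
from typing import Optional


def render_score(score: Optional[float]) -> str:
    """Show a score as-is, or 'not reported' when it is missing."""
    return "not reported" if score is None else f"{score:.1f}"


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return 'A' or 'B' for the higher score.

    Assumption: no leader is named when either score is missing or
    when the two scores are equal; nothing is inferred.
    """
    if a is None or b is None or a == b:
        return None
    return "A" if a > b else "B"


# Three rows from the tables below: (score for A, score for B).
rows = {
    "MMLU": (68.5, None),
    "MATH": (39.2, 97.0),
    "IFEval": (82.6, None),
}

for name, (a, b) in rows.items():
    print(name, render_score(a), render_score(b), leader(a, b))
```

Under this assumption, MATH is the only benchmark on this page where a leader can be marked, since it is the only one with a score reported for both models.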

General reasoning

MMLU	68.5 2024-11-26	—
MMLU-Pro	—	84.1 2026-05-21
GPQA-Diamond	—	77.3 2024-12-05

Code

LiveCodeBench	—	67.9 2026-05-21

Math

MATH	39.2 2024-11-26	97.0 2026-05-21
AIME 2024	—	83.3 2024-12-05

Held-out / arena

IFEval	82.6 2024-11-26	—

Context · A

The 13B counterpart to OLMo 2 7B, sharing the same full-open stack: Dolma pretraining data, OLMo trainer code, WandB training logs, and Apache-2.0 weights. Post-trained with the Tülu 3 recipe of SFT, DPO, and RLVR.

Context · B

The first publicly available frontier reasoning model. Trained to spend extra inference compute on a "private chain of thought" before answering, it set the template that the open community would chase with DeepSeek-R1.
