The Open-Source AI Stack

Llama 3.1 Tülu 3 70B vs OpenAI o1

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field | A: Llama 3.1 Tülu 3 70B | B: OpenAI o1
Released | 2024-11-21 |
Developer | AI2 | OpenAI
Openness | Open weights | Proprietary
License | Llama 3.1 Community License | Proprietary
OSI-approved | no | no
Data released | yes | no
Training code | yes | no
Architecture | dense | unknown
Total params | 70B |
Active params | |
Experts | |
Context window | |
Attention | gqa | unknown
Position enc. | rope-llama3 | unknown
Pretraining tokens | |
Post-training | sft, dpo, rlvr | rlhf
Training hardware | |
$/M input | | $15.00
$/M output | | $60.00
Output tok/sec | | 75.8
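
To make the per-million-token prices concrete, the sketch below turns the listed $/M input ($15.00) and $/M output ($60.00) rates into a per-request cost. The function name and the example token counts are illustrative assumptions, not part of any provider's API, and real billing may count tokens differently.

```python
from decimal import Decimal

# Rates copied from the Specs table (USD per one million tokens).
INPUT_USD_PER_M = Decimal("15.00")
OUTPUT_USD_PER_M = Decimal("60.00")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the cost of one request (hypothetical helper).

    Multiplies each token count by its per-million rate and divides by
    one million. Decimal avoids float rounding on currency values.
    """
    total = INPUT_USD_PER_M * input_tokens + OUTPUT_USD_PER_M * output_tokens
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Illustrative token counts only.
    print(request_cost_usd(input_tokens=2_000, output_tokens=1_000))
```

At these rates a request with 2,000 input tokens and 1,000 output tokens costs $0.03 + $0.06 = $0.09, so output tokens dominate the bill even when the prompt is longer than the answer.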

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU | A: 83.1 (2024-11-21) | B: not reported
MMLU-Pro | A: not reported | B: 84.1 (2026-05-21)
GPQA-Diamond | A: not reported | B: 77.3 (2024-12-05)

Code

HumanEval | A: 92.4 (2024-11-21) | B: not reported
LiveCodeBench | A: not reported | B: 67.9 (2026-05-21)

Math

MATH | A: 63.0 (2024-11-21) | B: 97.0 (2026-05-21)
AIME 2024 | A: not reported | B: 83.3 (2024-12-05)

Held-out / arena

IFEval | A: 83.2 (2024-11-21) | B: not reported
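
The "never inferred" rule for missing scores can be sketched as a small comparison helper. This is a minimal illustration, not the site's actual rendering code: the function name is hypothetical, and the choice to mark no leader when either score is missing is an assumption.

```python
from typing import Optional


def leader(score_a: Optional[float], score_b: Optional[float]) -> Optional[str]:
    """Return "A", "B", or "tie" for one benchmark row (hypothetical helper).

    Assumption: if either score is not reported (None), no leader is
    declared, because comparing against a missing score would amount to
    inferring it.
    """
    if score_a is None or score_b is None:
        return None
    if score_a > score_b:
        return "A"
    if score_b > score_a:
        return "B"
    return "tie"


if __name__ == "__main__":
    # MATH is the one row above where both scores are reported.
    print(leader(63.0, 97.0))
    # A row with only one reported score gets no leader under this assumption.
    print(leader(83.1, None))
```

Under this reading, MATH is the only benchmark on the page where a head-to-head leader can be named; every other row has a score for just one of the two models.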

Context · A

AI2's flagship demonstration that the open community could match closed instruct recipes. Post-trained on top of Llama 3.1 70B with SFT, DPO, and the new RLVR (Reinforcement Learning with Verifiable Rewards) stage. Recipes, data, code, and infrastructure are all open, even though the weights carry the Llama 3.1 Community License inherited from the base model.
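
The idea behind a verifiable reward can be illustrated with a toy checker: instead of a learned reward model scoring a completion, a deterministic function checks the answer against a known reference. The sketch below is an assumption-laden illustration, not AI2's implementation; the function names, the number-extraction regex, and the 1.0/0.0 reward values are all hypothetical.

```python
import re
from typing import Optional

# Matches integers and simple decimals, optionally negative.
NUMBER_PATTERN = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_number(completion: str) -> Optional[str]:
    """Return the last number in the completion, or None if there is none."""
    # Drop commas so "1,024" is read as "1024" (this also removes prose commas).
    matches = NUMBER_PATTERN.findall(completion.replace(",", ""))
    return matches[-1] if matches else None


def verifiable_reward(completion: str, reference: str) -> float:
    """Binary reward: 1.0 if the final number equals the reference, else 0.0."""
    predicted = extract_final_number(completion)
    if predicted is None:
        return 0.0
    return 1.0 if float(predicted) == float(reference) else 0.0


if __name__ == "__main__":
    print(verifiable_reward("3 boxes of 14 gives a total of 42.", "42"))
    print(verifiable_reward("I think the answer is 41", "42"))
```

A checker like this only works for tasks with a checkable answer (math with a known result, constraints that can be tested programmatically), which is why the reward is called verifiable.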

Context · B

The first publicly available frontier reasoning model. Trained to spend extra inference compute on a "private chain of thought" before answering, setting the template the open community would chase with DeepSeek-R1.
