
Llama-3.1-Nemotron Ultra 253B v1 vs OpenAI o3

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field	A: Llama-3.1-Nemotron Ultra 253B v1	B: OpenAI o3
Released	2025-04-11	2025-04-16
Developer	NVIDIA	OpenAI
Openness	Open weights	Proprietary
License	NVIDIA Open Model License	Proprietary
OSI-approved	no	no
Data released	yes	no
Training code	no	no
Architecture	dense	unknown
Total params	253B	—
Active params	—	—
Experts	—	—
Context window	131K	200K
Attention	skip-attention	unknown
Position enc.	RoPE	unknown
Pretraining tokens	65B	—
Post-training	SFT, GRPO	RLHF
Training hardware	H100	—
$/M input	—	$2.00
$/M output	—	$8.00
Output tok/sec	—	88
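Only o3 has list prices and a throughput figure in the table. Nemotron Ultra is open weights, so its cost depends on the hardware you run it on. The sketch below turns the o3 rows into a per-request estimate. It assumes list prices with no cached-input or batch discount, and it counts reasoning tokens inside `output_tokens` because OpenAI bills them as output. The function names are illustrative, not part of any API.

```python
# Rough cost/latency estimate for one OpenAI o3 request, using the list
# prices and throughput from the specs table. Assumptions (not from the
# table): no cached-input or batch discount; reasoning tokens are included
# in output_tokens because they are billed as output.

INPUT_PRICE_PER_M = 2.00   # $ per 1M input tokens (table: $/M input)
OUTPUT_PRICE_PER_M = 8.00  # $ per 1M output tokens (table: $/M output)
OUTPUT_TOK_PER_SEC = 88    # table: output tok/sec


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single request at list prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Decode time only; ignores time-to-first-token and network latency."""
    return output_tokens / OUTPUT_TOK_PER_SEC


if __name__ == "__main__":
    cost = request_cost(10_000, 2_000)
    secs = generation_seconds(2_000)
    print(f"${cost:.3f}, {secs:.1f} s")  # $0.036, 22.7 s
```

For a reasoning model the visible answer is often a small fraction of `output_tokens`, so the output price usually dominates even when the prompt is much longer than the reply.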

Benchmarks

A dash means the score was not reported; missing scores are never inferred.
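The no-inference rule also decides when a benchmark can have a leader. Here is a minimal sketch in Python 3.9+, assuming that a row with only one reported score has no leader. The page does not state this, so treat it as a hypothetical reading. The sample rows are copied from the tables below. `leader` and `render` are illustrative names.

```python
# Hypothetical sketch: a missing score stays None (never filled in), and a
# leader is named only when both models report a score. Treating one-sided
# rows as "no leader" is an assumption, not a rule stated by the page.
from typing import Optional

Score = Optional[float]

# A few rows copied from the benchmark tables (A = Nemotron Ultra, B = o3).
SCORES: dict[str, tuple[Score, Score]] = {
    "MMLU-Pro": (None, 85.3),
    "GPQA-Diamond": (76.0, 87.7),
    "LiveCodeBench": (66.3, 80.8),
    "IFEval": (88.8, None),
}


def leader(a: Score, b: Score) -> Optional[str]:
    """Return "A", "B", "tie", or None when either score is not reported."""
    if a is None or b is None:
        return None  # never infer a missing score
    if a == b:
        return "tie"
    return "A" if a > b else "B"


def render(score: Score) -> str:
    """Display a score, or 'not reported' for a missing one."""
    return "not reported" if score is None else f"{score:.1f}"


if __name__ == "__main__":
    for name, (a, b) in SCORES.items():
        print(f"{name:14} A={render(a):13} B={render(b):13} leader={leader(a, b)}")
```

Keeping `None` distinct from `0.0` is the important choice. Substituting zero for a missing score would silently hand the row to the other model.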

General reasoning

MMLU-Pro	—	85.3 2026-05-21
GPQA-Diamond	76.0 2025-04-11	87.7 2025-04-16

Code

SWE-bench Verified	—	71.7 2025-04-16
LiveCodeBench	66.3 2025-04-11	80.8 2026-05-21

Math

MATH	97.0 2025-04-11	99.2 2026-05-21
AIME 2024	—	90.3 2026-05-21
AIME 2025	72.5 2025-04-11	88.3 2026-05-21

Held-out / arena

IFEval	88.8 2025-04-11	—

Context · A

The largest model in NVIDIA's Llama-Nemotron family, derived from Llama 3.1 405B through Neural Architecture Search (skip attention, variable FFN widths, and FFN fusion) and distillation. Released April 11, 2025. It fits on a single node: 8x H100 for BF16 inference, or 4x H100 in FP8. Post-trained through SFT followed by GRPO reinforcement learning.
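The deployment claim can be checked with weights-only arithmetic. This sketch assumes the 80 GB H100 variants, 1 GB = 1e9 bytes, 2 bytes per parameter for BF16, and 1 byte for FP8. It ignores KV cache, activations, and framework overhead, so real headroom is smaller than shown.

```python
# Weights-only memory check for a 253B-parameter dense model on H100s.
# Assumptions: 80 GB per GPU, 1 GB = 1e9 bytes, 2 bytes/param (BF16),
# 1 byte/param (FP8). KV cache, activations, and runtime overhead are ignored.
import math

PARAMS = 253e9
H100_GB = 80
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_gb(dtype: str) -> float:
    """Size of the weights alone, in GB, for the given storage dtype."""
    return PARAMS * BYTES_PER_PARAM[dtype] / 1e9


def min_gpus(dtype: str) -> int:
    """Smallest GPU count whose combined memory holds the weights."""
    return math.ceil(weight_gb(dtype) / H100_GB)


if __name__ == "__main__":
    for dtype, gpus in (("bf16", 8), ("fp8", 4)):
        need = weight_gb(dtype)
        have = gpus * H100_GB
        print(f"{dtype}: weights {need:.0f} GB, {gpus}x H100 = {have} GB, "
              f"headroom {have - need:.0f} GB, minimum {min_gpus(dtype)} GPUs")
```

BF16 weights come to 506 GB against 640 GB on eight GPUs, and FP8 weights come to 253 GB against 320 GB on four. By raw memory BF16 needs only seven GPUs. However, tensor-parallel serving typically uses power-of-two GPU counts and needs room for the KV cache, which plausibly explains the 8-GPU figure.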

Context · B

Released April 16, 2025, as the full-size successor to o1 in the o-series reasoning lineage, with multimodal (text plus image) input and a 200K-token context. OpenAI reported large gains over o1 on SWE-bench Verified and Codeforces, and roughly 3x o1's accuracy on ARC-AGI.
