OpenAI o3 vs Llama-3.1-Nemotron Ultra 253B v1

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field	A: OpenAI o3	B: Llama-3.1-Nemotron Ultra 253B v1
Released	2025-04-16	2025-04-11
Developer	OpenAI	NVIDIA
Openness	Proprietary	Open weights
License	Proprietary	NVIDIA Open Model License
OSI-approved	no	no
Data released	no	yes
Training code	no	no
Architecture	unknown	dense
Total params	—	253B
Active params	—	—
Experts	—	—
Context window	200K	131K
Attention	unknown	skip-attention
Position enc.	unknown	RoPE
Pretraining tokens	—	65B
Post-training	RLHF	SFT, GRPO
Training hardware	—	H100
$/M input	$2.00	—
$/M output	$8.00	—
Output tok/sec	88	—
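
The pricing and speed rows can be turned into a per-request estimate. The sketch below is illustrative only: it uses the listed o3 rates ($2.00 and $8.00 per million tokens, 88 output tokens per second), the function names and the example request size are hypothetical, and it ignores anything the table does not state, such as cached-input discounts or how reasoning tokens are billed.

```python
# Listed rates for model A (OpenAI o3), copied from the Specs table.
INPUT_USD_PER_M = 2.00    # USD per million input tokens
OUTPUT_USD_PER_M = 8.00   # USD per million output tokens
OUTPUT_TOK_PER_SEC = 88   # listed output speed


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-million-token rates."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Rough time to stream the output at the listed speed."""
    return output_tokens / OUTPUT_TOK_PER_SEC


# Hypothetical request: 10,000 prompt tokens, 2,000 generated tokens.
cost = request_cost_usd(10_000, 2_000)
seconds = generation_seconds(2_000)
print(f"${cost:.3f}, about {seconds:.1f} s")  # $0.036, about 22.7 s
```

No equivalent estimate is possible for model B from this table, since its price and speed fields are not reported.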

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro	85.3 2026-05-21	—
GPQA-Diamond	87.7 2025-04-16	76.0 2025-04-11

Code

SWE-Bench Verified	71.7 2025-04-16	—
LiveCodeBench	80.8 2026-05-21	66.3 2025-04-11

Math

MATH	99.2 2026-05-21	97.0 2025-04-11
AIME 2024	90.3 2026-05-21	—
AIME 2025	88.3 2026-05-21	72.5 2025-04-11

Held-out / arena

IFEval	—	88.8 2025-04-11
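
The "not reported, never inferred" rule and the per-benchmark leader can be sketched as below. This is a minimal illustration, not the page's actual rendering logic: the `leader` and `fmt` helpers are hypothetical, and the choice to name no leader when either score is missing is an assumption.

```python
from typing import Optional

# (A, B) scores copied from the Benchmarks tables; None means "not reported".
SCORES = {
    "MMLU-Pro": (85.3, None),
    "GPQA-Diamond": (87.7, 76.0),
    "SWE-Bench Verified": (71.7, None),
    "LiveCodeBench": (80.8, 66.3),
    "MATH": (99.2, 97.0),
    "AIME 2024": (90.3, None),
    "AIME 2025": (88.3, 72.5),
    "IFEval": (None, 88.8),
}


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return 'A', 'B', or 'tie' only when both scores are reported.

    Assumption: a missing score is never treated as zero, so a row with
    one reported score has no leader.
    """
    if a is None or b is None:
        return None
    if a == b:
        return "tie"
    return "A" if a > b else "B"


def fmt(score: Optional[float]) -> str:
    """Render a score, or the page's placeholder text for a missing one."""
    return "not reported" if score is None else f"{score:.1f}"


for name, (a, b) in SCORES.items():
    print(f"{name}: A={fmt(a)}, B={fmt(b)}, leader={leader(a, b) or 'n/a'}")
```

The sketch also ignores the as-of dates, which differ between the two columns in several rows, so a named leader here does not imply the scores were measured under the same conditions.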

Context · A

Released April 16, 2025 as the full-size successor to o1 in the o-series reasoning lineage, with multimodal (text plus image) input and a 200K-token context. OpenAI reported large jumps over o1 on SWE-bench Verified and Codeforces and roughly 3x the accuracy of o1 on ARC-AGI.

Context · B

The largest model in NVIDIA's Llama-Nemotron family, distilled from Llama 3.1 405B via Neural Architecture Search with skip attention, variable FFN, and FFN fusion. Released April 11, 2025; inference fits on a single 8x H100 node in BF16, or on 4x H100 in FP8. Post-trained through SFT and GRPO RL stages.
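
The GPU counts above are consistent with a back-of-the-envelope weight-memory calculation. The sketch below assumes 80 GB H100s (the memory size is not stated on this page), 2 bytes per parameter for BF16, and 1 byte for FP8. It counts weights only, so KV cache and activations have to fit in the remaining headroom.

```python
PARAMS = 253_000_000_000  # total parameters, from the Specs table
H100_GB = 80              # assumption: 80 GB H100 variant


def weight_gb(params: int, bytes_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


def headroom_gb(params: int, bytes_per_param: int, gpus: int) -> float:
    """Aggregate GPU memory left after loading the weights."""
    return gpus * H100_GB - weight_gb(params, bytes_per_param)


bf16 = weight_gb(PARAMS, 2)  # 506.0 GB of weights
fp8 = weight_gb(PARAMS, 1)   # 253.0 GB of weights

print(f"BF16: {bf16:.0f} GB, headroom on 8 GPUs: {headroom_gb(PARAMS, 2, 8):.0f} GB")
print(f"FP8:  {fp8:.0f} GB, headroom on 4 GPUs: {headroom_gb(PARAMS, 1, 4):.0f} GB")
```

Under these assumptions, BF16 weights would not fit on 4 GPUs (506 GB against 320 GB), which matches the page listing 8x H100 for BF16 and 4x H100 only for FP8.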