The Open-Source AI Stack

Llama-3.1-Nemotron Ultra 253B v1

Open weights · NVIDIA · 2025-04-11 · NVIDIA Open Model License

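The largest model in NVIDIA's Llama-Nemotron family, derived from Llama 3.1 405B Instruct by Neural Architecture Search (NAS) and distillation; the resulting architecture uses skip attention, variable FFN widths, and FFN fusion. Released April 11, 2025, it runs inference on a single 8x H100 node in BF16, or on 4x H100 in FP8. Post-training consisted of supervised fine-tuning (SFT) followed by reinforcement learning with GRPO.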
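The node sizes above follow from weights-only arithmetic. This is a rough sketch, assuming 80 GB H100s and counting only the weights (no KV cache, activations, or framework overhead), so real headroom is smaller than it suggests.

```python
# Weights-only memory estimate; ignores KV cache, activations, and runtime overhead.
PARAMS = 253e9   # total parameters (253B, from the spec sheet)
H100_GB = 80     # assumption: the 80 GB H100 variant

def weight_gb(params: float, bytes_per_param: float) -> float:
    """Gigabytes (10^9 bytes) needed to hold the weights alone."""
    return params * bytes_per_param / 1e9

def fits(params: float, bytes_per_param: float, gpus: int, gpu_gb: float = H100_GB) -> bool:
    """True if the weights alone fit in the combined memory of `gpus` devices."""
    return weight_gb(params, bytes_per_param) <= gpus * gpu_gb

if __name__ == "__main__":
    # BF16 = 2 bytes/param, FP8 = 1 byte/param
    for label, bpp, gpus in [("BF16", 2, 8), ("FP8", 1, 4), ("BF16", 2, 4)]:
        need = weight_gb(PARAMS, bpp)
        print(f"{label} on {gpus}x H100: {need:.0f} GB of weights vs {gpus * H100_GB} GB "
              f"-> fits={fits(PARAMS, bpp, gpus)}")
```

Halving the bytes per parameter is what lets FP8 drop from eight GPUs to four; BF16 weights alone (about 506 GB) would not fit in 4x 80 GB.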

Architecture

tokens in → Embedding (vocab size not disclosed; llama3 tokenizer) → N × [Attention (type not disclosed), RoPE, 131,072-token context → Dense MLP, SwiGLU activation (standard)] → Output projection → tokens out. 253B active parameters.
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.
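To see how skip attention and variable FFN widths shrink a 405B starting point, it helps to count parameters per block. This sketch assumes a Llama-style block with grouped-query attention and a SwiGLU FFN, ignores norms and embeddings, and assumes the per-block options are full attention, a single linear layer, or no attention at all. The sizes are hypothetical toy values, not the model's real configuration.

```python
from dataclasses import dataclass
from typing import List, Literal

AttnKind = Literal["full", "linear", "skip"]

@dataclass(frozen=True)
class BlockConfig:
    attention: AttnKind  # "full" GQA, a single d x d linear layer, or skipped entirely
    ffn_hidden: int      # SwiGLU intermediate width chosen for this block; 0 removes the FFN

def attention_params(d: int, n_heads: int, n_kv_heads: int, kind: AttnKind) -> int:
    if kind == "skip":
        return 0
    if kind == "linear":
        return d * d
    head_dim = d // n_heads
    # Q and O projections are d x d; K and V shrink under grouped-query attention
    return 2 * d * d + 2 * d * n_kv_heads * head_dim

def ffn_params(d: int, ffn_hidden: int) -> int:
    # gate and up projections (d -> h) plus the down projection (h -> d)
    return 3 * d * ffn_hidden

def model_params(blocks: List[BlockConfig], d: int, n_heads: int, n_kv_heads: int) -> int:
    return sum(attention_params(d, n_heads, n_kv_heads, b.attention) + ffn_params(d, b.ffn_hidden)
               for b in blocks)

# Hypothetical 4-block toy: a uniform stack vs. a NAS-style heterogeneous one
D, HEADS, KV_HEADS = 1024, 16, 4
uniform = [BlockConfig("full", 4096)] * 4
searched = [BlockConfig("full", 4096), BlockConfig("skip", 2048),
            BlockConfig("linear", 4096), BlockConfig("skip", 1024)]

if __name__ == "__main__":
    print(model_params(uniform, D, HEADS, KV_HEADS), model_params(searched, D, HEADS, KV_HEADS))
```

In this toy, the heterogeneous stack keeps about 63% of the uniform stack's parameters; a real search trades each choice against measured quality loss rather than picking blocks by hand.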

Specs

Architecture: dense
Total params: 253B
Active params: 253B
Context window: 131K tokens (131,072)
Attention: skip-attention
Position encoding: RoPE
Pretraining tokens: 65B
Training hardware: H100
Post-training: SFT, GRPO
OSI-approved: no
Data released: yes
Training code: not released
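GRPO (Group Relative Policy Optimization), listed under post-training, replaces a learned value baseline with a group statistic: several responses are sampled per prompt, and each reward is compared with its group's mean and spread. The sketch covers only that advantage step, not the clipped policy loss or KL term; the binary correctness reward is a hypothetical example, and the choice of population standard deviation is an assumption (implementations differ).

```python
import math
from typing import List, Sequence

def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: normalize each reward against its own sampling group."""
    n = len(rewards)
    mean = sum(rewards) / n
    # population standard deviation of the group; eps guards against a zero spread
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]

if __name__ == "__main__":
    # Four sampled answers to one prompt, scored 1 if correct and 0 otherwise (hypothetical)
    print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))
```

Correct answers get a positive advantage and incorrect ones a negative one; a group where every answer scores the same yields all-zero advantages and therefore no learning signal for that prompt.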

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

GPQA-Diamond 76.0 as of 2025-04-11

Code

LiveCodeBench 66.3 as of 2025-04-11

Math

MATH 97.0 as of 2025-04-11
AIME 2025 72.5 as of 2025-04-11

Held-out / arena

IFEval 88.8 as of 2025-04-11

Available quantizations

GGUF: llama.cpp's file format and the most common format for local inference; quantization levels range from Q2 to Q8, including the k-quants. Runs on llama.cpp, Ollama.
MLX: 4-bit and 8-bit weights in Apple's MLX layout for Apple silicon. Runs on Apple MLX.
FP8: 8-bit floating point, often released natively for Hopper and Blackwell GPUs. Runs on vLLM, SGLang, TensorRT-LLM.

Verified via the Hugging Face model tree. Community quantizations change over time; the families shown are those with published weights at audit time.
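As a concrete example of what a GGUF 8-bit level stores, here is a simplified sketch of the Q8_0 scheme: weights are grouped into blocks of 32, and each block keeps one scale (absmax / 127) plus 32 signed 8-bit codes. Real GGUF files store the scale as FP16 and expect tensor sizes that divide evenly into blocks; this sketch keeps the scale as a Python float and allows a short final block. The k-quants and FP8 use different layouts.

```python
from typing import List, Tuple

BLOCK = 32  # Q8_0 groups weights into blocks of 32 values

Q8Block = Tuple[float, List[int]]  # (scale, int8 codes)

def quantize_q8_0(weights: List[float]) -> List[Q8Block]:
    """Simplified Q8_0: one absmax scale per block, codes rounded into [-127, 127]."""
    blocks: List[Q8Block] = []
    for start in range(0, len(weights), BLOCK):
        chunk = weights[start:start + BLOCK]
        amax = max(abs(w) for w in chunk)
        scale = amax / 127.0
        # an all-zero block gets scale 0 and all-zero codes
        codes = [round(w / scale) if scale > 0 else 0 for w in chunk]
        blocks.append((scale, codes))
    return blocks

def dequantize_q8_0(blocks: List[Q8Block]) -> List[float]:
    """Reconstruct approximate weights as scale * code."""
    return [scale * q for scale, codes in blocks for q in codes]

if __name__ == "__main__":
    w = [((i * 37) % 101 - 50) / 500 for i in range(70)]
    blocks = quantize_q8_0(w)
    err = max(abs(a - b) for a, b in zip(w, dequantize_q8_0(blocks)))
    print(f"{len(blocks)} blocks, max abs error {err:.6f}")
```

Rounding bounds each weight's error by half its block's scale. With an FP16 scale, each block occupies 34 bytes for 32 weights, or 8.5 bits per weight.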

Notable innovations

  • 405B distilled to 253B via NAS
  • FFN fusion alongside skip attention
  • Single-node 8x H100 inference
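FFN fusion targets stretches where attention has been removed and several FFNs run back to back. The sketch below shows the algebra it relies on: two SwiGLU FFNs that read the same input equal one wider FFN whose gate and up matrices are stacked row-wise and whose down matrices are joined column-wise. Swapping the original sequential chain for that parallel form is an approximation, and how NVIDIA chose which layers to fuse is not covered here; the helper names and matrix sizes are hypothetical toy values.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
FFN = Tuple[Matrix, Matrix, Matrix]  # (gate: h x d, up: h x d, down: d x h)

def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def silu(z: float) -> float:
    return z / (1.0 + math.exp(-z))  # fine for toy magnitudes; overflows for z < about -709

def swiglu_ffn(gate: Matrix, up: Matrix, down: Matrix, x: Vector) -> Vector:
    """down @ (silu(gate @ x) * (up @ x)), the standard SwiGLU feed-forward block."""
    hidden = [silu(g) * u for g, u in zip(matvec(gate, x), matvec(up, x))]
    return matvec(down, hidden)

def fuse(a: FFN, b: FFN) -> FFN:
    """Concatenate two FFNs along the hidden dimension into one wider FFN."""
    (ga, ua, da), (gb, ub, db) = a, b
    gate = ga + gb                              # stack rows: (h_a + h_b) x d
    up = ua + ub                                # stack rows: (h_a + h_b) x d
    down = [ra + rb for ra, rb in zip(da, db)]  # join columns: d x (h_a + h_b)
    return gate, up, down

def parallel(a: FFN, b: FFN, x: Vector) -> Vector:
    """x + f_a(x) + f_b(x): both FFNs read the same input (what the fused FFN computes)."""
    return [xi + p + q for xi, p, q in zip(x, swiglu_ffn(*a, x), swiglu_ffn(*b, x))]

def sequential(a: FFN, b: FFN, x: Vector) -> Vector:
    """x1 = x + f_a(x), then x1 + f_b(x1): the original residual chain."""
    x1 = [xi + p for xi, p in zip(x, swiglu_ffn(*a, x))]
    return [xi + q for xi, q in zip(x1, swiglu_ffn(*b, x1))]
```

The fused FFN plus the residual matches `parallel` up to floating-point rounding, but `parallel` and `sequential` generally differ. Comparing them on real activations is one way to judge whether a given pair of FFNs is a good fusion candidate.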

Lineage

Derived from Llama 3.1 405B by NAS-guided distillation down to 253B parameters, targeting single-node enterprise deployment.

Sources