The Open-Source AI Stack


Granite 4.1 8B Instruct

Open · IBM · 2026-04-29 · Apache-2.0

The dense Granite 4.1 line was released on April 29, 2026, in 3B, 8B, and 30B sizes under Apache 2.0. The 8B Instruct model matches or beats IBM's own 32B mixture-of-experts (MoE) Granite 4.0 Small on tool calling, math, coding, and instruction following while using 4x fewer total parameters. It was released alongside the Granite 4.1 speech, vision, embedding, and Guardian safety models.

Architecture

tokens in → Embedding (vocabulary size not disclosed) → N × layer [Attention (type not disclosed), position encoding (not disclosed), dense MLP with SwiGLU activation (standard)] → Output projection → tokens out
Context: 524,288 tokens · 8B active params
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.
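The diagram lists a dense MLP with a standard SwiGLU activation. The following minimal pure-Python sketch shows what that block computes: out = W_down(SiLU(W_gate x) ⊙ W_up x). The function names, the toy dimensions, and the absence of bias terms are assumptions chosen for illustration. This page does not disclose Granite 4.1's actual hidden sizes or implementation details.

```python
import math
import random


def silu(x: float) -> float:
    """SiLU (Swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def swiglu_mlp(x, w_gate, w_up, w_down):
    """Dense SwiGLU feed-forward: W_down @ (silu(W_gate @ x) * (W_up @ x)).

    Hypothetical helper; biases are omitted by assumption.
    """
    gate = [silu(g) for g in matvec(w_gate, x)]       # gated branch
    up = matvec(w_up, x)                              # linear branch
    hidden = [g * u for g, u in zip(gate, up)]        # elementwise product
    return matvec(w_down, hidden)                     # project back to d_model


def random_matrix(rows: int, cols: int, rng: random.Random) -> list[list[float]]:
    """Small random weights for the toy example."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_ff = 4, 8  # toy sizes, not Granite 4.1's real dimensions
    w_gate = random_matrix(d_ff, d_model, rng)
    w_up = random_matrix(d_ff, d_model, rng)
    w_down = random_matrix(d_model, d_ff, rng)
    x = [1.0, -0.5, 0.25, 2.0]
    y = swiglu_mlp(x, w_gate, w_up, w_down)
    print([round(v, 4) for v in y])
```

Because the block is dense, every token passes through all of these weights. In an MoE model, by contrast, a router selects a subset of expert MLPs per token.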

Specs

Architecture: dense
Total params: 8B
Active params: 8B
Context window: 524,288 tokens (512K)
Attention: not disclosed
Position encoding: not disclosed
Post-training: SFT, RLHF
OSI-approved: yes
Data released: no
Training code: not released
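The 524,288-token context window can be labelled either 524K or 512K, depending on whether "K" means 1,000 or 1,024 tokens, because 524,288 = 2^19 = 512 × 1024. The small check below makes the two readings explicit. The helper name `describe_tokens` is hypothetical.

```python
CONTEXT_TOKENS = 524_288  # context window given for Granite 4.1 8B


def describe_tokens(n: int) -> str:
    """Show a token count in decimal thousands (K) and binary units (Ki = 1024)."""
    decimal_k = n / 1000
    binary_k = n / 1024
    return f"{n:,} tokens = {decimal_k:.1f}K (decimal) = {binary_k:.0f}Ki (binary)"


print(describe_tokens(CONTEXT_TOKENS))
# 524,288 tokens = 524.3K (decimal) = 512Ki (binary)
print(CONTEXT_TOKENS == 2 ** 19)
# True
```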

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

Available quantizations

GGUF: llama.cpp's container format and the most common format for local inference, with k-quants from Q2 to Q8. Runs on llama.cpp and Ollama.
AWQ: activation-aware 4-bit weight quantization for GPU serving. Runs on vLLM and SGLang.
MLX: Apple MLX 4-bit/8-bit layout for Apple silicon. Runs on Apple MLX.
FP8: 8-bit floating point, often released natively for Hopper and Blackwell GPUs. Runs on vLLM, SGLang, and TensorRT-LLM.

Verified via the Hugging Face model tree. Community quantizations change over time; the families shown are those with published weights at audit time.
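As a rough guide to what these formats mean for memory, raw weight storage is about parameters × bits per weight ÷ 8 bytes. The sketch below applies that formula to 8B parameters (this model) and 32B parameters (the total count of Granite 4.0 Small). This also makes the 4x parameter reduction concrete. The bit widths in `BITS_PER_WEIGHT` are nominal assumptions. Real GGUF, AWQ, MLX, and FP8 files carry extra overhead such as quantization scales, unquantized layers, and metadata. Runtime memory also includes the KV cache and activations, which this estimate ignores.

```python
# Nominal bits per weight (assumed); real files add overhead,
# so treat every result as a lower bound on file size.
BITS_PER_WEIGHT = {
    "16-bit (BF16/FP16)": 16,
    "FP8": 8,
    "GGUF Q8": 8,
    "AWQ / MLX 4-bit": 4,
    "GGUF Q2": 2,
}


def weight_gb(params: float, bits: float) -> float:
    """Raw weight storage in gigabytes (10**9 bytes): params * bits / 8."""
    return params * bits / 8 / 1e9


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        dense_8b = weight_gb(8e9, bits)   # Granite 4.1 8B
        moe_32b = weight_gb(32e9, bits)   # 32B total params, all experts stored
        print(f"{name:20s} 8B: {dense_8b:5.1f} GB   32B: {moe_32b:5.1f} GB")
```

An MoE model activates only some experts per token, but all of its weights must still be loaded. For that reason, total parameter count is the figure that drives storage here.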

Notable innovations

  • 8B dense model matching a 32B MoE on key benchmarks
  • 512K-context support
  • Full Granite 4.1 model family (speech, vision, embeddings, Guardian)

Lineage

A dense 8B model that matches IBM's own 32B MoE flagship, Granite 4.0 Small; 512K context.

Sources