The Open-Source AI Stack

Granite 3.0 8B Instruct

Open IBM · 2024-10-21 · Apache-2.0

IBM's first Apache-2.0-licensed enterprise instruct model in the Granite 3 line, trained on 12T tokens on IBM's Blue Vela H100 supercluster. It targets function calling, RAG, and multilingual workloads across 12 languages.

Architecture

tokens in
→ Embedding (vocab not disclosed · granite tokenizer)
→ × 40 layers: Grouped-Query Attention (RoPE, context 4,096 tokens) → Dense MLP (SwiGLU activation, standard)
→ Output projection
→ tokens out
Active params: 8.1B
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.

Specs

Architecture: dense
Total params: 8.1B
Active params: 8.1B
Context window: 4K tokens
Attention: GQA
Position encoding: RoPE
Pretraining tokens: 12.0T
Training hardware: H100
Post-training: SFT, RLHF
OSI-approved: yes
Data released: no
Training code: not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU 65.8 as of 2024-10-21 source ↗
MMLU-Pro 34.5 as of 2024-10-21 source ↗
GPQA-Diamond 33.8 as of 2024-10-21 source ↗

Code

HumanEval 64.6 as of 2024-10-21 source ↗

Held-out / arena

IFEval 52.3 as of 2024-10-21 source ↗
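
The "never infer or interpolate" rule above can be pictured as a lookup that returns either a published, dated score or nothing at all. The sketch below is illustrative only: the `Score` record and `lookup` helper are hypothetical names, not the site's actual schema, and the values are copied from the listings above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Score:
    """One published benchmark result with its publication date (hypothetical record)."""
    benchmark: str
    value: float
    as_of: date


PUBLISHED = date(2024, 10, 21)

# Values copied from the benchmark listings on this page.
SCORES = {
    s.benchmark: s
    for s in [
        Score("MMLU", 65.8, PUBLISHED),
        Score("MMLU-Pro", 34.5, PUBLISHED),
        Score("GPQA-Diamond", 33.8, PUBLISHED),
        Score("HumanEval", 64.6, PUBLISHED),
        Score("IFEval", 52.3, PUBLISHED),
    ]
}


def lookup(benchmark: str) -> Optional[Score]:
    """Return the published score, or None. A missing score is never estimated."""
    return SCORES.get(benchmark)
```

A benchmark with no published entry simply yields `None`, so downstream code has to handle the gap explicitly rather than receive a guessed number.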

Recommended use cases

  • enterprise RAG
  • function-calling agents
  • multilingual chat
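
For RAG workloads, the 4K-token context window listed under Specs limits how much retrieved text fits alongside the prompt and the answer. The following is a minimal sketch of one way to budget that, assuming token counts have already been computed with the model's tokenizer; the function name `select_chunks` and the 512-token answer reserve are illustrative assumptions, not recommendations from IBM.

```python
CONTEXT_WINDOW = 4096  # tokens, from the Specs section


def select_chunks(chunk_token_counts, prompt_tokens,
                  reserve_for_answer=512, context_window=CONTEXT_WINDOW):
    """Greedily keep retrieved chunks, in rank order, that fit the window.

    chunk_token_counts: token count per retrieved chunk, best match first.
    prompt_tokens: tokens used by the system prompt and the user question.
    reserve_for_answer: tokens held back for generation (assumed value).
    Returns the indices of the chunks that fit.
    """
    budget = context_window - prompt_tokens - reserve_for_answer
    kept = []
    for i, n in enumerate(chunk_token_counts):
        if n <= budget:  # skip any chunk that no longer fits, keep scanning
            kept.append(i)
            budget -= n
    return kept


if __name__ == "__main__":
    # Four retrieved chunks and a 200-token prompt.
    print(select_chunks([1500, 1500, 1500, 300], prompt_tokens=200))
```

With a window this small, only two or three medium-sized passages fit, so retrieval ranking quality matters more than retrieval volume.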

Available quantizations

GGUF: llama.cpp's container format and the common choice for local use, with k-quants from Q2 to Q8. Runs on llama.cpp and Ollama.
AWQ: activation-aware 4-bit weight quantization for GPU serving. Runs on vLLM and SGLang.

Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
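
As a rough guide to what quantization buys, weight storage scales with the average bits per weight. The sketch below is back-of-the-envelope arithmetic using the 8.1B parameter count from Specs; the bit widths are nominal assumptions, and real GGUF or AWQ files mix precisions and carry metadata, so published file sizes will differ somewhat. It also excludes the KV cache and activations.

```python
PARAMS = 8.1e9  # total parameters, from the Specs section


def weight_gib(bits_per_weight: float, params: float = PARAMS) -> float:
    """Approximate weight-only storage in GiB at a given average bit width."""
    return params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    # Nominal bit widths only; actual quantized files are somewhat larger.
    for name, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{name:>7}: ~{weight_gib(bits):.1f} GiB")
```

Under these assumptions the 16-bit weights come to roughly 15 GiB, and a 4-bit quantization to about a quarter of that.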

Notable innovations

  • Apache-2.0 enterprise instruct release
  • Function-calling and RAG focus
  • 12-language multilingual coverage

Known limitations

  • Pretraining data and training code are not released; only the weights are open. source ↗

Sources