The Open-Source AI Stack

Phi-4-mini Instruct

Open · Microsoft · MIT

Small-tier Phi-4 model, released February 2025: a 3.8B-parameter dense decoder-only model with a 128K-token context window, a 200K-token vocabulary, and grouped-query attention. Trained on 5T tokens for 21 days on 512 A100-80G GPUs, with a data cutoff of June 2024. Supports 22 languages.
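The training figures in the summary can be turned into a few rough derived numbers. The sketch below uses only the values stated above (3.8B parameters, 5T tokens, 512 GPUs, 21 days); the assumption that the full 21 days were spent processing those 5T tokens is ours, so the throughput figure is an average at best.

```python
# Back-of-the-envelope figures derived from the summary above.
PARAMS = 3.8e9        # dense parameter count
TRAIN_TOKENS = 5e12   # pretraining tokens
GPUS = 512            # A100-80G GPUs
DAYS = 21             # reported training duration

# Total accelerator time consumed by the run.
gpu_hours = GPUS * DAYS * 24

# How many training tokens were seen per model parameter.
tokens_per_param = TRAIN_TOKENS / PARAMS

# Average tokens processed per GPU per second, ASSUMING the whole
# 21 days went to these 5T tokens (not confirmed by the source).
tokens_per_gpu_second = TRAIN_TOKENS / (gpu_hours * 3600)

print(f"GPU-hours:           {gpu_hours:,}")
print(f"Tokens per param:    {tokens_per_param:,.0f}")
print(f"Tokens/GPU/second:   {tokens_per_gpu_second:,.0f}")
```

At roughly 1,300 tokens per parameter, the run sits far past the token-to-parameter ratios typical of compute-optimal training, which is common for small models intended for cheap inference.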

Architecture

tokens in
→ Embedding (vocab 200,064 · phi tokenizer)
→ × N layers:
    Grouped-Query Attention (RoPE · context 128,000 tokens)
    Dense MLP (SwiGLU activation, standard · 3.8B active params)
→ Output projection
→ tokens out
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.
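To see why grouped-query attention matters at a 128,000-token context, it helps to estimate the key/value cache size. The sketch below compares a layout where every query head has its own KV head against a grouped layout with fewer KV heads. The layer count, head counts, and head dimension are hypothetical values chosen for illustration; this page does not state them, and only the context length comes from the diagram.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence.

    Each layer stores two tensors (K and V) of shape
    [seq_len, n_kv_heads, head_dim]; bytes_per_value=2 means 16-bit values.
    """
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_value


# HYPOTHETICAL shape, for illustration only (not taken from this page).
N_LAYERS = 32
N_QUERY_HEADS = 24
N_KV_HEADS = 8
HEAD_DIM = 128

CONTEXT = 128_000  # context length from the architecture diagram

# Multi-head attention: one KV head per query head.
mha = kv_cache_bytes(CONTEXT, N_LAYERS, N_QUERY_HEADS, HEAD_DIM)
# Grouped-query attention: several query heads share each KV head.
gqa = kv_cache_bytes(CONTEXT, N_LAYERS, N_KV_HEADS, HEAD_DIM)

GIB = 2**30
print(f"MHA cache: {mha / GIB:.3f} GiB")
print(f"GQA cache: {gqa / GIB:.3f} GiB")
print(f"Reduction: {mha / gqa:.1f}x")
```

Under these assumed dimensions the cache shrinks by the ratio of query heads to KV heads, which is the main memory saving grouped-query attention offers at long context.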

Specs

Architecture: dense
Total params: not disclosed
Active params: 3.8B
Context window: 128K tokens
Attention: gqa
Position encoding: rope
Pretraining tokens: 5.0T
Training hardware: A100
Post-training: sft, dpo
OSI-approved: yes
Data released: no
Training code: not released

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU 67.3 as of 2025-02-26 source ↗

Math

MATH 64.0 as of 2025-02-26 source ↗

Available quantizations

GGUF: llama.cpp's container; the common local format, with k-quants from Q2 to Q8. Runs on llama.cpp, Ollama.
AWQ: Activation-aware 4-bit weight quantization for GPU serving. Runs on vLLM, SGLang.
GPTQ: Post-training 4-bit weight quantization for GPU serving. Runs on vLLM, SGLang, Transformers.
MLX: Apple MLX 4/8-bit layout for Apple silicon. Runs on Apple MLX.
FP8: 8-bit float, frequently a native release on Hopper / Blackwell GPUs. Runs on vLLM, SGLang, TensorRT-LLM.
bitsandbytes: On-the-fly NF4 / INT8 weight quantization inside Transformers. Runs on Transformers.

Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
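As a rough guide to what these formats mean in practice, the weight footprint scales with bits per weight. The sketch below is an estimate under simplifying assumptions: it uses the 3.8B parameter count from this page, treats every weight as stored at the nominal bit width, and ignores quantization scales, unquantized layers, the KV cache, and activations. Real files will be somewhat larger, and GGUF k-quants use effective bit widths that vary by variant.

```python
PARAMS = 3.8e9  # parameter count stated on this page


def weight_gb(params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Nominal bit widths; the groupings are illustrative, not exact file sizes.
formats = [
    ("16-bit baseline (FP16/BF16)", 16),
    ("8-bit (FP8, INT8)", 8),
    ("4-bit (AWQ, GPTQ, NF4)", 4),
]

for label, bits in formats:
    print(f"{label:<30} ~{weight_gb(PARAMS, bits):.1f} GB")
```

By this estimate, 4-bit weights fit in about a quarter of the 16-bit footprint, which is why the 4-bit families dominate local and single-GPU deployment of models this size.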

Notable innovations

  • 200K vocab for multilingual coverage
  • 128K context at 3.8B
  • MIT-licensed Phi entry

Lineage

Small dense Phi-4 sibling; 200K vocab is the key tokenizer change.

Derived from

Phi-4 (2024-12-12)

Sources