The Open-Source AI Stack

Phi-3 Mini 4K Instruct

Open · Microsoft · 2024-04-23 · MIT

A small-model release from Microsoft showing that 3.8B parameters, trained on heavily filtered and synthetic data, could reach roughly 70% on MMLU, matching much larger 2023-era models. A 128K-context variant, extended via LongRoPE, shipped alongside it.

Architecture

tokens in
  → Embedding: vocab 32,064 · llama tokenizer
  → × N layers:
      Multi-head Attention: RoPE · context 4,096 tokens
      Dense MLP: SwiGLU activation (standard) · 3.8B active params
  → Output projection
  → tokens out
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.
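
As a rough cross-check of the 3.8B figure, the parameter count of a dense decoder like the one in the diagram can be estimated from its layer shapes. The sketch below is illustrative only: the vocabulary size comes from this page, while the hidden size (3,072), layer count (32), MLP intermediate size (8,192), untied output projection, RMSNorm layers, and absence of bias terms are assumptions not stated here.

```python
# Rough parameter count for a dense decoder-only transformer.
VOCAB = 32_064        # from the architecture diagram on this page
HIDDEN = 3_072        # ASSUMED hidden size (not stated on this page)
LAYERS = 32           # ASSUMED number of layers ("N" in the diagram)
INTERMEDIATE = 8_192  # ASSUMED MLP intermediate size


def count_params(vocab: int, hidden: int, layers: int, intermediate: int) -> int:
    """Estimate parameters, assuming no biases and untied embeddings."""
    embedding = vocab * hidden
    attention = 4 * hidden * hidden      # Q, K, V and output projections (MHA)
    mlp = 3 * hidden * intermediate      # SwiGLU: gate, up and down projections
    norms = 2 * hidden                   # ASSUMED two norm weight vectors per layer
    final_norm = hidden
    output_projection = vocab * hidden   # ASSUMED not tied to the embedding
    return embedding + layers * (attention + mlp + norms) + final_norm + output_projection


total = count_params(VOCAB, HIDDEN, LAYERS, INTERMEDIATE)
print(f"{total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate lands close to the 3.8B listed in the specs, with the MLP blocks accounting for roughly two thirds of the weights.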

Specs

Architecture: dense
Total params: 3.8B
Active params: 3.8B
Context window: 4K tokens
Attention: MHA
Position encoding: RoPE
Pretraining tokens: 3.3T
Training hardware: H100
Post-training: SFT, DPO
OSI-approved: yes
Data released: no
Training code: not released
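
Because the model uses multi-head attention rather than a grouped-query variant, every attention head keeps its own keys and values, which drives the size of the KV cache at full context. A minimal estimate follows, assuming 32 layers, 32 KV heads, a head dimension of 96, and 16-bit cache values; none of those four numbers is stated on this page.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Two tensors (K and V) per layer, each of shape [seq_len, kv_heads, head_dim].
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


LAYERS, KV_HEADS, HEAD_DIM = 32, 32, 96  # ASSUMED, not stated on this page
CONTEXT = 4_096                          # context window from the specs

cache = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)
print(f"{cache / 2**30:.2f} GiB per sequence at full context")
```

With these assumed shapes the cache comes to about 1.5 GiB per sequence at 4,096 tokens, which is worth budgeting for alongside the weights in on-device or batched serving.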

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU: 70.9 (as of 2024-04-23)
GPQA-Diamond: 30.6 (as of 2024-04-23)

Code

HumanEval: 57.3 (as of 2024-04-23)

Recommended use cases

  • edge / on-device inference
  • small-model fine-tuning base
  • latency-sensitive serving

Available quantizations

  • GGUF: llama.cpp's container; the common local format, k-quants from Q2 to Q8. Runs on llama.cpp, Ollama.
  • AWQ: Activation-aware 4-bit weight quantization for GPU serving. Runs on vLLM, SGLang.
  • MLX: Apple MLX 4/8-bit layout for Apple silicon. Runs on Apple MLX.
  • bitsandbytes: On-the-fly NF4 / INT8 weight quantization inside Transformers. Runs on Transformers.

Verified via the Hugging Face model tree. Community quantizations change over time; the families shown are those with published weights at audit time.
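
To get a feel for what these formats buy on constrained hardware, the weight footprint can be approximated as parameters times bits per weight. This is a back-of-the-envelope sketch: it uses the 3.8B total from the specs and ignores quantization scales, zero points, any layers kept at higher precision, and runtime overhead, so real files will be somewhat larger.

```python
TOTAL_PARAMS = 3.8e9  # total params from the specs


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB, ignoring quantization metadata."""
    return params * bits_per_weight / 8 / 2**30


# Nominal bit widths only; actual formats add per-group scales and similar overhead.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: {weight_gib(TOTAL_PARAMS, bits):.2f} GiB")
```

The 4-bit estimate is a quarter of the 16-bit one, which is the main reason the 4-bit families above are the usual starting point for edge deployment.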

Notable innovations

  • Heavily filtered and synthetic pretraining data
  • 3.8B parameters reaching roughly 70% on MMLU
  • 128K-context variant via LongRoPE

Known limitations

  • Pretraining data is not released; only the weights and inference code are open.

Sources