The Open-Source AI Stack

Phi-3 Medium 4K Instruct

Open · Microsoft · 2024-05-21 · MIT

Microsoft's 14B follow-up to Phi-3 Mini, trained on 4.8T tokens over 42 days on 512 H100s. It scored 78 on MMLU at release, well above the MMLU scores published for Llama 3 8B Instruct.

Architecture

tokens in → Embedding (vocab 32,064 · llama tokenizer) → × N layers [ Multi-head Attention (RoPE, context 4,096 tokens) → Dense MLP (SwiGLU activation, standard; 14B active params) ] → Output projection → tokens out
Schema-generated from data/models.yaml. Every label is auditable against the model's sources.
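To make the "Dense MLP, SwiGLU activation" label concrete, here is a minimal pure-Python sketch of a SwiGLU feed-forward step. It illustrates the general technique only; the function names, toy weights, and toy sizes are assumptions and are not taken from the model's code or weights.

```python
import math

def silu(z: float) -> float:
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def swiglu_mlp(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down( silu(gate(x)) * up(x) )."""
    gate = [silu(g) for g in matvec(w_gate, x)]   # gated branch
    up = matvec(w_up, x)                          # linear branch
    hidden = [g * u for g, u in zip(gate, up)]    # elementwise product
    return matvec(w_down, hidden)                 # project back to model width

# Toy sizes (hypothetical): model width 2, hidden width 3.
x = [1.0, -1.0]
w_gate = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_up = [[1.0, 1.0], [1.0, -1.0], [0.5, 0.5]]
w_down = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
y = swiglu_mlp(x, w_gate, w_up, w_down)
```

In a real decoder layer the three matrices are learned and far larger, and the block sits after the attention sublayer with residual connections and normalization around it.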

Specs

Architecture: dense
Total params: 14B
Active params: 14B
Context window: 4K tokens
Attention: mha
Position encoding: rope
Pretraining tokens: 4.8T
Training hardware: H100
Post-training: sft, dpo
OSI-approved: yes
Data released: no
Training code: not released
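Because the specs are label/value pairs, a few of them can be cross-checked mechanically. The sketch below is a hypothetical record type and checker, not the site's actual schema: the field names and the rules are assumptions, and only the values come from the specs listed here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelSpec:
    # Field names are hypothetical; values are copied from the Specs list.
    architecture: str
    total_params_b: float
    active_params_b: float
    context_tokens: int
    attention: str
    position_encoding: str

def check_spec(spec: ModelSpec) -> list[str]:
    """Return a list of human-readable consistency problems (empty if none)."""
    problems = []
    if spec.active_params_b > spec.total_params_b:
        problems.append("active params exceed total params")
    if spec.architecture == "dense" and spec.active_params_b != spec.total_params_b:
        problems.append("dense model should have active == total params")
    if spec.context_tokens <= 0:
        problems.append("context window must be positive")
    return problems

phi3_medium_4k = ModelSpec("dense", 14, 14, 4096, "mha", "rope")
problems = check_spec(phi3_medium_4k)
```

For this entry the checker finds nothing to report, since a dense model activates all of its parameters on every token.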

Benchmarks

Each score carries the date it was published; we never infer or interpolate missing scores.

General reasoning

MMLU 78.0 as of 2024-05-21 source ↗

Code

HumanEval 62.2 as of 2024-05-21 source ↗
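The "never infer or interpolate" rule amounts to returning nothing for a benchmark that has no published score. A minimal sketch, assuming a hypothetical in-memory table (the `Score` type and `lookup` helper are illustrative names, not part of the site's code):

```python
from datetime import date
from typing import NamedTuple, Optional

class Score(NamedTuple):
    value: float
    as_of: date  # publication date carried with every score

# Scores copied from the Benchmarks section; the dict layout is hypothetical.
SCORES = {
    "MMLU": Score(78.0, date(2024, 5, 21)),
    "HumanEval": Score(62.2, date(2024, 5, 21)),
}

def lookup(benchmark: str) -> Optional[Score]:
    """Return the published score, or None when nothing was published."""
    return SCORES.get(benchmark)
```

Callers then have to handle `None` explicitly, which keeps a missing score visibly missing instead of silently estimated.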

Recommended use cases

  • mid-tier deployment
  • fine-tuning base
  • single-GPU serving

Available quantizations

GGUF: llama.cpp's container format; the common local format, with k-quants from Q2 to Q8. Runs on llama.cpp, Ollama.
AWQ: Activation-aware 4-bit weight quantization for GPU serving. Runs on vLLM, SGLang.

Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
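For a rough sense of why quantization matters for single-GPU serving, weight storage can be estimated as parameters × bits per weight ÷ 8. The sketch below uses nominal bit widths as an assumption; real GGUF and AWQ files carry extra scale metadata, and the KV cache and activations need memory on top of the weights.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

PARAMS_B = 14  # total params from the Specs section

# Nominal bit widths; real quantized files add per-block scale metadata.
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(PARAMS_B, bits):.0f} GB of weights")
```

Under these assumptions the weights alone come to about 28 GB at 16-bit, 14 GB at 8-bit, and 7 GB at 4-bit.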

Notable innovations

  • Data-curated small-frontier scaling
  • 14B at MMLU 78 in mid-2024

Known limitations

  • Pretraining data and training code are not released; only the weights and inference code are open. source ↗

Sources