The Open-Source AI Stack

Phi-3 Medium 4K Instruct vs GPT-4o

Numbers carry their as-of date and primary source.

Specs

Field | A: Phi-3 Medium 4K Instruct | B: GPT-4o
Released | 2024-05-21 | 2024-05-13
Developer | Microsoft | OpenAI
Openness | Open | Proprietary
License | MIT | Proprietary
OSI-approved | yes | no
Data released | no | no
Training code | no | no
Architecture | dense | unknown
Total params | 14B |
Active params | |
Experts | |
Context window | 4K tokens |
Attention | MHA | unknown
Position encoding | RoPE | unknown
Pretraining tokens | 4.8T |
Post-training | SFT, DPO | RLHF
Training hardware | H100 |
Input price ($ per 1M tokens) | | $2.50
Output price ($ per 1M tokens) | | $10.00
Output speed (tokens/sec) | | 131.6
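The per-million-token prices above turn into a per-request cost by billing input and output tokens at their separate rates. The following is a minimal sketch, assuming the GPT-4o list prices in the table are still current and that the caller already knows the token counts; `request_cost_usd` is a hypothetical helper, not part of any SDK. Phi-3 Medium has no price listed here, so only GPT-4o is modeled.

```python
# Hypothetical helper: estimate the cost of one API request from per-million-token prices.
# Assumption: the GPT-4o list prices from the spec table; real prices change over time.
GPT4O_INPUT_USD_PER_M = 2.50
GPT4O_OUTPUT_USD_PER_M = 10.00


def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_per_m: float = GPT4O_INPUT_USD_PER_M,
                     output_per_m: float = GPT4O_OUTPUT_USD_PER_M) -> float:
    """Return the request cost in USD, billing input and output tokens at separate rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


# Example: a 2,000-token prompt that produces a 500-token answer.
print(f"${request_cost_usd(2_000, 500):.4f}")
```

At these rates an output token costs four times as much as an input token, so the 2,000-in/500-out example splits its $0.01 evenly between input and output.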

Benchmarks

Missing scores are shown as not reported and are never inferred.

General reasoning

MMLU 78.0 2024-05-21
MMLU-Pro 74.8 2026-05-21
GPQA-Diamond 54.3 2026-05-21

Code

HumanEval 62.2 2024-05-21
LiveCodeBench 30.9 2026-05-21

Math

MATH 75.9 2026-05-21
AIME 2024 15.0 2026-05-21
AIME 2025 6.0 2026-05-21

Context · A

Microsoft's 14B-parameter follow-up to Phi-3 Mini, trained on 4.8T tokens over 42 days on 512 H100s. It scored 78.0 on MMLU at release, well ahead of Llama 3 8B Instruct.
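The training figures above allow a quick back-of-envelope check on compute. This is a rough sketch, assuming the widely used C ≈ 6·N·D approximation for dense transformer training FLOPs (a rule of thumb, not a number Microsoft reports) and that all 512 GPUs ran for the full 42 days; `training_flops` and `gpu_hours` are hypothetical names chosen for illustration.

```python
# Back-of-envelope training budget for Phi-3 Medium from the figures on this page.
# Assumption: C ≈ 6 * N * D for dense transformers; it ignores attention FLOPs and
# activation recomputation, so treat the result as an order-of-magnitude estimate.
PARAMS = 14e9     # N: total parameters (dense model, so all are active per token)
TOKENS = 4.8e12   # D: pretraining tokens
GPUS = 512        # H100s
DAYS = 42         # Assumption: every GPU busy for the whole period


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs with the 6*N*D rule of thumb."""
    return 6.0 * n_params * n_tokens


def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours for num_gpus running continuously for the given days."""
    return num_gpus * days * 24


flops = training_flops(PARAMS, TOKENS)
hours = gpu_hours(GPUS, DAYS)
# Average sustained FLOP/s per GPU implied by the two estimates above.
per_gpu_flops_per_s = flops / (hours * 3600)

print(f"~{flops:.2e} FLOPs, {hours:,.0f} GPU-hours, "
      f"~{per_gpu_flops_per_s / 1e12:.0f} TFLOP/s per GPU")
```

Under these assumptions the run comes to about 4.0e23 FLOPs over 516,096 GPU-hours, or roughly 217 TFLOP/s sustained per GPU.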

Context · B

A natively multimodal model that handles text, vision, and audio in a single pretrained backbone. It cut real-time voice response latency to under 400 ms and served as the reference point for multimodal benchmarks through 2024.
