Phi-3 Mini 4K Instruct vs GPT-4 Turbo

Rows where the two models differ are highlighted in warm gray. Each number carries its as-of date and primary source.

Specs

Field	A: Phi-3 Mini 4K Instruct	B: GPT-4 Turbo
Released	2024-04-23	2024-04-09
Developer	Microsoft	OpenAI
Openness	Open	Proprietary
License	MIT	Proprietary
OSI-approved	yes	no
Data released	no	no
Training code	no	no
Architecture	dense	unknown
Total params	3.8B	—
Active params	—	—
Experts	—	—
Context window	4K	128K
Attention	MHA	unknown
Position enc.	RoPE	unknown
Pretraining tokens	3.3T	—
Post-training	SFT, DPO	RLHF
Training hardware	H100	—
$/M input	—	$10.00
$/M output	—	$30.00
Output tok/sec	—	27.8
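
The pricing and throughput rows for GPT-4 Turbo can be turned into a rough per-request estimate. The sketch below is illustrative only: the function name and the example token counts are assumptions, and the time figure covers output generation alone, ignoring network latency and time to first token.

```python
# Rough per-request estimate from the GPT-4 Turbo rows in the Specs table.
# estimate_request and the example token counts are hypothetical.

INPUT_USD_PER_M = 10.00     # $/M input
OUTPUT_USD_PER_M = 30.00    # $/M output
OUTPUT_TOK_PER_SEC = 27.8   # Output tok/sec


def estimate_request(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, output generation time in seconds)."""
    cost = (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1_000_000
    seconds = output_tokens / OUTPUT_TOK_PER_SEC
    return cost, seconds


# Example: a 2,000-token prompt with a 500-token reply.
cost, seconds = estimate_request(input_tokens=2_000, output_tokens=500)
print(f"cost ${cost:.4f}, about {seconds:.1f} s of generation")
```

Phi-3 Mini has no entries in these rows, so no comparable estimate can be derived for it from this table.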

Benchmarks

A dash marks a score that was not reported; missing scores are never inferred. Bold marks the leader on each benchmark.

General reasoning

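Benchmark	A: Phi-3 Mini 4K Instruct	B: GPT-4 Turbo
MMLU	70.9 (2024-04-23)	—
MMLU-Pro	—	69.4 (2026-05-21)
GPQA-Diamond	30.6 (2024-04-23)	—

Code

Benchmark	A: Phi-3 Mini 4K Instruct	B: GPT-4 Turbo
HumanEval	57.3 (2024-04-23)	—
LiveCodeBench	—	29.1 (2026-05-21)

Math

Benchmark	A: Phi-3 Mini 4K Instruct	B: GPT-4 Turbo
MATH	—	73.7 (2026-05-21)
AIME 2024	—	15.0 (2026-05-21)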

Context · A

Phi-3 Mini was Microsoft's first small-model release to demonstrate that a 3.8B-parameter model, trained on heavily filtered and synthetic data, could reach about 70% on MMLU, comparable to much larger 2023-era models. A 128K-context variant, extended with LongRoPE, shipped alongside it.

Context · B

GPT-4 Turbo was announced at OpenAI DevDay on November 6, 2023, as a cheaper, 128K-context successor to the original GPT-4 endpoint. The gpt-4-turbo-2024-04-09 revision shipped as the general-availability version, with vision support and a knowledge cutoff of December 2023.
