
Phi-4 Reasoning vs Mistral Medium 3

Numbers carry their as-of date and primary source.

Specs

Field	A: Phi-4 Reasoning	B: Mistral Medium 3
Released	2025-04-30	2025-05-07
Developer	Microsoft	Mistral AI
Openness	Open	Proprietary
License	MIT	Proprietary
OSI-approved	yes	no
Data released	no	no
Training code	no	no
Architecture	dense	unknown
Total params	14B	—
Active params	—	—
Experts	—	—
Context window	32K	128K
Attention	unknown	unknown
Position enc.	unknown	unknown
Pretraining tokens	16B	—
Post-training	sft	sft, rlhf
Training hardware	H100	—
$/M input tokens	—	$0.40
$/M output tokens	—	$2.00
Output tokens/sec	—	29
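A rough sketch of how the context, price, and throughput rows combine for one request. It checks whether a hypothetical 50,000-token prompt plus 2,000-token completion fits each context window, costs it at Mistral Medium 3's list prices, and estimates decode time at 29 output tokens/sec. The token counts, the function names (`fits`, `request_cost_usd`, `decode_seconds`), and reading "K" as 1,024 tokens are all assumptions. The table lists no price for Phi-4 Reasoning, so only Mistral Medium 3 is costed.

```python
# Sketch: context fit, API cost, and decode time for one hypothetical request,
# using the spec-table figures above. Function names are illustrative only.

# Assumption: "K" in the context-window row means 1,024 tokens.
CONTEXT_WINDOW = {"Phi-4 Reasoning": 32 * 1024, "Mistral Medium 3": 128 * 1024}

# Mistral Medium 3 list prices (USD per million tokens) and reported output speed.
PRICE_IN_PER_M = 0.40
PRICE_OUT_PER_M = 2.00
OUTPUT_TOK_PER_SEC = 29


def fits(model: str, input_tokens: int, output_tokens: int) -> bool:
    """True if prompt plus completion fit inside the model's context window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW[model]


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Per-request cost at Mistral Medium 3 list prices."""
    return (input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M) / 1_000_000


def decode_seconds(output_tokens: int) -> float:
    """Generation time at the reported throughput; ignores time to first token."""
    return output_tokens / OUTPUT_TOK_PER_SEC


if __name__ == "__main__":
    inp, out = 50_000, 2_000  # hypothetical request size
    for model in CONTEXT_WINDOW:
        print(model, "fits:", fits(model, inp, out))
    print(f"cost: ${request_cost_usd(inp, out):.4f}")
    print(f"decode: {decode_seconds(out):.1f} s")
```

For this hypothetical request, the 52,000 total tokens overflow Phi-4 Reasoning's 32K window but fit Mistral Medium 3's 128K window; the cost comes to $0.0240 and decoding takes about 69.0 s.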

Benchmarks

Missing scores are marked as not reported and are never inferred.

General reasoning

MMLU-Pro	74.3 2025-04-30	76.0 2026-05-21
GPQA-Diamond	65.8 2025-04-30	57.8 2026-05-21

Code

LiveCodeBench	53.8 2025-04-30	40.0 2026-05-21

Math

MATH	—	90.7 2026-05-21
AIME 2024	75.3 2025-04-30	44.0 2026-05-21
AIME 2025	62.9 2025-04-30	30.3 2026-05-21
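The rule that missing scores are never inferred can be made concrete with a small sketch that picks a per-benchmark leader from the scores above. `None` stands for "not reported" and is skipped rather than filled in. Declaring a leader only when at least two models report a score (and returning no leader on ties) is an assumption of this sketch, not necessarily the page's own rule. Note also that A's scores are dated 2025-04-30 and B's 2026-05-21, so each comparison mixes as-of dates.

```python
# Sketch: per-benchmark leader from the scores above.
# None marks "not reported"; missing scores are skipped, never filled in.
from typing import Dict, Optional

SCORES: Dict[str, Dict[str, Optional[float]]] = {
    "MMLU-Pro":      {"A": 74.3, "B": 76.0},
    "GPQA-Diamond":  {"A": 65.8, "B": 57.8},
    "LiveCodeBench": {"A": 53.8, "B": 40.0},
    "MATH":          {"A": None, "B": 90.7},
    "AIME 2024":     {"A": 75.3, "B": 44.0},
    "AIME 2025":     {"A": 62.9, "B": 30.3},
}


def leader(scores: Dict[str, Optional[float]]) -> Optional[str]:
    """Return the model with the highest reported score.

    Assumption: a leader is declared only when at least two models report
    a score; a tie for first place also yields None.
    """
    reported = {m: s for m, s in scores.items() if s is not None}
    if len(reported) < 2:
        return None
    ranked = sorted(reported.items(), key=lambda kv: kv[1], reverse=True)
    if ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


if __name__ == "__main__":
    for bench, scores in SCORES.items():
        print(f"{bench:14s} leader: {leader(scores) or 'n/a'}")
```

Under this rule, A leads on GPQA-Diamond, LiveCodeBench, AIME 2024, and AIME 2025, B leads on MMLU-Pro, and MATH has no leader because A does not report a score.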

Context · A

A 14B reasoning-tuned derivative of Phi-4, trained with SFT only on curated reasoning traces and synthetic prompts. Training took 2.5 days on 32 H100-80G GPUs over 16B tokens; the Plus variant adds an RL stage. Microsoft positioned it as approaching DeepSeek R1 performance at a much smaller scale.
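The training figures above translate into a back-of-envelope compute budget. This sketch assumes all 32 GPUs ran for the full 2.5 days and that the 16B figure counts every token processed; both are simplifying assumptions, so the throughput number is only an average.

```python
# Back-of-envelope training budget from the Phi-4 Reasoning figures above.
# Assumes all GPUs were busy for the whole run.
GPUS = 32       # H100-80G
DAYS = 2.5
TOKENS = 16e9   # tokens processed during the reasoning-tuning run

gpu_hours = GPUS * DAYS * 24                         # 1,920 GPU-hours
tokens_per_gpu_second = TOKENS / (gpu_hours * 3600)  # average throughput

print(f"{gpu_hours:,.0f} GPU-hours")
print(f"~{tokens_per_gpu_second:,.0f} tokens per GPU-second")
```

This works out to 1,920 GPU-hours and an average of roughly 2,315 tokens per GPU-second.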

Context · B

A mid-tier model released May 7, 2025, priced at $0.40 per million input tokens and $2.00 per million output tokens, with a 128K context window. Mistral positioned it as delivering roughly 90 percent of Claude 3.7 Sonnet's performance at a fraction of the cost, and supports self-hosted deployment on setups starting at four GPUs.
