The Open-Source AI Stack

Grok 3 vs Phi-4-mini Instruct

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field | A: Grok 3 | B: Phi-4-mini Instruct
Released | 2025-02-17 |
Developer | xAI | Microsoft
Openness | Proprietary | Open
License | Proprietary | MIT
OSI-approved | no | yes
Data released | no | no
Training code | no | no
Architecture | unknown | dense
Total params | |
Active params | |
Experts | |
Context window | 131K | 128K
Attention | unknown | GQA
Position enc. | unknown | RoPE
Pretraining tokens | |
Post-training | RLHF | SFT, DPO
Training hardware | H100 | A100
$/M input | $4.00 |
$/M output | $20.00 |
Output tok/sec | 0 |
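
The two price rows are quoted per million tokens, so the cost of a single request is a weighted sum of its input and output token counts. A minimal sketch of that arithmetic, using the listed $4.00 and $20.00 figures as defaults; the function name and the example request size (10,000 input tokens, 2,000 output tokens) are hypothetical and chosen only for illustration:

```python
def request_cost_usd(input_tokens, output_tokens,
                     usd_per_m_input=4.00, usd_per_m_output=20.00):
    """Estimate the cost of one request from per-million-token prices."""
    total = input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    return total / 1_000_000


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
cost = request_cost_usd(10_000, 2_000)
print(f"${cost:.2f}")  # prints "$0.08"
```

At these rates an output token costs five times as much as an input token, so a workload that generates long outputs costs more than the input price alone suggests.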

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

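Benchmark | A: Grok 3 | B: Phi-4-mini Instruct

General reasoning

MMLU | not reported | 67.3 (2025-02-26)
MMLU-Pro | 79.9 (2026-05-21) | not reported
GPQA-Diamond | 69.3 (2026-05-21) | not reported

Code

LiveCodeBench | 42.5 (2026-05-21) | not reported

Math

MATH | 87.0 (2026-05-21) | 64.0 (2025-02-26)
AIME 2024 | 33.0 (2026-05-21) | not reported
AIME 2025 | 58.0 (2026-05-21) | not reported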

Context · A

xAI's third-generation flagship, trained on the Colossus supercomputer (approximately 200,000 GPUs) with roughly 10x the compute of Grok 2. Released alongside a separate Grok 3 Reasoning variant and a DeepSearch product, with xAI claiming wins over GPT-4o on AIME math and GPQA science benchmarks. API access launched in April 2025.

Context · B

The small-tier Phi-4 model, released in February 2025: a 3.8B-parameter dense decoder-only model with a 128K-token context window, a 200K-token vocabulary, and grouped-query attention. It was trained on 5T tokens over 21 days on 512 A100-80G GPUs, with a data cutoff of June 2024, and supports 22 languages.
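
The training figures above can be turned into a few rough derived quantities. The sketch below is back-of-the-envelope arithmetic only: it assumes all 512 GPUs ran continuously for the full 21 days, which the source does not state, and the variable names are illustrative.

```python
# Figures quoted in the context paragraph for Phi-4-mini Instruct.
GPUS = 512          # A100-80G GPUs
DAYS = 21           # wall-clock training time
TOKENS = 5e12       # pretraining tokens (5T)
PARAMS = 3.8e9      # model parameters (3.8B)

# Assumption: every GPU was busy for the whole run.
gpu_hours = GPUS * DAYS * 24

# Ratio of training tokens to parameters.
tokens_per_param = TOKENS / PARAMS

# Average throughput per GPU implied by the assumption above.
tokens_per_gpu_second = TOKENS / (gpu_hours * 3600)

print(f"GPU-hours: {gpu_hours:,}")
print(f"Tokens per parameter: {tokens_per_param:.0f}")
print(f"Tokens per GPU-second: {tokens_per_gpu_second:.0f}")
```

Under that assumption the run comes to 258,048 GPU-hours and roughly 1,300 training tokens per parameter.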
