
Grok 4 vs Kimi K2 Instruct

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field	A: Grok 4	B: Kimi K2 Instruct
Released	2025-07-09	2025-07-11
Developer	xAI	Moonshot AI
Openness	Proprietary	Open
License	Proprietary	Modified MIT
OSI-approved	no	no
Data released	no	no
Training code	no	no
Architecture	unknown	MoE
Total params	—	1T
Active params	—	32B
Experts	—	384 (8 active)
Context window	—	128K
Attention	unknown	MLA
Position enc.	unknown	RoPE
Pretraining tokens	—	15.5T
Post-training	RLHF	SFT, RLHF
Training hardware	—	—
$/M input	$5.50	—
$/M output	$27.50	—
Output tok/sec	—	—
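
The per-million-token prices above translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: it uses the $/M figures listed for model A in this table, the function name and the example token counts are hypothetical, and it assumes flat pricing with no caching discounts or tiered rates.

```python
# Prices for model A as listed in the Specs table (USD per million tokens).
INPUT_USD_PER_M = 5.50
OUTPUT_USD_PER_M = 27.50


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Assumes flat per-token pricing; discounts and tiers are not modeled.
    """
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_M
    return input_cost + output_cost


# Hypothetical request: 20,000 prompt tokens and 2,000 completion tokens.
cost = request_cost_usd(20_000, 2_000)
print(f"${cost:.3f}")  # prints "$0.165"
```

Because the listed output price is five times the input price, the completion length tends to dominate the bill for generation-heavy workloads.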

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro	86.6 (2026-05-21)	81.1 (2025-07-15)
GPQA-Diamond	87.7 (2026-05-21)	75.1 (2025-07-15)

Code

SWE-Bench Verified	—	65.8 (2025-07-15)
LiveCodeBench	81.9 (2026-05-21)	—

Math

MATH	99.0 (2026-05-21)	—
AIME 2024	94.3 (2026-05-21)	—
AIME 2025	92.7 (2026-05-21)	—

Context · A

xAI's flagship successor to Grok 3, released July 9, 2025 and formally announced the next day. The Grok 4 Heavy variant scored a reported 50.7 percent on the text-only subset of Humanity's Last Exam, which xAI described as a first for any model. A specialized coding variant followed shortly after.

Context · B

A trillion-parameter open-weights MoE optimized for agentic tool use. Its SWE-Bench Verified score of 65.8 made it a viable open alternative to closed coding agents at release.
