Kimi K2 Instruct vs Grok 4

Rows highlighted in warm gray mark fields where the two models differ. Each number carries its as-of date and primary source.

Specs

Field	A: Kimi K2 Instruct	B: Grok 4
Released	2025-07-11	2025-07-09
Developer	Moonshot AI	xAI
Openness	Open	Proprietary
License	Modified MIT	Proprietary
OSI-approved	no	no
Data released	no	no
Training code	no	no
Architecture	MoE	unknown
Total params	1T	—
Active params	32B	—
Experts	384 (8 active)	—
Context window	128K	—
Attention	MLA	unknown
Position enc.	RoPE	unknown
Pretraining tokens	15.5T	—
Post-training	SFT, RLHF	RLHF
Training hardware	—	—
$/M input	—	$5.50
$/M output	—	$27.50
Output tok/sec	—	—
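
As a rough illustration of how the sparsity and pricing rows translate into working numbers, the sketch below computes the share of Kimi K2 Instruct parameters and experts active per token, and the cost of one Grok 4 request at the listed rates. The request size (20,000 input tokens, 2,000 output tokens) and the function names are assumptions made for the example, not figures from the table.

```python
# Values copied from the Specs table above.
KIMI_TOTAL_PARAMS = 1e12        # 1T total parameters
KIMI_ACTIVE_PARAMS = 32e9       # 32B active parameters per token
KIMI_EXPERTS_TOTAL = 384
KIMI_EXPERTS_ACTIVE = 8

GROK_INPUT_USD_PER_M = 5.50     # USD per million input tokens
GROK_OUTPUT_USD_PER_M = 27.50   # USD per million output tokens


def active_fraction(active: float, total: float) -> float:
    """Return the fraction of a total that is active per token."""
    return active / total


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million rates."""
    total = input_tokens * GROK_INPUT_USD_PER_M + output_tokens * GROK_OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    params = active_fraction(KIMI_ACTIVE_PARAMS, KIMI_TOTAL_PARAMS)
    experts = active_fraction(KIMI_EXPERTS_ACTIVE, KIMI_EXPERTS_TOTAL)
    print(f"Active parameters: {params:.1%}")   # 3.2%
    print(f"Active experts:    {experts:.1%}")  # 2.1%

    # Hypothetical request size, chosen only for illustration.
    print(f"Request cost: ${request_cost_usd(20_000, 2_000):.3f}")  # $0.165
```

No cost can be computed for Kimi K2 Instruct this way because the table lists no price for it.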

Benchmarks

Missing scores are shown as not reported and are never inferred. Bold marks the leader on each benchmark.

Benchmark	A: Kimi K2 Instruct (as of)	B: Grok 4 (as of)

General reasoning

MMLU-Pro	81.1 (2025-07-15)	86.6 (2026-05-21)
GPQA-Diamond	75.1 (2025-07-15)	87.7 (2026-05-21)

Code

SWE-Bench Verified	65.8 (2025-07-15)	—
LiveCodeBench	—	81.9 (2026-05-21)

Math

MATH	—	99.0 (2026-05-21)
AIME 2024	—	94.3 (2026-05-21)
AIME 2025	—	92.7 (2026-05-21)
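
The rule stated at the top of this section (mark the leader per benchmark, never infer a missing score) can be sketched as follows. This is a minimal illustration, not the page's actual rendering logic. It assumes that a leader is declared only when both models report a score, which the page does not state explicitly. The names `SCORES` and `leader` are hypothetical.

```python
from typing import Optional

# Scores copied from the tables above as (A: Kimi K2 Instruct, B: Grok 4).
# None stands for "not reported".
SCORES: dict[str, tuple[Optional[float], Optional[float]]] = {
    "MMLU-Pro": (81.1, 86.6),
    "GPQA-Diamond": (75.1, 87.7),
    "SWE-Bench Verified": (65.8, None),
    "LiveCodeBench": (None, 81.9),
    "MATH": (None, 99.0),
    "AIME 2024": (None, 94.3),
    "AIME 2025": (None, 92.7),
}


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return 'A', 'B', or 'tie' when both scores exist, else None.

    Assumption: a missing score is never treated as zero, so a benchmark
    reported by only one model has no leader.
    """
    if a is None or b is None:
        return None
    if a == b:
        return "tie"
    return "A" if a > b else "B"


if __name__ == "__main__":
    for name, (a, b) in SCORES.items():
        result = leader(a, b)
        print(f"{name}: {result if result is not None else 'not comparable'}")
```

Under this assumption only MMLU-Pro and GPQA-Diamond are directly comparable. The remaining five benchmarks are each reported by one model only.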

Context · A

A trillion-parameter open-weights MoE (32B active parameters) optimized for agentic tool use. Its SWE-Bench Verified score of 65.8 made it a viable open alternative to closed coding agents at release.

Context · B

xAI's flagship after Grok 3, released July 9, 2025 and formally announced the next day. The Grok 4 Heavy variant reportedly scored 50.7 percent on the text-only subset of Humanity's Last Exam, which xAI described as a first for any model. A specialized coding variant followed shortly after.