The Open-Source AI Stack

Kimi K2 Instruct vs Grok 4

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

| Field | A: Kimi K2 Instruct | B: Grok 4 |
| Released | 2025-07-11 | |
| Developer | Moonshot AI | xAI |
| Openness | Open | Proprietary |
| License | Modified MIT | Proprietary |
| OSI-approved | no | no |
| Data released | no | no |
| Training code | no | no |
| Architecture | MoE | unknown |
| Total params | 1T | |
| Active params | 32B | |
| Experts | 384 (8 active) | |
| Context window | 128K | |
| Attention | MLA | unknown |
| Position enc. | RoPE | unknown |
| Pretraining tokens | 15.5T | |
| Post-training | SFT, RLHF | RLHF |
| Training hardware | | |
$/M input $5.50
$/M output $27.50
Output tok/sec 0
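
As a rough illustration of how to read the figures above, the sketch below derives the share of parameters and experts active per token from the listed MoE numbers, and turns the listed per-million-token prices into the cost of a single request. The function names and the request size (20,000 input tokens, 2,000 output tokens) are hypothetical and chosen only for the example.

```python
def active_fraction(active: float, total: float) -> float:
    """Share of a model that is used per token (parameters or experts)."""
    return active / total


def request_cost(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost in USD of one request, given prices per million tokens."""
    return (input_tokens * usd_per_m_input
            + output_tokens * usd_per_m_output) / 1_000_000


# Figures taken from the Specs table: 32B active of 1T total, 8 of 384 experts.
param_share = active_fraction(32e9, 1e12)
expert_share = active_fraction(8, 384)

# Hypothetical request size; prices are the listed $/M input and $/M output.
cost = request_cost(20_000, 2_000, 5.50, 27.50)

print(f"active parameters: {param_share:.1%}")
print(f"active experts:    {expert_share:.1%}")
print(f"request cost:      ${cost:.4f}")
```

The parameter share is higher than the expert share, which is what one would expect if some weights (for example attention or shared layers) run for every token; the table itself does not break this down.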

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
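
The rendering rule above can be sketched as follows. This is a minimal illustration, not the site's actual code: `render_row` is a hypothetical helper, `**` stands in for bold, and it assumes that a leader is marked only when both models have a reported score (the page does not say how a lone score is treated).

```python
from typing import Optional


def render_row(name: str, a: Optional[float], b: Optional[float]) -> str:
    """Render one benchmark row for models A and B.

    Missing scores (None) are shown as "not reported" and are never
    filled in. Assumption: the leader is marked only when both scores exist.
    """
    def cell(score: Optional[float], is_leader: bool) -> str:
        if score is None:
            return "not reported"
        text = f"{score:.1f}"
        return f"**{text}**" if is_leader else text

    both_reported = a is not None and b is not None
    a_leads = both_reported and a > b
    b_leads = both_reported and b > a
    return f"{name}: A {cell(a, a_leads)} | B {cell(b, b_leads)}"


print(render_row("MMLU-Pro", 81.1, 86.6))
print(render_row("SWE-Bench Verified", 65.8, None))
```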

General reasoning

| MMLU-Pro | A: 81.1 (2025-07-15) | B: 86.6 (2026-05-21) |
| GPQA-Diamond | A: 75.1 (2025-07-15) | B: 87.7 (2026-05-21) |

Code

| SWE-Bench Verified | A: 65.8 (2025-07-15) | B: not reported |
| LiveCodeBench | A: not reported | B: 81.9 (2026-05-21) |

Math

| MATH | A: not reported | B: 99.0 (2026-05-21) |
| AIME 2024 | A: not reported | B: 94.3 (2026-05-21) |
| AIME 2025 | A: not reported | B: 92.7 (2026-05-21) |

Context · A

A trillion-parameter open-weights MoE optimized for agentic tool use. Its SWE-Bench Verified score of 65.8 (as of 2025-07-15) made it a viable open alternative to closed coding agents at release.

Context · B

xAI's flagship after Grok 3, released July 9, 2025 and formally announced the next day. The Grok 4 Heavy variant was reported to score 50.7 percent on the text-only subset of Humanity's Last Exam, a first for any model according to xAI. A specialized coding variant followed shortly after.
