Grok 4 vs Kimi K2 Instruct
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: Grok 4 | B: Kimi K2 Instruct |
|---|---|---|
| Released | 2025-07-09 | 2025-07-11 |
| Developer | xAI | Moonshot AI |
| Openness | Proprietary | Open |
| License | Proprietary | Modified MIT |
| OSI-approved | no | no |
| Data released | no | no |
| Training code | no | no |
| Architecture | unknown | MoE |
| Total params | — | 1T |
| Active params | — | 32B |
| Experts | — | 384 (8 active) |
| Context window | — | 128K |
| Attention | unknown | MLA |
| Position enc. | unknown | RoPE |
| Pretraining tokens | — | 15.5T |
| Post-training | RLHF | SFT, RLHF |
| Training hardware | — | — |
| $/M input | $5.50 | — |
| $/M output | $27.50 | — |
| Output tok/sec | — | — |
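The pricing and sparsity rows above lend themselves to quick arithmetic. The following is a minimal sketch, not an official calculator: the constants are copied from the Specs table, while the function names and the 10,000-input / 2,000-output request size are hypothetical values chosen for illustration.

```python
# Figures copied from the Specs table; names and request size are illustrative.
GROK4_USD_PER_M_INPUT = 5.50
GROK4_USD_PER_M_OUTPUT = 27.50
KIMI_K2_TOTAL_PARAMS = 1e12      # 1T
KIMI_K2_ACTIVE_PARAMS = 32e9     # 32B
KIMI_K2_EXPERTS = 384
KIMI_K2_ACTIVE_EXPERTS = 8


def request_cost_usd(input_tokens: int, output_tokens: int,
                     usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost of one request, given per-million-token prices."""
    return (input_tokens / 1_000_000 * usd_per_m_input
            + output_tokens / 1_000_000 * usd_per_m_output)


def fraction(part: float, whole: float) -> float:
    """Share of `whole` represented by `part`."""
    return part / whole


if __name__ == "__main__":
    cost = request_cost_usd(10_000, 2_000,
                            GROK4_USD_PER_M_INPUT, GROK4_USD_PER_M_OUTPUT)
    print(f"Grok 4, 10K in / 2K out: ${cost:.3f}")  # $0.110
    active = fraction(KIMI_K2_ACTIVE_PARAMS, KIMI_K2_TOTAL_PARAMS)
    print(f"Kimi K2 active params: {active:.1%}")   # 3.2%
    experts = fraction(KIMI_K2_ACTIVE_EXPERTS, KIMI_K2_EXPERTS)
    print(f"Kimi K2 active experts: {experts:.1%}")  # 2.1%
```

At the listed prices, output tokens cost five times as much as input tokens, so in this example 2,000 output tokens cost the same as 10,000 input tokens.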
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
General reasoning
| Benchmark | A: Grok 4 | B: Kimi K2 Instruct |
|---|---|---|
| MMLU-Pro | **86.6** (2026-05-21) | 81.1 (2025-07-15) |
| GPQA-Diamond | **87.7** (2026-05-21) | 75.1 (2025-07-15) |
Code
| Benchmark | A: Grok 4 | B: Kimi K2 Instruct |
|---|---|---|
| SWE-Bench Verified | — | 65.8 (2025-07-15) |
| LiveCodeBench | 81.9 (2026-05-21) | — |
Math
| Benchmark | A: Grok 4 | B: Kimi K2 Instruct |
|---|---|---|
| MATH | 99.0 (2026-05-21) | — |
| AIME 2024 | 94.3 (2026-05-21) | — |
| AIME 2025 | 92.7 (2026-05-21) | — |
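The rule stated at the top of this section (missing scores are shown as not reported and never inferred, and the leader is highlighted) can be sketched as follows. This is an illustration, not the page's actual rendering code: the `leader` and `render` helpers are hypothetical, and naming a leader only when both models report a score is an assumption about how the rule applies to one-sided rows.

```python
from typing import Optional


def leader(score_a: Optional[float], score_b: Optional[float]) -> Optional[str]:
    """Return "A", "B", or None when no leader can be named."""
    # Assumption: a missing score is never treated as a loss,
    # so a row with only one reported score has no leader.
    if score_a is None or score_b is None:
        return None
    if score_a == score_b:
        return None
    return "A" if score_a > score_b else "B"


def render(score: Optional[float]) -> str:
    """Format a score for display without inferring missing values."""
    return "not reported" if score is None else f"{score:.1f}"


if __name__ == "__main__":
    # Scores copied from the tables above; None marks a missing score.
    rows = {
        "MMLU-Pro": (86.6, 81.1),
        "SWE-Bench Verified": (None, 65.8),
        "MATH": (99.0, None),
    }
    for name, (a, b) in rows.items():
        print(f"{name}: A={render(a)}, B={render(b)}, leader={leader(a, b)}")
```

Note that the as-of dates differ between the two columns, so even a named leader compares scores recorded at different times.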
Context · A
xAI's flagship model after Grok 3, released on July 9, 2025, and formally announced the next day. The Grok 4 Heavy variant reportedly scored 50.7 percent on the text-only subset of Humanity's Last Exam, which xAI described as a first for any model. A specialized coding variant followed shortly after.
Context · B
A trillion-parameter open-weights MoE model optimized for agentic tool use. Its strong SWE-Bench results made it a viable open alternative to closed coding agents at release.