GPT-5.1 vs Kimi K2 Thinking
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: GPT-5.1 | B: Kimi K2 Thinking |
|---|---|---|
| Released | 2025-11-12 | 2025-11-06 |
| Developer | OpenAI | Moonshot AI |
| Openness | Proprietary | Open |
| License | Proprietary | Modified MIT |
| OSI-approved | no | no |
| Data released | no | no |
| Training code | no | no |
| Architecture | unknown | MoE |
| Total params | — | 1T |
| Active params | — | 32B |
| Experts | — | 384 (8 active) |
| Context window | 400K | 256K |
| Attention | unknown | MLA |
| Position enc. | unknown | RoPE |
| Pretraining tokens | — | — |
| Post-training | RLHF | SFT, RLHF |
| Training hardware | — | — |
| Input price (USD per 1M tokens) | $1.25 | $0.60 |
| Output price (USD per 1M tokens) | $10.00 | $2.50 |
| Output speed (tokens/sec) | 114.7 | 102.4 |
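
To make the per-token prices easier to compare, here is a minimal sketch that turns them into a per-request cost. The prices are copied from the table above; the `request_cost` helper and the 20,000-input / 2,000-output workload are hypothetical, and the calculation assumes flat list pricing with no discounts or tiers, which the table does not describe.

```python
# Prices copied from the Specs table (USD per 1M tokens).
PRICES = {
    "GPT-5.1": {"input": 1.25, "output": 10.00},
    "Kimi K2 Thinking": {"input": 0.60, "output": 2.50},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at flat list price (an assumption)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 20_000, 2_000):.4f}")
```

Because output tokens cost four to eight times as much as input tokens for both models, the gap between the two widens as responses get longer.
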
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
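General reasoning

| Benchmark | A: GPT-5.1 (as of) | B: Kimi K2 Thinking (as of) |
|---|---|---|
| MMLU-Pro | **87.0** (2026-05-21) | 84.6 (2025-11-06) |
| GPQA-Diamond | **87.3** (2026-05-21) | 84.5 (2025-11-06) |

Code

| Benchmark | A: GPT-5.1 (as of) | B: Kimi K2 Thinking (as of) |
|---|---|---|
| SWE-Bench Verified | not reported | 71.3 (2025-11-06) |
| LiveCodeBench | **86.8** (2026-05-21) | 85.3 (2026-05-21) |

Math

| Benchmark | A: GPT-5.1 (as of) | B: Kimi K2 Thinking (as of) |
|---|---|---|
| AIME 2025 | 94.0 (2026-05-21) | **99.1** (2025-11-06) |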
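
The rule stated at the top of this section (bold marks the leader, missing scores are never inferred) can be sketched as a small comparison function. The scores are copied from the tables above. The `leader` helper is hypothetical, and its handling of missing scores and ties (no leader in either case) is an assumption, since the page does not say how those cases are highlighted.

```python
from typing import Dict, Optional, Tuple

Score = Optional[float]  # None marks a score that is not reported

# (A: GPT-5.1, B: Kimi K2 Thinking), copied from the Benchmarks tables.
SCORES: Dict[str, Tuple[Score, Score]] = {
    "MMLU-Pro": (87.0, 84.6),
    "GPQA-Diamond": (87.3, 84.5),
    "SWE-Bench Verified": (None, 71.3),
    "LiveCodeBench": (86.8, 85.3),
    "AIME 2025": (94.0, 99.1),
}


def leader(a: Score, b: Score) -> Optional[str]:
    """Return "A" or "B" for the higher score, or None when no call can be made."""
    # Assumption: a missing score is never filled in, so no leader is declared.
    if a is None or b is None or a == b:
        return None
    return "A" if a > b else "B"


if __name__ == "__main__":
    for name, (a, b) in SCORES.items():
        print(f"{name}: {leader(a, b) or 'no leader'}")
```

The as-of dates differ across several rows, so a leader here only reflects the reported numbers, not a like-for-like evaluation run.
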
Context · A
Released November 12, 2025 with three initial models; two more (Codex-Mini, Codex-Max) followed on November 19. The headline change is a warmer default tone plus eight selectable personalities. GPT-5.1 Instant gained adaptive reasoning: it decides per turn whether to think before responding.
Context · B
Moonshot's first reasoning model with native thinking interleaved with tool calls, released November 6, 2025. Per the lab, it sustains coherence across 200-300 tool invocations. Ships with native INT4 quantization-aware training and a Modified MIT license.
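
For a rough sense of scale, the sketch below works through the arithmetic implied by the Specs figures for Kimi K2 Thinking (1T total parameters, 32B active, 8 of 384 experts) and by the INT4 note above. It treats the rounded figures as exact and assumes every parameter is stored at the stated bit width, which is a simplification: the page does not say which weights are quantized, and the estimate ignores activations, the KV cache, and quantization metadata. The `weight_bytes` helper is hypothetical.

```python
# Figures copied from the Specs table, treated as exact (an assumption).
TOTAL_PARAMS = 1_000_000_000_000   # "1T"
ACTIVE_PARAMS = 32_000_000_000     # "32B"
EXPERTS_TOTAL = 384
EXPERTS_ACTIVE = 8


def weight_bytes(params: int, bits_per_param: int) -> float:
    """Weights-only storage, assuming every parameter uses bits_per_param bits."""
    return params * bits_per_param / 8


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS    # share of parameters used per token
expert_fraction = EXPERTS_ACTIVE / EXPERTS_TOTAL  # share of experts routed per token

if __name__ == "__main__":
    print(f"Active parameters per token: {active_fraction:.1%}")
    print(f"Active experts per token: {expert_fraction:.2%}")
    for bits in (16, 8, 4):
        gigabytes = weight_bytes(TOTAL_PARAMS, bits) / 1e9
        print(f"{bits}-bit weights: about {gigabytes:,.0f} GB")
```

Under these assumptions, 4-bit storage needs a quarter of the weight memory of 16-bit storage, which is the practical appeal of shipping a model trained with INT4 in mind.
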