Kimi K2 Thinking vs GPT-5.1
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: Kimi K2 Thinking | B: GPT-5.1 |
|---|---|---|
| Released | 2025-11-06 | 2025-11-12 |
| Developer | Moonshot AI | OpenAI |
| Openness | Open | Proprietary |
| License | Modified MIT | Proprietary |
| OSI-approved | no | no |
| Data released | no | no |
| Training code | no | no |
| Architecture | MoE | unknown |
| Total params | 1T | — |
| Active params | 32B | — |
| Experts | 384 (8 active) | — |
| Context window | 256K | 400K |
| Attention | MLA | unknown |
| Position enc. | RoPE | unknown |
| Pretraining tokens | — | — |
| Post-training | SFT, RLHF | RLHF |
| Training hardware | — | — |
| $/M input | $0.60 | $1.25 |
| $/M output | $2.50 | $10.00 |
| Output tok/sec | 102.4 | 114.7 |
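The pricing and throughput rows are easier to compare on a concrete request. The following is a minimal sketch, assuming a hypothetical workload of 20,000 input tokens and 4,000 output tokens; the function names are illustrative, and the estimate ignores caching discounts, reasoning-token overhead, and time to first token.

```python
# Prices ($ per million tokens) and throughput copied from the Specs table.
SPECS = {
    "Kimi K2 Thinking": {"input": 0.60, "output": 2.50, "tok_per_sec": 102.4},
    "GPT-5.1": {"input": 1.25, "output": 10.00, "tok_per_sec": 114.7},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-million-token prices."""
    spec = SPECS[model]
    total = input_tokens * spec["input"] + output_tokens * spec["output"]
    return total / 1_000_000


def generation_seconds(model: str, output_tokens: int) -> float:
    """Rough decode time, assuming the listed output speed is sustained."""
    return output_tokens / SPECS[model]["tok_per_sec"]


if __name__ == "__main__":
    # Hypothetical workload, chosen only for illustration.
    n_in, n_out = 20_000, 4_000
    for name in SPECS:
        cost = request_cost(name, n_in, n_out)
        secs = generation_seconds(name, n_out)
        print(f"{name}: ${cost:.4f}, about {secs:.1f} s of generation")
```

On this assumed workload the output price dominates the gap: model B costs roughly three times as much per request while generating slightly faster.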
Benchmarks
Missing scores are marked as not reported and are never inferred. Bold marks the leader on each benchmark.
General reasoning
| Benchmark | A: Kimi K2 Thinking | B: GPT-5.1 |
|---|---|---|
| MMLU-Pro | 84.6 (2025-11-06) | **87.0** (2026-05-21) |
| GPQA-Diamond | 84.5 (2025-11-06) | **87.3** (2026-05-21) |
Code
| Benchmark | A: Kimi K2 Thinking | B: GPT-5.1 |
|---|---|---|
| SWE-Bench Verified | 71.3 (2025-11-06) | — |
| LiveCodeBench | 85.3 (2026-05-21) | **86.8** (2026-05-21) |
Math
| Benchmark | A: Kimi K2 Thinking | B: GPT-5.1 |
|---|---|---|
| AIME 2025 | **99.1** (2025-11-06) | 94.0 (2026-05-21) |
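The leader rule stated for the benchmark tables can be sketched in a few lines. This is an illustration rather than the page's actual logic: it assumes that no leader is declared when either score is missing (so a lone reported score does not win by default), and the `leader` helper is a hypothetical name.

```python
from typing import Optional

# Scores copied from the benchmark tables; None stands for "not reported".
SCORES = {
    "MMLU-Pro": (84.6, 87.0),
    "GPQA-Diamond": (84.5, 87.3),
    "SWE-Bench Verified": (71.3, None),
    "LiveCodeBench": (85.3, 86.8),
    "AIME 2025": (99.1, 94.0),
}


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return "A", "B", or "tie"; return None if either score is missing."""
    if a is None or b is None:
        return None  # assumption: never infer a winner from a missing score
    if a == b:
        return "tie"
    return "A" if a > b else "B"


if __name__ == "__main__":
    for bench, (a, b) in SCORES.items():
        print(f"{bench}: {leader(a, b) or 'not compared'}")
```

Keeping the missing value as `None` instead of 0 prevents an unreported score from being read as a loss.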
Context · A
Moonshot's first reasoning model with native thinking interleaved with tool calls, released November 6, 2025. According to Moonshot, it sustains coherence across 200 to 300 tool invocations. It ships with native INT4 quantization-aware training and a Modified MIT license.
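The spec-table figures for model A give a sense of how sparse the mixture is and why 4-bit weights matter at this scale. The sketch below is back-of-envelope arithmetic only. It assumes every parameter is stored at the stated bit width, which is a simplification: real checkpoints may keep some layers at higher precision, and activations and KV cache are ignored. The `weight_gigabytes` helper is a hypothetical name.

```python
# Figures copied from the Specs table for model A.
TOTAL_PARAMS = 1e12      # 1T total parameters
ACTIVE_PARAMS = 32e9     # 32B active per token
EXPERTS_TOTAL = 384
EXPERTS_ACTIVE = 8

# Share of the model that participates in a single forward pass.
active_param_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
active_expert_fraction = EXPERTS_ACTIVE / EXPERTS_TOTAL


def weight_gigabytes(params: float, bits_per_param: int) -> float:
    """Weights-only storage in GB, assuming a uniform bit width (simplified)."""
    return params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    print(f"active parameters: {active_param_fraction:.1%}")
    print(f"active experts:    {active_expert_fraction:.1%}")
    print(f"16-bit weights: ~{weight_gigabytes(TOTAL_PARAMS, 16):.0f} GB")
    print(f"4-bit weights:  ~{weight_gigabytes(TOTAL_PARAMS, 4):.0f} GB")
```

Under these assumptions only about 3% of the parameters are active per token, and uniform 4-bit storage would cut the weights-only footprint to a quarter of the 16-bit figure.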
Context · B
Released November 12, 2025, with three initial models; two more (Codex-Mini, Codex-Max) followed on November 19. The headline change is a warmer default tone plus eight selectable personalities. GPT-5.1 Instant gained adaptive reasoning, deciding on each turn whether to think before responding.