GPT-5.1 vs Kimi K2 Thinking

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field	A: GPT-5.1	B: Kimi K2 Thinking
Released	2025-11-12	2025-11-06
Developer	OpenAI	Moonshot AI
Openness	Proprietary	Open
License	Proprietary	Modified MIT
OSI-approved	no	no
Data released	no	no
Training code	no	no
Architecture	unknown	MoE
Total params	—	1T
Active params	—	32B
Experts	—	384 (8 active)
Context window	400K	256K
Attention	unknown	MLA
Position enc.	unknown	RoPE
Pretraining tokens	—	—
Post-training	RLHF	SFT, RLHF
Training hardware	—	—
$/M input	$1.25	$0.60
$/M output	$10.00	$2.50
Output tok/sec	114.7	102.4
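
To make the per-million-token prices above concrete, here is a minimal sketch of the arithmetic for one request. The prices are copied from the table; the function name `request_cost` and the 20,000-input / 2,000-output workload are hypothetical and chosen only for illustration.

```python
# Per-million-token prices in USD, copied from the Specs table.
PRICES = {
    "GPT-5.1": {"input": 1.25, "output": 10.00},
    "Kimi K2 Thinking": {"input": 0.60, "output": 2.50},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000  # prices are quoted per million tokens


# Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 20_000, 2_000):.4f}")
```

For this assumed workload the cost works out to roughly $0.045 for GPT-5.1 and $0.017 for Kimi K2 Thinking. Real costs depend on how many output tokens each model actually produces for the same task.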

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro	87.0 2026-05-21	84.6 2025-11-06
GPQA-Diamond	87.3 2026-05-21	84.5 2025-11-06

Code

SWE-Bench Verified	—	71.3 2025-11-06
LiveCodeBench	86.8 2026-05-21	85.3 2026-05-21

Math

AIME 2025	94.0 2026-05-21	99.1 2025-11-06
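
The "never inferred" rule for missing scores can be sketched as follows. The scores are copied from the benchmark rows in this section; the `leader` helper and its treatment of a missing score as "not comparable" are assumptions for illustration, not the page's actual logic. The sketch also ignores the differing as-of dates.

```python
from typing import Optional

# (A: GPT-5.1, B: Kimi K2 Thinking); None marks a score that is not reported.
SCORES: dict[str, tuple[Optional[float], Optional[float]]] = {
    "MMLU-Pro": (87.0, 84.6),
    "GPQA-Diamond": (87.3, 84.5),
    "SWE-Bench Verified": (None, 71.3),
    "LiveCodeBench": (86.8, 85.3),
    "AIME 2025": (94.0, 99.1),
}


def leader(a: Optional[float], b: Optional[float]) -> str:
    """Name the leading column, refusing to guess when a score is missing."""
    if a is None or b is None:
        return "not comparable"  # assumption: no leader without both scores
    if a == b:
        return "tie"
    return "A" if a > b else "B"


for name, (a, b) in SCORES.items():
    print(f"{name}: {leader(a, b)}")
```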

Context · A

Released November 12, 2025, with three initial models; two more (Codex-Mini, Codex-Max) followed on November 19. The headline change is a warmer default tone plus eight selectable personalities. GPT-5.1 Instant gained adaptive reasoning, deciding per turn whether to think before responding.

Context · B

Moonshot's first reasoning model with native thinking interleaved with tool calls, released November 6, 2025. Per the lab, it sustains coherence across 200-300 tool invocations. It ships with native INT4 quantization-aware training and a Modified MIT license.
