
Kimi K2 Thinking vs GPT-5.1

Rows highlighted in warm gray mark where the models differ. Each number carries its as-of date and primary source.

Specs

Field	A: Kimi K2 Thinking	B: GPT-5.1
Released	2025-11-06	2025-11-12
Developer	Moonshot AI	OpenAI
Openness	Open	Proprietary
License	Modified MIT	Proprietary
OSI-approved	no	no
Data released	no	no
Training code	no	no
Architecture	MoE	unknown
Total params	1T	—
Active params	32B	—
Experts	384 (8 active)	—
Context window	256K	400K
Attention	MLA	unknown
Position enc.	RoPE	unknown
Pretraining tokens	—	—
Post-training	SFT, RLHF	RLHF
Training hardware	—	—
$/M input	$0.60	$1.25
$/M output	$2.50	$10.00
Output tok/sec	102.4	114.7
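
The per-million-token prices above translate into a per-request cost with simple arithmetic. The sketch below is illustrative only: the prices are copied from the Specs table, while the function name `request_cost` and the example workload (20,000 input tokens, 5,000 output tokens) are assumptions made for this example. It uses list prices and ignores any caching discounts or tiered pricing a provider may offer.

```python
# USD per million tokens, copied from the Specs table above.
PRICES = {
    "Kimi K2 Thinking": {"input": 0.60, "output": 2.50},
    "GPT-5.1": {"input": 1.25, "output": 10.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the list-price cost in USD of one request (hypothetical helper)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000  # prices are quoted per million tokens


if __name__ == "__main__":
    # Assumed workload for illustration: 20,000 input and 5,000 output tokens.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 20_000, 5_000):.4f}")
```

For this assumed workload the request costs roughly 2.5 cents on Kimi K2 Thinking and 7.5 cents on GPT-5.1. The gap widens as the share of output tokens grows, because the output price differs by a factor of four while the input price differs by roughly two.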

Benchmarks

Missing scores render as not reported and are never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro	84.6 2025-11-06	87.0 2026-05-21
GPQA-Diamond	84.5 2025-11-06	87.3 2026-05-21

Code

SWE-Bench Verified	71.3 2025-11-06	—
LiveCodeBench	85.3 2026-05-21	86.8 2026-05-21

Math

AIME 2025	99.1 2025-11-06	94.0 2026-05-21
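
The "never inferred" rule for missing scores can be sketched in a few lines. This is an illustration, not the page's actual logic: the scores are copied from the rows above, the function name `leader` is hypothetical, and the choice to name no leader when either score is missing (or when the two are tied) is an assumption.

```python
from typing import Optional

# (A, B) scores copied from the Benchmarks section; None means "not reported".
SCORES = {
    "MMLU-Pro": (84.6, 87.0),
    "GPQA-Diamond": (84.5, 87.3),
    "SWE-Bench Verified": (71.3, None),
    "LiveCodeBench": (85.3, 86.8),
    "AIME 2025": (99.1, 94.0),
}


def leader(a: Optional[float], b: Optional[float]) -> Optional[str]:
    """Return "A" or "B" for the higher score, or None if no leader can be named."""
    if a is None or b is None:
        return None  # assumption: a missing score is never treated as a loss
    if a == b:
        return None  # assumption: ties have no leader
    return "A" if a > b else "B"


if __name__ == "__main__":
    for name, (a, b) in SCORES.items():
        print(f"{name}: {leader(a, b) or 'no leader'}")
```

Note that the sketch compares raw scores only. The as-of dates differ between some cells, so a careful reader may still want to check that two scores were measured under comparable conditions.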

Context · A

Moonshot's first reasoning model with native thinking interleaved with tool calls, released November 6, 2025. According to the lab, it sustains coherence across 200-300 tool invocations. It ships with native INT4 quantization-aware training and a Modified MIT license.

Context · B

Released November 12, 2025, with three initial models; two more (Codex-Mini, Codex-Max) followed on November 19. The headline change is a warmer default tone plus eight selectable personalities. GPT-5.1 Instant gained adaptive reasoning, deciding per turn whether to think before responding.
