Qwen3 Next 80B-A3B Instruct vs Qwen3 Max

Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.

Specs

Field	A: Qwen3 Next 80B-A3B Instruct	B: Qwen3 Max
Released	—	—
Developer	Alibaba	Alibaba
Openness	Open	Proprietary
License	Apache-2.0	Proprietary
OSI-approved	yes	no
Data released	no	no
Training code	no	no
Architecture	moe	moe
Total params	80B	—
Active params	3B	—
Experts	512 (10 active)	—
Context window	262K	—
Attention	hybrid-gated-deltanet	—
Position enc.	rope-yarn	—
Pretraining tokens	15.0T	—
Post-training	sft, rlhf	sft, rlhf
Training hardware	—	—
$/M input	—	$1.66
$/M output	—	$7.22
Output tok/sec	—	32.4
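
As a rough illustration of how the per-million-token prices and throughput in the table translate into a single request, the sketch below plugs in the Qwen3 Max figures ($1.66 input, $7.22 output, 32.4 output tok/sec). The function names and the example token counts are hypothetical, and the calculation assumes flat per-token pricing with no tiering, caching discounts, or latency before the first token.

```python
# Figures taken from the Specs table for Qwen3 Max (column B).
USD_PER_M_INPUT = 1.66
USD_PER_M_OUTPUT = 7.22
OUTPUT_TOK_PER_SEC = 32.4


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request, assuming flat per-token pricing."""
    total = input_tokens * USD_PER_M_INPUT + output_tokens * USD_PER_M_OUTPUT
    return total / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Estimated time to stream the output at the listed throughput."""
    return output_tokens / OUTPUT_TOK_PER_SEC


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens, 1,000 generated tokens.
    print(f"cost: ${request_cost_usd(10_000, 1_000):.5f}")
    print(f"time: {generation_seconds(1_000):.1f} s")
```

No equivalent estimate is possible for column A from this table, since its price and throughput are not reported.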

Benchmarks

Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.

General reasoning

MMLU-Pro	80.6 2025-09-11	84.1 2026-05-21
GPQA-Diamond	—	76.4 2026-05-21

Code

LiveCodeBench	56.6 2025-09-11	—

Math

AIME 2025	69.5 2025-09-11	—

Held-out / arena

IFEval	87.6 2025-09-11	—

Context · A

Preview of a new ultra-sparse Qwen architecture: 80B total parameters, 3B active per token (3.75 percent of the total), 512 experts with 10 routed plus 1 shared. The hybrid layout alternates Gated DeltaNet and gated attention layers. Alibaba reported training at 10 percent of the cost of Qwen3-32B and 10x its inference throughput at context lengths beyond 32K.
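
The sparsity figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch using only the numbers in the paragraph; the variable names are illustrative, and the routed-expert ratio deliberately leaves out the shared expert because the text does not say whether it is counted among the 512.

```python
# Numbers quoted in the Context A paragraph.
TOTAL_PARAMS_B = 80    # total parameters, in billions
ACTIVE_PARAMS_B = 3    # parameters active per token, in billions
NUM_EXPERTS = 512
ROUTED_PER_TOKEN = 10  # the shared expert is not included in this count

# Share of all parameters used for any single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# Share of experts a token is routed to.
routed_fraction = ROUTED_PER_TOKEN / NUM_EXPERTS

if __name__ == "__main__":
    print(f"active parameter share: {active_fraction:.2%}")
    print(f"routed expert share:    {routed_fraction:.2%}")
```

The parameter share is larger than the routed-expert share, which is what one would expect if some components (for example the shared expert and the attention layers) run for every token regardless of routing.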

Context · B

Trillion-parameter MoE, API-only via Qwen Chat and Alibaba Cloud at release. Pretrained on roughly 36T tokens. Alibaba's first proprietary Qwen flagship at this scale, breaking the open-weights pattern. Supports more than 100 languages.
