Qwen3 Next 80B-A3B Instruct vs Qwen3 Max
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: Qwen3 Next 80B-A3B Instruct | B: Qwen3 Max |
|---|---|---|
| Released | — | — |
| Developer | Alibaba | Alibaba |
| Openness | Open | Proprietary |
| License | Apache-2.0 | Proprietary |
| OSI-approved | yes | no |
| Data released | no | no |
| Training code | no | no |
| Architecture | moe | moe |
| Total params | 80B | — |
| Active params | 3B | — |
| Experts | 512 (10 active) | — |
| Context window | 262K | — |
| Attention | hybrid-gated-deltanet | — |
| Position enc. | rope-yarn | — |
| Pretraining tokens | 15.0T | — |
| Post-training | sft, rlhf | sft, rlhf |
| Training hardware | — | — |
| $/M input | — | $1.66 |
| $/M output | — | $7.22 |
| Output tok/sec | — | 32.4 |
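The pricing and throughput rows for Qwen3 Max can be turned into a rough per-request estimate. The sketch below uses only the three listed values; the function names and the example request size are hypothetical, and the time estimate assumes the listed output rate holds for the whole response and ignores time to first token.

```python
# Listed values for Qwen3 Max, copied from the Specs table.
INPUT_USD_PER_M = 1.66      # $ per million input tokens
OUTPUT_USD_PER_M = 7.22     # $ per million output tokens
OUTPUT_TOK_PER_SEC = 32.4   # output tokens per second


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the listed per-million-token prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def estimate_decode_seconds(output_tokens: int) -> float:
    """Time to stream the output at the listed rate.

    Assumption: constant throughput, no time-to-first-token overhead.
    """
    return output_tokens / OUTPUT_TOK_PER_SEC


if __name__ == "__main__":
    # Hypothetical request: 20,000 prompt tokens, 1,000 generated tokens.
    print(f"cost: ${estimate_cost_usd(20_000, 1_000):.4f}")
    print(f"decode time: {estimate_decode_seconds(1_000):.1f} s")
```

No equivalent estimate is possible for Qwen3 Next 80B-A3B Instruct from this table, since its price and throughput cells are not reported.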
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
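General reasoning
| Benchmark | A: Qwen3 Next 80B-A3B Instruct | B: Qwen3 Max |
|---|---|---|
| MMLU-Pro | 80.6 (2025-09-11) | 84.1 (2026-05-21) |
| GPQA-Diamond | — | 76.4 (2026-05-21) |
Code
| Benchmark | A: Qwen3 Next 80B-A3B Instruct | B: Qwen3 Max |
|---|---|---|
| LiveCodeBench | 56.6 (2025-09-11) | — |
Math
| Benchmark | A: Qwen3 Next 80B-A3B Instruct | B: Qwen3 Max |
|---|---|---|
| AIME 2025 | 69.5 (2025-09-11) | — |
Held-out / arena
| Benchmark | A: Qwen3 Next 80B-A3B Instruct | B: Qwen3 Max |
|---|---|---|
| IFEval | 87.6 (2025-09-11) | — |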
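The rule stated under Benchmarks (missing scores are never inferred, and a leader is marked per benchmark) can be sketched as a small comparison function. This is an illustration only, not the page's actual rendering logic: the function name is hypothetical, and it assumes that a row with a missing score has no leader and that ties have no leader either.

```python
from typing import Optional


def leader(score_a: Optional[float], score_b: Optional[float]) -> Optional[str]:
    """Return "A", "B", or None when no leader can be named.

    Assumptions: a missing score (None) is never treated as zero, so a row
    with only one reported score has no leader; ties also return None.
    """
    if score_a is None or score_b is None:
        return None
    if score_a == score_b:
        return None
    return "A" if score_a > score_b else "B"


# Scores as listed on this page (None = not reported).
rows = {
    "MMLU-Pro": (80.6, 84.1),
    "GPQA-Diamond": (None, 76.4),
    "LiveCodeBench": (56.6, None),
    "AIME 2025": (69.5, None),
    "IFEval": (87.6, None),
}

if __name__ == "__main__":
    for name, (a, b) in rows.items():
        print(name, leader(a, b))
```

Under these assumptions only MMLU-Pro yields a leader, because it is the only benchmark with a score reported for both models.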
Context · A
Preview of a new ultra-sparse Qwen architecture: 80B total parameters, 3B active per token (3.75 percent of parameters), and 512 experts, of which 10 routed plus 1 shared are used per token. The hybrid layout combines Gated DeltaNet layers with gated attention layers. Alibaba reported 10 percent of the training cost of Qwen3-32B and 10x the inference throughput at context lengths beyond 32K.
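The sparsity figures quoted for model A follow from simple ratios. The sketch below recomputes them from the listed numbers; the constant and function names are hypothetical, and the expert ratio counts routed experts only, on the assumption that the shared expert sits outside the 512.

```python
# Listed values for Qwen3 Next 80B-A3B Instruct.
TOTAL_PARAMS_B = 80      # total parameters, in billions
ACTIVE_PARAMS_B = 3      # parameters active per token, in billions
TOTAL_EXPERTS = 512      # experts per MoE layer
ROUTED_PER_TOKEN = 10    # routed experts selected per token


def fraction(part: float, whole: float) -> float:
    """Plain ratio, kept as a function so the two uses read the same way."""
    return part / whole


active_param_share = fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
routed_expert_share = fraction(ROUTED_PER_TOKEN, TOTAL_EXPERTS)

if __name__ == "__main__":
    print(f"active parameters: {active_param_share:.2%}")   # 3.75%
    print(f"routed experts:    {routed_expert_share:.2%}")  # 1.95%
```

The parameter share (3.75 percent) is higher than the routed-expert share (about 1.95 percent) because attention, embeddings, and the shared expert are active for every token regardless of routing.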
Context · B
Trillion-parameter MoE, API-only via Qwen Chat and Alibaba Cloud at release. Pretrained on roughly 36T tokens. Alibaba's first proprietary Qwen flagship at this scale, breaking the open-weights pattern. Supports more than 100 languages.