Cost
Together AI · as of 2026-05-21
Why people cared
Qwen 3 235B A22B was Alibaba's MoE pivot, and the headline feature was a hybrid thinking-vs-non-thinking inference toggle controllable per request. The schema convention `235B A22B` decodes as 235 billion total parameters and 22 billion active, which puts it in the same operational class as DeepSeek V3 (671B/37B) but at a different cost tier. Apache-2.0 across the entire Qwen 3 size ladder (from 0.6B to 235B) reset the openness baseline among Chinese labs, since DeepSeek's V3 and R1 used a custom DeepSeek License with field-of-use restrictions and Llama remained on community-license terms. The 36T-token pretrain extended Qwen 2.5's 18T, and the post-training stack included GRPO reasoning alongside conventional SFT and DPO. The lasting significance of the release was less about benchmark deltas, which are within noise of DeepSeek V3, and more about establishing that a frontier-grade open MoE could ship under a permissive license from a non-US lab. That made Qwen 3 the default starting point for open-weights agentic work through 2025, especially for organizations whose deployment counsel was uncomfortable with the DeepSeek license's field-of-use clauses.
Architecture
data/models.yaml. Every label is auditable
against the model's sources.
Specs
- Architecture
- moe
- Total params
- 235B
- Active params
- 22B
- Experts
- 128 total · 8 active
- Context window
- 131K tokens
- Attention
- gqa
- Position encoding
- rope
- Pretraining tokens
- 36.0T
- Post-training
- sft, dpo, grpo
- OSI-approved
- yes
- Data released
- no
- Training code
- not released
Benchmarks
Each score carries the date it was published; we never infer or interpolate missing scores.
General reasoning
| MMLU-Pro | 83.0 | as of 2025-04-28 | source ↗ |
Code
| LiveCodeBench | 34.3 | as of 2026-05-21 | source ↗ |
Recommended use cases
- hybrid reasoning + chat
- agentic workflows
- long-context retrieval
Available quantizations
Verified via the Hugging Face model tree ↗. Community quantizations change over time; the families shown are those with published weights at audit time.
Notable innovations
- · Hybrid thinking mode toggle
- · 36T-token pretrain
- · Apache 2.0 across all sizes
Known limitations
- · Hybrid thinking mode is controllable per request but increases inference cost by 3-10x depending on prompt; cost numbers above reflect non-thinking mode. source ↗
Lineage
First Qwen MoE with hybrid thinking-vs-fast inference modes.
Derived from
Qwen 2.5 72B Instruct 2024-09-19