Mixtral 8x22B Instruct v0.1 vs GPT-4 Turbo
Rows highlighted in warm gray mark fields where the models differ. Each number carries its as-of date and primary source.
Specs
| Field | A: Mixtral 8x22B Instruct v0.1 | B: GPT-4 Turbo |
|---|---|---|
| Released | 2024-04-17 | 2024-04-09 (GA revision; announced 2023-11-06) |
| Developer | Mistral AI | OpenAI |
| Openness | Open | Proprietary |
| License | Apache-2.0 | Proprietary |
| OSI-approved | yes | no |
| Data released | no | no |
| Training code | no | no |
| Architecture | MoE | unknown |
| Total params | 141B | — |
| Active params | 39B | — |
| Experts | 8 (2 active per token) | — |
| Context window | 64K | 128K |
| Attention | GQA | unknown |
| Position enc. | RoPE | unknown |
| Pretraining tokens | — | — |
| Post-training | SFT, DPO | RLHF |
| Training hardware | — | — |
| $/M input | — | $10.00 |
| $/M output | — | $30.00 |
| Output tok/sec | — | 27.8 |
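The per-million-token prices and output speed listed for GPT-4 Turbo translate into per-request figures with simple arithmetic. The sketch below is illustrative only: the function names and the example token counts are assumptions, and the constants are copied from the table above rather than fetched from any API.

```python
# Figures copied from the Specs table for B: GPT-4 Turbo.
INPUT_USD_PER_M = 10.00    # $/M input tokens
OUTPUT_USD_PER_M = 30.00   # $/M output tokens
OUTPUT_TOK_PER_SEC = 27.8  # output tokens per second


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in USD, given its token counts."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def generation_seconds(output_tokens: int) -> float:
    """Approximate time to stream the output at the listed speed."""
    return output_tokens / OUTPUT_TOK_PER_SEC


# Hypothetical request: 8,000 prompt tokens and 1,000 generated tokens.
cost = request_cost_usd(8_000, 1_000)
seconds = generation_seconds(1_000)
print(f"cost: ${cost:.2f}, generation time: {seconds:.0f} s")
```

For that hypothetical request the cost comes to about $0.11 and generation takes roughly 36 seconds. No equivalent estimate is possible for Mixtral 8x22B from this table, since its price and speed fields are not reported.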
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
General reasoning
| Benchmark | A: Mixtral 8x22B Instruct v0.1 | B: GPT-4 Turbo |
|---|---|---|
| MMLU-Pro | not reported | 69.4 (as of 2026-05-21) |
Code
| Benchmark | A: Mixtral 8x22B Instruct v0.1 | B: GPT-4 Turbo |
|---|---|---|
| LiveCodeBench | not reported | 29.1 (as of 2026-05-21) |
Math
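| Benchmark | A: Mixtral 8x22B Instruct v0.1 | B: GPT-4 Turbo |
|---|---|---|
| MATH | 44.6 (as of 2024-04-17) | **73.7** (as of 2026-05-21) |
| AIME 2024 | not reported | 15.0 (as of 2026-05-21) |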
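The display rule stated at the top of the Benchmarks section (missing scores shown as not reported, never inferred, with the leader highlighted) can be sketched as a small helper. This is a hypothetical illustration, not the page's actual rendering code. It assumes higher scores are better, and that no leader is named when either score is missing or when the two scores tie.

```python
from typing import Optional


def render(score: Optional[float]) -> str:
    """Format a score for display; a missing score is never filled in."""
    return "not reported" if score is None else f"{score:.1f}"


def leader(score_a: Optional[float], score_b: Optional[float]) -> Optional[str]:
    """Return "A" or "B" for the higher score, or None if no leader can be named."""
    if score_a is None or score_b is None:
        return None  # assumption: a single reported score is not a "win"
    if score_a == score_b:
        return None  # assumption: ties highlight neither model
    return "A" if score_a > score_b else "B"


# Scores taken from the tables above (A: Mixtral 8x22B, B: GPT-4 Turbo).
rows = {
    "MMLU-Pro": (None, 69.4),
    "LiveCodeBench": (None, 29.1),
    "MATH": (44.6, 73.7),
    "AIME 2024": (None, 15.0),
}
for name, (a, b) in rows.items():
    print(name, render(a), render(b), leader(a, b))
```

Under these assumptions only MATH has a leader, because it is the only benchmark with a score reported for both models.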
Context · A
Mistral's flagship open-weights sparse MoE, released in April 2024 under Apache 2.0. Each token is routed to 2 of 8 experts per layer, so about 39B of the 141B total parameters are active per token; per-token inference compute is therefore closer to that of a 39B dense model than a 141B one. Natively supports function calling and is fluent in English, French, Italian, German, and Spanish.
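The "2 of 8 experts" routing described above can be illustrated with a toy top-k gate. This is a minimal sketch of one common formulation (softmax over the logits of the selected experts only), not Mistral's implementation. The toy experts, the input vector, and all function names are hypothetical; only the expert count and the top-2 choice come from the text.

```python
import math

NUM_EXPERTS = 8  # from the text: 8 experts
TOP_K = 2        # from the text: 2 experts used per token


def top_k_route(router_logits, k=TOP_K):
    """Return [(expert_index, weight), ...] for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Softmax over the selected logits only, so the k weights sum to 1.
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x, experts, router_logits):
    """Weighted sum of the selected experts' outputs for a single token."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits):
        y = experts[idx](x)  # only the chosen experts are evaluated
        out = [o + weight * v for o, v in zip(out, y)]
    return out


# Hypothetical toy experts: expert i scales its input by (i + 1).
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]

# Hypothetical router scores for one token; experts 1 and 4 score highest.
logits = [0.0, 5.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0]
print(top_k_route(logits))
print(moe_layer([1.0, 2.0], experts, logits))
```

Because the six unselected experts are never called, per-token compute depends on the active parameters rather than the total, which is the point behind the 39B active versus 141B total figures.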
Context · B
Announced at OpenAI DevDay on November 6, 2023 as a cheaper, 128K-context successor to the original GPT-4 endpoint. The gpt-4-turbo-2024-04-09 revision shipped as the general-availability version, adding vision support, with a knowledge cutoff of December 2023.