Mixtral 8x7B Instruct v0.1 vs Gemini 1.5 Pro
Rows highlighted in warm gray are where the models differ. Numbers carry their as-of date and primary source.
Specs
| Field | A: Mixtral 8x7B Instruct v0.1 | B: Gemini 1.5 Pro |
|---|---|---|
| Released | 2023-12-11 | 2024-02-15 |
| Developer | Mistral AI | Google DeepMind |
| Openness | Open weights | Proprietary |
| License | Apache-2.0 | Proprietary |
| OSI-approved | yes | no |
| Data released | no | no |
| Training code | no | no |
| Architecture | MoE | MoE (per Google; details undisclosed) |
| Total params | 46.7B | — |
| Active params | 12.9B | — |
| Experts | 8 (2 active) | — |
| Context window | 32K | 2M |
| Attention | GQA | unknown |
| Position enc. | RoPE | unknown |
| Pretraining tokens | — | — |
| Post-training | SFT, DPO | RLHF |
| Training hardware | — | — |
| $/M input | — | — |
| $/M output | — | — |
| Output tok/sec | — | — |
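The total and active parameter counts in the table can be related with a little arithmetic. The sketch below assumes a simplified model in which every parameter is either shared by all tokens or belongs to exactly one of 8 equally sized experts, so that total = shared + 8 × expert and active = shared + 2 × expert. That split, and the helper name `split_params`, are assumptions for illustration, not a published breakdown.

```python
def split_params(total_b, active_b, n_experts=8, n_active=2):
    """Solve total = shared + n_experts * e and active = shared + n_active * e.

    All values are in billions of parameters. The two-bucket split
    (shared vs. per-expert) is an illustrative assumption.
    """
    per_expert = (total_b - active_b) / (n_experts - n_active)
    shared = active_b - n_active * per_expert
    return shared, per_expert


# Figures taken from the Specs table above.
shared_b, expert_b = split_params(46.7, 12.9)
active_fraction = 12.9 / 46.7

print(f"shared ~ {shared_b:.2f}B, per expert ~ {expert_b:.2f}B, "
      f"active fraction ~ {active_fraction:.1%}")
```

Under these assumptions, roughly a quarter of the weights take part in any single token's forward pass, although all 46.7B must still be held in memory.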
Benchmarks
Missing scores are shown as not reported and are never inferred. Bold marks the leader on each benchmark.
General reasoning
| Benchmark | A: Mixtral 8x7B Instruct v0.1 | B: Gemini 1.5 Pro |
|---|---|---|
| MMLU-Pro | — | 75.0 (as of 2026-05-21) |
| GPQA-Diamond | — | 58.9 (as of 2026-05-21) |
Code
| Benchmark | A: Mixtral 8x7B Instruct v0.1 | B: Gemini 1.5 Pro |
|---|---|---|
| LiveCodeBench | — | 31.6 (as of 2026-05-21) |
Math
| Benchmark | A: Mixtral 8x7B Instruct v0.1 | B: Gemini 1.5 Pro |
|---|---|---|
| MATH | — | 87.6 (as of 2026-05-21) |
| AIME 2024 | — | 23.0 (as of 2026-05-21) |
Context · A
The first widely used open-weights MoE model, with 8 experts per layer and 2 active per token. Mistral AI reported quality matching or exceeding dense 70B-class models such as Llama 2 70B, at the inference cost of roughly 13B active parameters.
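To make "8 experts, 2 active per token" concrete, here is a minimal pure-Python sketch of top-2 routing for a single token. It assumes the gate weights are a softmax over the two selected router logits; the toy experts, the logits, and the function names are hypothetical and are not Mistral's implementation.

```python
import math


def top2_route(router_logits):
    """Return [(expert_index, weight), (expert_index, weight)] for one token.

    The weights are a softmax over the two selected logits only, so they sum to 1.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top2 = order[:2]
    peak = max(router_logits[i] for i in top2)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top2]
    norm = sum(exps)
    return [(i, e / norm) for i, e in zip(top2, exps)]


def moe_layer(x, router_logits, experts):
    """Weighted sum of the two selected experts' outputs; the other experts are never called."""
    out = [0.0] * len(x)
    for idx, weight in top2_route(router_logits):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


# Toy experts (hypothetical): expert k scales its input by k + 1.
experts = [lambda x, k=k: [(k + 1) * v for v in x] for k in range(8)]
logits = [0.1, 2.0, -1.0, 0.3, 2.0, 0.0, -0.5, 1.0]

routes = top2_route(logits)
y = moe_layer([1.0, 2.0], logits, experts)
print(routes, y)
```

Only two of the eight expert functions run for this token, which is why compute per token tracks the active parameter count rather than the total.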
Context · B
Google's first long-context Gemini checkpoint, introduced with a 128K-token standard window and a 1M-token preview tier. Google described the design as a mixture-of-experts that activates a subset of expert networks per input, and reported 99% recall on needle-in-a-haystack retrieval across 1M tokens at launch. The context window was later extended to 2M tokens in private preview, announced May 14, 2024.
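A needle-in-a-haystack test hides one fact at varying depths in a long filler context and checks whether the model can retrieve it. The sketch below shows the general shape of such a harness, not Google's evaluation protocol. The `ask_model` callable, the filler sentence, and the stand-in `truncating_model` (which only reads the first 20,000 characters) are all hypothetical.

```python
def build_haystack(filler, needle, n_sentences, depth):
    """Place `needle` at a relative depth (0.0 = start, 1.0 = end) among filler sentences."""
    sentences = [filler] * n_sentences
    pos = round(depth * n_sentences)
    return " ".join(sentences[:pos] + [needle] + sentences[pos:])


def recall(ask_model, secret, depths, n_sentences=1000):
    """Fraction of depths at which the model's answer contains the secret."""
    needle = f"The magic number is {secret}."
    hits = 0
    for depth in depths:
        context = build_haystack("The sky was clear that day.", needle, n_sentences, depth)
        answer = ask_model(context, "What is the magic number?")
        hits += secret in answer
    return hits / len(depths)


# Stand-in "model" (hypothetical): sees only the first 20,000 characters of context.
def truncating_model(context, question):
    window = context[:20000]
    marker = "The magic number is "
    start = window.find(marker)
    if start == -1:
        return "unknown"
    return window[start + len(marker):].split(".")[0]


score = recall(truncating_model, "48213", depths=[0.0, 0.25, 0.5, 0.75, 1.0])
print(f"recall = {score:.0%}")
```

With a real model, `ask_model` would wrap an API call, and the depths and context lengths would be swept over a grid to show where retrieval starts to fail.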