Gemma 4 26B-A4B vs Claude Opus 4.7
Numbers carry their as-of date and primary source.
Specs
| Field | A: Gemma 4 26B-A4B | B: Claude Opus 4.7 |
|---|---|---|
| Released | 2026-04-02 | 2026-04-16 |
| Developer | Google DeepMind | Anthropic |
| Openness | Open weights | Proprietary |
| License | Apache-2.0 | Proprietary |
| OSI-approved | yes | no |
| Data released | no | no |
| Training code | no | no |
| Architecture | MoE | unknown |
| Total params | 26B | — |
| Active params | 3.8B | — |
| Experts | — | — |
| Context window | 262K | 1.0M |
| Attention | GQA | unknown |
| Position enc. | RoPE | unknown |
| Pretraining tokens | — | — |
| Post-training | SFT, RLHF | RLHF, Constitutional AI |
| Training hardware | — | — |
| $/M input | — | $5.00 |
| $/M output | — | $25.00 |
| Output tok/sec | — | 48.6 |
Benchmarks
Missing scores are shown as not reported (—) and are never inferred. Bold highlights the leader per benchmark.
General reasoning
| Benchmark | A: Gemma 4 26B-A4B | B: Claude Opus 4.7 |
|---|---|---|
| GPQA-Diamond | — | 91.4 (as of 2026-05-21) |
Context · A
The sparse mixture-of-experts (MoE) sibling in the Gemma 4 family: 26B total parameters, but only about 3.8B activated per token. Because single-stream decode is memory-bandwidth-bound, the small active count lets it decode far faster than the dense 31B model at the same quantization. The trade-off is memory: all experts must stay resident, so the footprint is that of the full 26B weights. That decode speed makes it a strong general-purpose local model, provided the weights fit in memory.
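The bandwidth argument above can be sketched as a back-of-envelope calculation. This is a rough model, not a benchmark: it assumes decode speed is capped by how many weight bytes are read per token, and it ignores KV-cache reads, activations, and any layers shared across experts. The 4-bit quantization and the 400 GB/s memory bandwidth are hypothetical values chosen for illustration; only the parameter counts come from the page.

```python
# Rough decode ceiling for a memory-bandwidth-bound model.
# Assumption: each generated token requires reading every *active* weight once.

def weight_bytes(params_billion: float, bits_per_weight: float) -> float:
    """Bytes needed to store the given number of parameters."""
    return params_billion * 1e9 * bits_per_weight / 8


def decode_ceiling_tok_per_sec(
    active_params_billion: float,
    bits_per_weight: float,
    bandwidth_gb_per_sec: float,
) -> float:
    """Upper bound on single-stream tokens/sec: bandwidth / bytes read per token."""
    bytes_per_token = weight_bytes(active_params_billion, bits_per_weight)
    return bandwidth_gb_per_sec * 1e9 / bytes_per_token


BITS = 4            # hypothetical quantization level
BANDWIDTH = 400.0   # hypothetical memory bandwidth in GB/s

# Footprint depends on TOTAL parameters (all experts stay resident).
moe_footprint_gb = weight_bytes(26, BITS) / 1e9
dense_footprint_gb = weight_bytes(31, BITS) / 1e9

# Decode ceiling depends on ACTIVE parameters per token.
moe_ceiling = decode_ceiling_tok_per_sec(3.8, BITS, BANDWIDTH)
dense_ceiling = decode_ceiling_tok_per_sec(31, BITS, BANDWIDTH)
speedup = moe_ceiling / dense_ceiling

print(f"MoE footprint:   {moe_footprint_gb:.1f} GB, ceiling {moe_ceiling:.0f} tok/s")
print(f"Dense footprint: {dense_footprint_gb:.1f} GB, ceiling {dense_ceiling:.0f} tok/s")
print(f"Ceiling ratio (MoE / dense): {speedup:.1f}x")
```

Under these assumptions the ceiling ratio reduces to 31 / 3.8, about 8x, regardless of the bandwidth or quantization chosen, while the memory footprints stay in the same range. Real-world speedups will be smaller once the ignored costs are included.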
Context · B
The first Claude model with high-resolution image input, accepting up to 2576 pixels on the long edge (about 3.75 megapixels, more than triple that of prior Claude models). Released April 16, 2026, at unchanged Opus 4.6 pricing of $5 / $25 per million input / output tokens, with the 1M context standard. It adds a new xhigh effort level for finer control over the reasoning-versus-latency trade-off.
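To make the pricing concrete, here is a minimal cost sketch using the listed rates of $5.00 per million input tokens and $25.00 per million output tokens, plus the listed output speed of 48.6 tokens per second. It assumes flat per-token pricing with no caching discounts, batch rates, or other adjustments, which the page does not describe. The function names and the example request size are hypothetical.

```python
# Listed rates from the Specs table (USD per million tokens).
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00
OUTPUT_TOK_PER_SEC = 48.6  # listed output speed


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request, assuming flat per-token pricing."""
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MTOK
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MTOK
    return input_cost + output_cost


def generation_seconds(output_tokens: int) -> float:
    """Time to stream the output at the listed speed (ignores time to first token)."""
    return output_tokens / OUTPUT_TOK_PER_SEC


# Hypothetical request: a 200K-token prompt with a 4K-token answer.
cost = request_cost_usd(200_000, 4_000)
seconds = generation_seconds(4_000)

print(f"Estimated cost: ${cost:.2f}")
print(f"Estimated generation time: {seconds:.0f} s")
```

In this example the prompt dominates the bill ($1.00 of input against $0.10 of output), which is typical of long-context requests even though output tokens cost five times as much each.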