Apple On-Device Foundation Model (2025) vs DeepSeek-R1 (May 2025 refresh)
Rows where the models differ are highlighted in warm gray. Numbers carry their as-of date and primary source.
Specs
| Field | A: Apple On-Device Foundation Model (2025) | B: DeepSeek-R1 (May 2025 refresh) |
|---|---|---|
| Released | 2025-06-09 | 2025-05-28 |
| Developer | Apple | DeepSeek |
| Openness | Proprietary | Open |
| License | Proprietary | MIT |
| OSI-approved | no | yes |
| Data released | no | no |
| Training code | no | no |
| Architecture | Dense | MoE |
| Total params | 3B | — |
| Active params | — | — |
| Experts | — | — |
| Context window | — | — |
| Attention | unknown | MLA |
| Position enc. | unknown | RoPE (YaRN) |
| Pretraining tokens | — | — |
| Post-training | SFT, RLHF | SFT, GRPO, rejection sampling |
| Training hardware | — | H800 |
| $/M input | — | — |
| $/M output | — | — |
| Output tok/sec | — | — |
Benchmarks
Missing scores render as not reported; never inferred. Bold highlights the leader per benchmark.
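The rendering rule above can be sketched as a small formatter. This is a minimal illustration, not the site's actual code: the function name `render_row` is hypothetical, and it assumes that higher scores are better and that a score is only bolded when both models report one (with a single reported score there is nothing to lead).

```python
from typing import Optional


def render_row(benchmark: str, a: Optional[float], b: Optional[float]) -> str:
    """Render one benchmark row as a Markdown table line (hypothetical helper)."""

    def cell(score: Optional[float], other: Optional[float]) -> str:
        if score is None:
            # A missing score is shown as-is; no estimate is substituted.
            return "not reported"
        text = f"{score:.1f}"
        # Assumption: higher is better, and bold requires two reported scores.
        if other is not None and score > other:
            return f"**{text}**"
        return text

    return f"| {benchmark} | {cell(a, b)} | {cell(b, a)} |"


print(render_row("AIME 2024", None, 91.4))
```

Under these assumptions a tie leaves both cells unbolded, which keeps the "leader" marker unambiguous.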
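General reasoning
| Benchmark | A | B (as-of date) |
|---|---|---|
| MMLU-Pro | not reported | 85.0 (2025-05-28) |
| GPQA-Diamond | not reported | 81.0 (2025-05-28) |
Code
| Benchmark | A | B (as-of date) |
|---|---|---|
| LiveCodeBench | not reported | 73.3 (2025-05-28) |
Math
| Benchmark | A | B (as-of date) |
|---|---|---|
| AIME 2024 | not reported | 91.4 (2025-05-28) |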
Context · A
Apple Intelligence's on-device foundation model, announced at WWDC 2025 on June 9, 2025 and shipped in iOS 26. It has about 3B parameters, uses KV-cache sharing across blocks (a 37.5 percent KV-cache reduction) and 2-bit quantization-aware training, and is paired with a server-side Parallel-Track Mixture-of-Experts model on Private Cloud Compute. The Foundation Models framework gives developers direct access to the model.
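The two efficiency figures above can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: the 3-of-8 layer split is an assumption chosen because it reproduces the quoted 37.5 percent (this page does not state the split), and the weight-size estimate ignores embeddings, quantization scales, and any layers kept at higher precision.

```python
def kv_cache_reduction(sharing_layers: int, total_layers: int) -> float:
    """Fraction of KV-cache memory saved when `sharing_layers` layers
    reuse another layer's cache instead of storing their own."""
    if total_layers <= 0 or not 0 <= sharing_layers <= total_layers:
        raise ValueError("need 0 <= sharing_layers <= total_layers, total_layers > 0")
    return sharing_layers / total_layers


def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Raw storage for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


# Assumption: 3 of every 8 layers share a cache (not stated on this page).
reduction = kv_cache_reduction(3, 8)

# About 3B parameters at 2 bits each, weights only.
size_gb = weight_bytes(3e9, 2) / 1e9

print(f"KV-cache reduction: {reduction:.1%}")
print(f"2-bit weight footprint: ~{size_gb:.2f} GB")
```

For comparison, the same 3B parameters at 16 bits per weight would need about 6 GB, which is why aggressive quantization matters for an on-device model.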
Context · B
An RL-only refresh of R1 that gained substantial ground on reasoning benchmarks (notably AIME 2024, where it scores 91.4 as of 2025-05-28) without any new pretraining. It narrowed the gap between open and closed reasoning models.