The Open-Source AI Stack


DeepSeek V3 / R1

Cost-quality reset: the V3 technical report documented architectural innovations (MoE, MLA, auxiliary-loss-free load balancing); R1 is an open reasoning model.

MIT · stable

DeepSeek is a Chinese frontier-AI lab whose open releases reset two ceilings between late 2024 and early 2025. DeepSeek-V3 (Dec 2024) demonstrated frontier-class capability at a reported training cost roughly an order of magnitude below what US frontier labs were spending. It uses a Mixture-of-Experts (MoE) architecture with several innovations, including Multi-head Latent Attention (MLA) and auxiliary-loss-free load balancing across experts. DeepSeek-R1 (Jan 2025) was the first openly released frontier-class reasoning model; its R1-Zero variant showed that pure reinforcement-learning post-training could produce strong reasoning without the usual supervised fine-tuning (SFT) step.

DeepSeek matters because it forced a re-evaluation of what open-weights labs can accomplish on a constrained compute budget. Compared to its siblings: Llama has a similar capability ceiling but a more restrictive license than DeepSeek's MIT; Qwen shares the open posture of a Chinese lab but makes different architectural choices; OLMo is truly open, including its training data, but has a lower capability ceiling. DeepSeek's distinctive angle is "frontier capability on a permissive license (MIT), at a cost structure that breaks the hyperscaler-budget assumption."

The models are production-ready and widely served by hosted-inference providers and self-hosters. The subsequent V3.1 release (mid-2025) introduced hybrid reasoning: one model with both thinking and non-thinking modes. Caveats: the training data is not disclosed, so the release does not satisfy a strict reading of the OSAID (Open Source AI Definition), and the trust questions about data provenance and political content filtering apply as they do to any closed-data lab.
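The auxiliary-loss-free MoE balancing mentioned above can be illustrated with a toy simulation. The idea, as publicly described for V3, is that each expert carries a bias that is added to its routing score only when choosing the top-k experts; the bias is nudged down for overloaded experts and up for underloaded ones, with no extra loss term. The sketch below is a minimal illustration under stated assumptions, not DeepSeek's implementation: the function names, the random affinity scores, the artificial skew, and the step size are all hypothetical.

```python
import random


def route_top_k(scores, bias, k):
    """Choose k experts by biased score; gate weights use the unbiased scores."""
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i] + bias[i], reverse=True)
    chosen = ranked[:k]
    total = sum(scores[i] for i in chosen)
    # The bias only influences which experts are selected, not their weights.
    return {i: scores[i] / total for i in chosen}


def update_bias(bias, load, step):
    """Lower the bias of overloaded experts, raise it for underloaded ones."""
    mean_load = sum(load) / len(load)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - step)
        elif n < mean_load:
            new_bias.append(b + step)
        else:
            new_bias.append(b)
    return new_bias


def simulate(num_experts=8, k=2, tokens_per_batch=512, batches=200,
             step=0.01, seed=0):
    """Route random tokens and adjust biases once per batch (toy setup)."""
    rng = random.Random(seed)
    # Hypothetical skew: lower-index experts start out systematically favoured.
    skew = [0.3 * (num_experts - i) / num_experts for i in range(num_experts)]
    bias = [0.0] * num_experts
    history = []
    for _ in range(batches):
        load = [0] * num_experts
        for _ in range(tokens_per_batch):
            scores = [rng.random() + skew[i] for i in range(num_experts)]
            for expert in route_top_k(scores, bias, k):
                load[expert] += 1
        history.append(load)
        bias = update_bias(bias, load, step)
    return history, bias


def spread(load):
    """Gap between the busiest and the idlest expert in one batch."""
    return max(load) - min(load)


if __name__ == "__main__":
    history, bias = simulate()
    print("first batch load:", history[0], "spread:", spread(history[0]))
    print("last batch load: ", history[-1], "spread:", spread(history[-1]))
```

In this toy setup the per-expert load starts out heavily skewed and evens out as the biases adapt, which is the behaviour the technique targets. A real router would compute scores from token embeddings and learned expert centroids rather than random numbers.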


