DeepSeek is a Chinese frontier-AI lab whose open releases reset two ceilings between late 2024 and early 2025. DeepSeek-V3 (Dec 2024) demonstrated frontier-class capability at roughly an order of magnitude less training cost than US frontier labs were spending: its technical report cites about 2.79M H800 GPU-hours, roughly $5.6M at an assumed $2 per GPU-hour, for the final training run only. It uses a Mixture-of-Experts architecture with several architectural innovations, notably Multi-head Latent Attention (MLA) and auxiliary-loss-free MoE load balancing. DeepSeek-R1 (Jan 2025) was the first openly released frontier-class reasoning model, and its R1-Zero variant showed that pure reinforcement-learning post-training could produce strong reasoning without the usual supervised fine-tuning (SFT) step. The later V3.1 (mid-2025) introduced hybrid reasoning: one model with both thinking and non-thinking modes.
DeepSeek matters because it forced a re-evaluation of what open-weights labs can accomplish on a constrained compute budget. Compared to its siblings: Llama has a similar capability ceiling but a restrictive license, where DeepSeek uses MIT (for R1 and later V3 checkpoints; the original V3 weights shipped under a custom DeepSeek model license); Qwen shares the Chinese-lab open posture but makes different architectural choices; OLMo is truly open, including data, but has a lower capability ceiling. DeepSeek's distinctive angle is "frontier capability on permissive license (MIT), at a cost structure that breaks the hyperscaler-budget assumption."
Status: production-ready; widely served by hosted-inference providers and run by self-hosters.
Caveats: the training data is not disclosed, so the models do not satisfy a strict reading of OSAID, and the trust questions about data provenance and political content filtering apply as they do to any closed-data lab.
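The auxiliary-loss-free MoE balancing mentioned above can be illustrated with a toy sketch. The idea, as described in the V3 technical report, is that each expert carries a bias that is added to its routing score only when choosing the top-k experts, while the gating weights still come from the unbiased scores; after each step the bias is nudged down for overloaded experts and up for underloaded ones. Everything below is an illustrative assumption rather than DeepSeek's implementation: the function names `route_tokens` and `update_bias`, the step size `gamma`, the use of pure Python lists, and the synthetic scores.

```python
import random


def route_tokens(scores, bias, k):
    """Choose top-k experts per token from biased scores.

    scores: per-token lists of positive affinities, one entry per expert.
    bias:   per-expert bias, used for expert selection only.
    Returns one dict per token mapping expert index -> gate weight.
    """
    assignments = []
    for token_scores in scores:
        biased = [s + b for s, b in zip(token_scores, bias)]
        chosen = sorted(range(len(bias)), key=lambda e: biased[e], reverse=True)[:k]
        # Gate weights are normalized over the *unbiased* scores of the chosen experts.
        total = sum(token_scores[e] for e in chosen)
        assignments.append({e: token_scores[e] / total for e in chosen})
    return assignments


def update_bias(bias, assignments, gamma):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    load = [0] * len(bias)
    for gates in assignments:
        for e in gates:
            load[e] += 1
    mean_load = sum(load) / len(bias)
    new_bias = []
    for b, n in zip(bias, load):
        if n > mean_load:
            new_bias.append(b - gamma)
        elif n < mean_load:
            new_bias.append(b + gamma)
        else:
            new_bias.append(b)
    return new_bias, load


# Toy run (hypothetical setup): expert 0 starts out systematically favored.
rng = random.Random(0)
n_experts, n_tokens, k, gamma = 4, 64, 2, 0.01
bias = [0.0] * n_experts
for step in range(50):
    scores = [
        [0.5 + 0.5 * rng.random()] + [rng.random() for _ in range(n_experts - 1)]
        for _ in range(n_tokens)
    ]
    bias, load = update_bias(bias, route_tokens(scores, bias, k), gamma)
    if step in (0, 49):
        print(f"step {step}: load per expert = {load}")
```

In this toy setup the per-expert load would be expected to drift toward a more even split as the bias on the favored expert falls, without adding any balancing term to the training loss. A real router operates on batched tensors and learned affinities, which this sketch does not model.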
The Stack · Weights · Open source
DeepSeek V3 / R1
Cost-quality reset; the V3 technical report documents the architectural innovations (MoE with MLA and auxiliary-loss-free load balancing); R1 is an open reasoning model.
Sources
- DeepSeek https://www.deepseek.com/
- DeepSeek-V3 Technical Report https://arxiv.org/abs/2412.19437
- DeepSeek-R1 Reasoning Paper https://arxiv.org/abs/2501.12948
- DeepSeek Models on HuggingFace https://huggingface.co/deepseek-ai
Other projects at the Weights layer
9 siblings · ordered open first
- Mistral / Mixtral Open source
French lab; older open releases under Apache 2.0; flagships increasingly API-only or under research-tier licenses.
- Qwen (Alibaba) Open source
Alibaba's aggressive open-weights series (Qwen 2.5 / 3); Apache 2.0 across most sizes; full-precision weights available.
- OLMo (AI2) Open source
The only major model family meeting the strictest reading of OSAID: data (Dolma), training code, and weights all published.
- Phi (Microsoft) Open source
Small open models heavy on synthetic-data training; MIT license; cost-effective inference at edge sizes.
- Kimi (Moonshot AI) Open source
Chinese open-weights series; emphasis on long-context performance.
- GLM (Zhipu AI) Open source
Tsinghua-spinoff; ChatGLM and GLM-4 families; Apache 2.0 for major releases.
- Yi (01.AI) Open source
Kai-Fu Lee's Chinese open model family (Yi-34B etc.); Apache 2.0.
- Llama (Meta) Source available
Meta's open-weights family; dominant in usage; license carries a 700M-MAU clause and acceptable-use restrictions.
- Gemma (Google) Source available
Google's open-weights siblings to Gemini; source-available, not OSI-approved.