06 Weights

core

Model artifacts and their license tiers.

Overview

The model files. The serialized parameters that constitute “the model” in the everyday sense. Having the weights means you can run, fine-tune, and inspect the model; it does not mean you can retrain or reproduce it.

Five things to keep in mind as you read:

Weights are the most superficial layer of openness. They don’t include the data or the training code. Most “open AI” releases stop here.
Three license tiers matter. Apache 2.0 / MIT (genuinely open), Llama-style community licenses (source-available with revocable use clauses), and proprietary (Claude, GPT, Gemini).
The open-weights side is led by Chinese labs. Qwen, DeepSeek, Kimi, GLM. Plus Meta’s Llama family, Mistral, Google’s Gemma, AI2’s OLMo.
OLMo is the reference for “fully open”. Weights, data, training code, training logs. Almost no other production model meets this bar.
OSAID v1.0 (the OSI’s October 2024 definition of “open source AI”) makes this debate operational. Under the strictest reading, most “open-weights” releases don’t count as open source AI.

The rest of this page walks the license tiers and the major families, then arrives at the OSAID question.

The license tiers

Three buckets that matter in practice.

Genuinely open (Apache 2.0 or MIT). You can run, fine-tune, redistribute, sell derivatives. No acceptable-use clauses, no revocation rights, no jurisdictional carve-outs. The default for Qwen 3, Mistral 7B, OLMo, Pythia, and a long tail of smaller releases.

Source-available with revocable terms. The Llama Community License is the prototype: free for most uses, but with an acceptable-use policy (Llama AUP) that prohibits certain applications, and a scale clause under which a licensee whose products had more than 700M monthly active users at the release date must request a separate license from Meta, which Meta may grant or refuse. Meta may also terminate the license on breach. Google’s Gemma uses a similar shape (Gemma terms), as do Mistral’s research-tier releases. These are not OSI-open by definition.

Proprietary. Closed weights, accessed only via API. GPT-5, Claude 4, Gemini 3. The frontier-lab default. You don’t have the weights at all; you have rate-limited inference through someone else’s gate.

A fourth shape is worth naming for completeness: lab-internal research releases (e.g., some early Mistral checkpoints released to specific researchers under bespoke agreements). Functionally proprietary; nominally “open” in lab-speak. Ignore this category unless a lab explicitly invokes it.
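The three main tiers can be sketched as a small lookup. This is an illustrative sketch only: the `Tier` enum, the `LICENSE_TIERS` table, and labels such as `llama-community`, `gemma-terms`, and `api-only` are hypothetical names chosen for this example, not identifiers used by any registry. An unknown label returns `None`, on the assumption that an unrecognized license should be read by hand and not guessed at.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    OPEN = "genuinely open"
    SOURCE_AVAILABLE = "source-available with revocable terms"
    PROPRIETARY = "proprietary"


# Hypothetical labels for illustration; only "apache-2.0" and "mit"
# correspond (case-insensitively) to real SPDX identifiers.
LICENSE_TIERS = {
    "apache-2.0": Tier.OPEN,
    "mit": Tier.OPEN,
    "llama-community": Tier.SOURCE_AVAILABLE,
    "gemma-terms": Tier.SOURCE_AVAILABLE,
    "api-only": Tier.PROPRIETARY,
}


def tier_for(license_label: str) -> Optional[Tier]:
    """Return the tier for a license label, or None if the label is unknown."""
    return LICENSE_TIERS.get(license_label.strip().lower())


def has_no_use_restrictions(license_label: str) -> bool:
    """True only for the genuinely open tier (no AUP, no revocation rights)."""
    return tier_for(license_label) is Tier.OPEN
```

Returning `None` for unknown labels keeps the lookup from silently placing a bespoke or research-only license in one of the three buckets.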

The major families (2026)

The open-weights leaders, roughly in order of how often they show up in production inference workloads.

Chinese labs. The single biggest shift since 2024 is that the open-weights frontier is led by Chinese labs.

Qwen (Alibaba): the Qwen 3 family, Apache 2.0, the most consistently strong open-weights line of 2026 (Qwen 3 release notes)
DeepSeek (High-Flyer Capital): V3 (671B MoE, December 2024) and R1 (reasoning model, January 2025). R1 shipped under the MIT license; the original V3 release put the code under MIT and the weights under the separate DeepSeek Model License. The V3 paper reported frontier performance at a fraction of the rumored training cost of closed-lab equivalents (DeepSeek-V3 technical report)
Kimi (Moonshot AI): Kimi K2 (initial release) and the K2.5 / K2.6 successors through 2026, long-context-strong, under a Modified MIT license (attribution required at 100M monthly active users or $20M monthly revenue; otherwise standard MIT) (Kimi K2 license)
GLM (Zhipu / Tsinghua): the GLM-4.5 series, MIT license, research-strong but less production-tuned than Qwen (GLM-4.5)
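The Kimi K2 attribution trigger described in the list above can be written as a simple check. This is a sketch under stated assumptions: the two thresholds are the figures quoted on this page, the function and constant names are hypothetical, and the strict "greater than" comparison is an assumption about how the threshold is measured. The license text itself is the authority.

```python
# Thresholds as quoted in the Kimi entry above (assumed, not re-verified here).
MAU_THRESHOLD = 100_000_000
MONTHLY_REVENUE_THRESHOLD_USD = 20_000_000


def attribution_required(monthly_active_users: int,
                         monthly_revenue_usd: float) -> bool:
    """Return True if either threshold is exceeded.

    Assumption: crossing EITHER threshold triggers the attribution
    requirement, and the comparison is strict (">").
    """
    return (monthly_active_users > MAU_THRESHOLD
            or monthly_revenue_usd > MONTHLY_REVENUE_THRESHOLD_USD)
```

Below both thresholds the license behaves like standard MIT, which is why most deployers never meet the extra clause.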

Western open-weights.

Meta Llama (Llama 4, April 2025): the source-available family that anchored the open-weights ecosystem from 2023 onward; Community License terms (Llama 4 announcement)
Mistral: Apache 2.0 small models, separate research-tier for the larger frontier releases (Mistral models)
Google Gemma (Gemma 3, March 2025): the Gemma family, Gemma Terms (Gemma 3 release, March 12 2025)
AI2 OLMo (OLMo 2): the only frontier-adjacent family that ships weights + data + training code + training logs together (OLMo 2 release)

What’s open and what isn’t

The matrix:

Open weights AND open data AND open training code: OLMo, Pythia, the Common Pile-trained EleutherAI releases. The strict-OSAID-reading category.
Open weights, partial training-data disclosure: most of the Chinese-lab releases, Mistral’s open tier.
Source-available weights: Llama, Gemma. Functionally open for most uses; not OSI-open.
Closed weights: Claude, GPT, Gemini. API-only.
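The matrix above can be read as a decision procedure over what a release actually ships. The sketch below is one hypothetical encoding: the `Release` fields and the `category` function are invented for illustration, and the example flags simply restate this page's own classification, not an independent audit of any model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    name: str
    weights_downloadable: bool
    permissive_license: bool   # Apache 2.0 / MIT style, no use restrictions
    data_open: bool
    training_code_open: bool


def category(release: Release) -> str:
    """Map a release to one of the four rows of the matrix."""
    if not release.weights_downloadable:
        return "closed weights"
    if not release.permissive_license:
        return "source-available weights"
    if release.data_open and release.training_code_open:
        return "open weights, open data, open training code"
    # Permissive weights without both data and code fall in the partial row.
    return "open weights, partial training-data disclosure"


# Flags below follow the matrix on this page.
olmo = Release("OLMo", True, True, True, True)
llama = Release("Llama", True, False, False, False)
claude = Release("Claude", False, False, False, False)
```

The order of the checks matters: weight availability is tested first, then the license, and only then the data and code, which mirrors how the layers of openness stack.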

The asymmetry: shipping weights is easy (one file upload). Shipping the data and training code is hard (terabytes, licensing review, internal pipeline cleanup). AI2 is the only production-grade lab that makes the second trade consistently; its funding source (the estate of Microsoft co-founder Paul Allen) explicitly covers the cost of doing so.

The editorial tension

Two open questions decide where this layer goes.

One. The Chinese-lab lead on open weights is a 2024-2026 phenomenon. Whether it persists depends on US export controls (the H100 / H200 / Blackwell restrictions to Chinese entities), the Chinese government’s domestic AI policy, and whether the Chinese labs’ research velocity continues to outpace their Western counterparts’. If the gap widens, the open-weights ecosystem becomes increasingly bifurcated by jurisdiction; if it narrows, the open-weights leader-by-volume might shift back.

Two. The OSAID v1.0 revision cycle in 2026 will determine whether “open-weights” continues to mean “Llama-style source-available” or tightens to require data and training-code disclosure. The strict reading would invalidate most current “open-source AI” claims and put pressure on labs to either ship more or stop calling themselves open. The pragmatic reading keeps OSAID broadly applicable but lets weights-only releases keep the label.

Neither outcome is good for an audience that needs to make sovereignty bets today. The pragmatic move is to read the license per release, ignore the “open” claim in the press release, and judge each artifact on what it actually ships.

Key terms for this layer

AWQ
A post-training quantization method that protects the small fraction of weight channels that handle the largest activations, achieving 4-bit weights with little quality loss.

DeepSeek
A Chinese open-weight family known for the V3 MoE base model and the R1 reasoning model, both released under permissive licenses and unusually transparent in their training-cost reporting.

dense
A transformer where every parameter activates on every token; the conventional architecture before mixture of experts became common at frontier scale.

frontier
The current capability envelope of AI, defined by the most capable models in deployment at any given time; an evolving label rather than a fixed threshold.

Gemma
Google's open-weight model family derived from Gemini research, with source-available licensing that includes an acceptable-use clause and license-revocation hook.

From the rest of the stack

Funders

National Science Foundation · US
$5M to $152M per major grant; AI Research Institutes ~$20M over 5 years.
European Commission Horizon Europe (GenAI4EU, OpenEuroLLM, RAISE) · EU
€307.3M Q4 2025 call. €221.8M for trustworthy AI. €107M for RAISE pilot. €40M for Open Internet Stack.
Allen Institute for AI (Ai2) · US
Young Investigator Program $100K + compute. AI2 Incubator $600K-$1.6M for startups.
Cohere Labs (formerly Cohere for AI) · Global
API credits only (no cash). Catalyst Grants free access to Aya, Command, Rerank, Embed for civic and academic users.
HuggingFace Community Grants · Global
Compute grants only; informal scale.

Reading list

GPU Memory Math for LLMs (Ahmad Osman, 2026)
Post · @TheAhmadOsman on X · 2026
The Llama 3 Herd of Models
Paper · Meta AI · 2024
The DeepSeek-V3 Technical Report
Paper · DeepSeek · 2024
OLMo 2: A Truly Open Language Model
Paper · AI2 (Allen Institute) · 2025
Mistral 7B
Paper · Mistral AI · 2023