Learn

Three self-paced tracks. The first walks the stack layer by layer and explains why openness matters at each one. The second covers how LLMs actually work, from the inference loop to fine-tuning. The third covers how to run the stack on hardware you control. Pick any one, or do all three in any order.

Walk the stack · 15 modules · Socratic

Bottom-up from infrastructure to protocols. Each module ends with a Probe dialog and a summary you write yourself; the course produces a downloadable map of what you learned.

How LLMs work · 14 modules · mechanics

The model-side foundation. Tokens, transformers, attention, the KV cache, decoding, chat templates, long context, RAG, tool use, fine-tuning. Start with the loop; the rest follows.

Begin module 01 · The inference loop → See all 14 modules →

Self-host the stack · 5 core · 2 optional

VRAM math, memory bandwidth tiers, quantization formats, inference engines, hardware strategy. The 5-module core path fits in one sitting; production serving and benchmarking are optional deep dives.

Begin module 01 · GPU memory math → See all 7 modules →
