The Open-Source AI Stack
RSS

The Stack · Runtime · Open source

Ollama

Local model runner; Docker-style UX over llama.cpp; the easiest way to run open weights on your machine.

Ollama is a local model runner that wraps llama.cpp with a Docker-style command-line and HTTP API. Install Ollama, run `ollama pull llama3.3` and `ollama run llama3.3`, and you have a local model serving over a localhost API endpoint. MIT- licensed. Ships native installers for macOS, Linux, and Windows. Ollama matters because it makes local inference accessible to developers who do not want to learn the llama.cpp build flags and quantization settings. The model library handles GGUF downloads from a curated registry; the API speaks an OpenAI-compatible shape (with extensions) so existing client code works against Ollama with a baseURL change. Compared to siblings: llama.cpp is the engine underneath (more flexibility, more setup), LM Studio is a GUI alternative, MLX is Apple's direct API for Apple Silicon. Ollama is the most-deployed local runner among developers who want minimal friction. Production-ready for development and personal use. Used as the backend for many local-AI applications, IDE plugins, and personal-AI projects (HRF-funded Orchard pairs Ollama with Lightning and Cashu). The strategic position: Ollama is the "easy button" for local AI; for production-scale serving you generally move to vLLM or SGLang. The growing question is whether Ollama's commercialization plans (paid tiers, hosted services) preserve the open-source character that made it useful in the first place.

Sources

Want a follow-up? Ask the chat about Ollama in context. It will compare to siblings at the same layer and ground every claim in the wiki.

Other projects at the Runtime layer

6 siblings · ordered open first