The Open-Source AI Stack

Ollama

A local inference runtime that wraps llama.cpp in a Docker-style developer experience. It is the easiest path to running open-weight models on a personal machine.

The local-first runtime that put open-weights inference within reach of non-specialists. ollama pull llama3 downloads a quantized model and makes it queryable via a simple HTTP API; ollama run llama3 opens an interactive shell. The model files are GGUF under the hood; the runtime is llama.cpp; Ollama is the management layer.
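To make the HTTP API mentioned above concrete, here is a minimal sketch of a non-streaming request using only the Python standard library. It assumes an Ollama server is already running at its default local address (http://localhost:11434) and that llama3 has been pulled. The helper names build_generate_request and generate are hypothetical, chosen for illustration; the /api/generate endpoint and the model, prompt, and stream fields follow Ollama's documented REST API, but check them against the version you have installed.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str,
                           url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    # With "stream": False the server replies with a single JSON object
    # whose "response" field holds the full completion.
    return body["response"]


# Example call (needs a live server, so it is left commented out):
# print(generate("llama3", "Why is the sky blue?"))
```

Separating request construction from sending keeps the payload easy to inspect without a server; setting stream to False trades incremental output for a single JSON reply that is simpler to parse.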

The product matters because installation friction is the dominant barrier to local-first adoption. By packaging models, runtimes, and configuration into one tool with one command per operation, Ollama turned what was a sysadmin task into a developer-tool task.

Full coverage at /projects/ollama.
