Glossary

MLX

Apple's open-source ML framework, designed for Apple Silicon's unified memory architecture: the local-first inference engine for Mac and, increasingly, iPad and iPhone.

Runtime; also: Silicon; also: Sovereignty and Decentralization Primitives; aka Apple MLX

Apple’s open-source array framework, MIT-licensed, designed around Apple Silicon’s unified-memory architecture. Where most ML frameworks shuttle data between CPU and GPU memory, MLX treats them as a single pool, which suits Mac Studio configurations with 128 GB or 192 GB of unified memory.
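As a minimal sketch of what the single memory pool means in practice, the snippet below creates two arrays once and runs the same addition on the CPU and on the GPU by passing a `stream` argument, with no explicit device transfer. It assumes the `mlx` Python package is installed on a machine with Metal support; the helper name `add_on_cpu_and_gpu` is hypothetical, and the function returns `None` when MLX or Metal is unavailable.

```python
try:
    import mlx.core as mx  # assumption: the `mlx` package is installed
except ImportError:
    mx = None


def add_on_cpu_and_gpu(n=1024):
    """Add the same two arrays on CPU and GPU; return the max absolute difference.

    Returns None if MLX is missing or the Metal back-end is not available.
    """
    if mx is None or not mx.metal.is_available():
        return None
    # Arrays are not tied to a device; they live in unified memory.
    a = mx.random.normal((n,))
    b = mx.random.normal((n,))
    # The device is chosen per operation, not per array.
    on_cpu = mx.add(a, b, stream=mx.cpu)
    on_gpu = mx.add(a, b, stream=mx.gpu)
    mx.eval(on_cpu, on_gpu)  # MLX is lazy; force both computations
    return mx.max(mx.abs(on_cpu - on_gpu)).item()


diff = add_on_cpu_and_gpu()
if diff is None:
    print("skipped: MLX with Metal is not available here")
else:
    print(f"max |cpu - gpu| = {diff}")
```

The point of the sketch is that `a` and `b` are never copied between devices: both operations read the same buffers.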

The framework supports both inference and training; for end users the practical use case is running open-weights models on a Mac. The MLX-LM and MLX-VLM packages cover the major model families with PyTorch-style APIs. Community-reported throughput for a quantized Llama-3-70B on a Mac Studio M2 Ultra sits in the high single digits to low teens of tokens per second; smaller models in the 7B-13B range run at tens to low hundreds of tokens per second.
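A rough back-of-the-envelope sketch shows why quantization matters for fitting a 70B model into the unified-memory sizes mentioned in this entry. The figures are assumptions for illustration: a round 70 billion parameters, weights only, ignoring the KV cache, activations, quantization metadata, and memory reserved by the operating system. The helper name `weight_footprint_gb` is hypothetical.

```python
def weight_footprint_gb(n_params, bits_per_weight):
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 70e9  # assumption: round figure for a 70B-class model

fp16_gb = weight_footprint_gb(N_PARAMS, 16)  # 16-bit weights
q4_gb = weight_footprint_gb(N_PARAMS, 4)     # 4-bit quantized weights

for label, size in [("fp16", fp16_gb), ("4-bit", q4_gb)]:
    for pool_gb in (128, 192):
        verdict = "fits" if size < pool_gb else "does not fit"
        print(f"{label}: {size:.0f} GB of weights {verdict} in {pool_gb} GB")
```

Under these assumptions the 16-bit weights come to about 140 GB and the 4-bit weights to about 35 GB, so only the quantized variant leaves comfortable headroom on a 128 GB configuration.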

Sources

MLX documentation
