The Open-Source AI Stack
Nous Research DisTrO

Decentralized training optimizer that reduces gradient-synchronization bandwidth by roughly 100-1000x, enabling training over commodity internet.

Apache 2.0 · research

DisTrO (Distributed Training Over-the-Internet) is a family of bandwidth-efficient training optimizers from Nous Research. The central idea: instead of synchronizing full-precision gradients between workers at every step, which requires hyperscaler-class networking, workers exchange lossily compressed gradient updates. This cuts inter-node bandwidth by roughly 100-1000x while keeping convergence acceptable.

DisTrO matters because it is the strongest open candidate for training over commodity internet, which is the foundation any serious decentralized-training story needs. Compared to siblings: Prime Intellect's `prime` framework has a similar overall goal, was used in INTELLECT-1 and INTELLECT-2, and overlaps technically in places; Pluralis takes a different topology approach; Gensyn focuses on cryptographic verification and is less mature on the bandwidth-efficiency side.

Production-readiness: research-to-pilot. The technique works at modest scale and has shipped reproducible results; whether it scales to true frontier size (1T+ parameters) over the public internet is the open question. It is used in Nous Research's own training experiments and is increasingly cited in the broader decentralized-training literature.

Honest limit: at equal cost and equal wall-clock time, DisTrO is an order of magnitude behind centralized frontier training. Its interesting role is enabling models that simply could not exist otherwise (community-trained, regional, sovereignty-anchored), not racing the frontier.
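To make the bandwidth arithmetic behind the lossy-compression idea concrete, here is a minimal sketch in pure Python. It is not DisTrO's actual algorithm: it substitutes generic top-k sparsification with error feedback, a well-known gradient-compression technique, purely as a stand-in. The names `topk_compress`, `decompress`, `compression_ratio`, and `ErrorFeedbackWorker` are hypothetical, and the 4-byte value and index sizes are assumptions.

```python
import random

def topk_compress(values, k):
    """Keep the k largest-magnitude entries; return (indices, kept_values)."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    idx = sorted(order[:k])
    return idx, [values[i] for i in idx]

def decompress(idx, vals, n):
    """Rebuild a dense vector of length n, zero everywhere except idx."""
    dense = [0.0] * n
    for i, v in zip(idx, vals):
        dense[i] = v
    return dense

def compression_ratio(n, k, value_bytes=4, index_bytes=4):
    """Dense payload size divided by sparse payload size (assumed byte widths)."""
    dense_bytes = n * value_bytes
    sparse_bytes = k * (value_bytes + index_bytes)
    return dense_bytes / sparse_bytes

class ErrorFeedbackWorker:
    """Hypothetical worker that carries unsent gradient mass into the next step."""

    def __init__(self, n, k):
        self.n = n
        self.k = k
        self.residual = [0.0] * n

    def step(self, grad):
        # Add back whatever earlier steps dropped, then compress the sum.
        corrected = [g + r for g, r in zip(grad, self.residual)]
        idx, vals = topk_compress(corrected, self.k)
        sent = decompress(idx, vals, self.n)
        # Whatever was not transmitted becomes the new residual.
        self.residual = [c - s for c, s in zip(corrected, sent)]
        return idx, vals

if __name__ == "__main__":
    random.seed(0)
    n, k = 10_000, 10
    worker = ErrorFeedbackWorker(n, k)
    grad = [random.gauss(0.0, 1.0) for _ in range(n)]
    idx, vals = worker.step(grad)
    print(f"sent {len(idx)} of {n} entries, ratio {compression_ratio(n, k):.0f}x")
```

Under these assumptions, sending 10 of 10,000 entries with a 4-byte index each gives a 500x reduction, inside the 100-1000x range quoted above. The error-feedback residual is what keeps the lossy step from permanently discarding gradient information; how DisTrO itself achieves acceptable convergence is described in the project's own materials, not by this sketch.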
