Nous Research DisTrO

DisTrO (Distributed Training Over-the-Internet) is a family of bandwidth-efficient training optimizers from Nous Research. The central idea: instead of synchronizing full-precision gradients between workers at every step, which requires hyperscaler-class networking, workers exchange lossily compressed gradient updates. The reported result is a reduction on the order of 100-1000x in inter-node bandwidth with acceptable convergence. DisTrO matters because it is the strongest open candidate for training over the commodity internet, which is the foundation any serious decentralized-training story needs.

Compared to siblings: Prime Intellect's `prime` framework has a similar overall goal, was used in INTELLECT-1 and INTELLECT-2, and has some technical overlap; Pluralis takes a different topology approach; Gensyn comes at the problem from the cryptographic-verification angle and is less mature on the bandwidth-efficiency side.

Production-readiness: research-to-pilot. The technique works at modest scale and has shipped reproducible results; whether it scales to true frontier size (1T+ parameters) over the public internet is the open question. It is used in Nous Research's own training experiments and is increasingly cited in the broader decentralized-training literature. Honest limit: it is an order of magnitude behind centralized frontier training at equal cost and equal time. The interesting role for DisTrO is enabling models that simply could not exist otherwise (community-trained, regional, sovereignty-anchored), not racing the frontier.
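To make "lossy compression of gradient updates" concrete, here is a minimal sketch of one generic technique from the literature: top-k sparsification with error feedback. This is not DisTrO's actual algorithm, which this page does not specify; the function names (`topk_compress`, `decompress`, `compression_ratio`) and the 4-byte value and index sizes are assumptions chosen for illustration only.

```python
def topk_compress(grad, residual, k):
    """Keep only the k largest-magnitude entries of grad + residual.

    Returns (payload, new_residual). The payload is what a worker would
    send over the network; the residual holds everything that was not
    sent, so it can be added back in on the next step (error feedback).
    """
    corrected = [g + r for g, r in zip(grad, residual)]
    # Indices of the k entries with the largest absolute value.
    top = sorted(range(len(corrected)),
                 key=lambda i: abs(corrected[i]),
                 reverse=True)[:k]
    payload = {i: corrected[i] for i in top}
    new_residual = [0.0 if i in payload else c
                    for i, c in enumerate(corrected)]
    return payload, new_residual


def decompress(payload, n):
    """Rebuild a dense length-n update from a sparse payload."""
    dense = [0.0] * n
    for i, v in payload.items():
        dense[i] = v
    return dense


def compression_ratio(n, k, value_bytes=4, index_bytes=4):
    """Dense bytes divided by sparse bytes (assumed encoding sizes)."""
    return (n * value_bytes) / (k * (value_bytes + index_bytes))


if __name__ == "__main__":
    grad = [0.1, -5.0, 0.2, 3.0]
    payload, residual = topk_compress(grad, [0.0] * 4, k=1)
    print(decompress(payload, 4))  # only the largest entry is sent
    print(residual)                # the rest waits for a later step
    print(compression_ratio(10_000, 10))
```

Under these assumed sizes, sending 10 of every 10,000 entries gives a 500x reduction, which falls inside the 100-1000x range quoted above. The error-feedback residual is what keeps such aggressive truncation from discarding information permanently: entries that are skipped accumulate until they are large enough to be sent.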

Other projects at the Training layer

6 siblings · ordered open first

Unsloth Open source

2-30x faster fine-tuning with 70% less VRAM; consumer-GPU-friendly LoRA / QLoRA; free open-source tier.

Axolotl Open source

General-purpose fine-tuning framework; supports most open-weights models out of the box.

LLaMA-Factory Open source

Chinese-origin open fine-tuning toolkit; very feature-rich; broadly used in the open ecosystem.

HuggingFace TRL Open source

Standard library for supervised fine-tuning, RLHF, DPO, and KTO; the reference implementation of these methods.

Megatron-LM Open source

NVIDIA's distributed-training framework; the heaviest open frontier-training tooling.

DeepSpeed Open source

Microsoft's distributed-training framework; ZeRO sharding; widely used outside Microsoft too.

