Glossary

Megatron

NVIDIA's distributed-training framework for large transformer models, providing the reference implementation of tensor parallelism, pipeline parallelism, and 3D parallelism used in many open and closed training runs.

Category: Training (also: Compute). Also known as: Megatron-LM.

NVIDIA’s reference distributed-training framework. Megatron introduced the canonical tensor-parallel and pipeline-parallel implementations for transformer training, with optimized fused kernels and communication patterns tuned for NVLink and InfiniBand. Many production frontier training runs use Megatron-derived code, sometimes combined with DeepSpeed or FSDP for optimizer-state sharding.
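To make the tensor-parallel idea concrete, here is a minimal single-process sketch of how a transformer MLP block can be split across ranks: the first weight matrix by columns, the second by rows, with the partial outputs summed at the end. It is an illustration only and does not use Megatron's API. The function names (`mlp_reference`, `mlp_tensor_parallel`) and the `world_size` parameter are hypothetical, the "ranks" are simulated with a loop, and the final sum stands in for the all-reduce that a real run would perform over NVLink or InfiniBand.

```python
import math
import random


def gelu(v):
    # tanh approximation of GeLU, applied elementwise
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def matmul(p, q):
    # Plain list-of-lists matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*q)] for row in p]


def mlp_reference(x, a, b):
    # Unsharded MLP: Y = GeLU(X A) B
    hidden = [[gelu(v) for v in row] for row in matmul(x, a)]
    return matmul(hidden, b)


def mlp_tensor_parallel(x, a, b, world_size):
    # Simulated tensor parallelism (hypothetical helper, not Megatron code).
    hidden_size = len(b)
    assert hidden_size % world_size == 0, "hidden size must divide evenly across ranks"
    shard = hidden_size // world_size
    out = [[0.0] * len(b[0]) for _ in x]
    for rank in range(world_size):
        lo, hi = rank * shard, (rank + 1) * shard
        a_cols = [row[lo:hi] for row in a]  # column-parallel slice of A
        b_rows = b[lo:hi]                   # matching row-parallel slice of B
        # GeLU is elementwise, so each rank can apply it to its own columns
        hidden = [[gelu(v) for v in row] for row in matmul(x, a_cols)]
        partial = matmul(hidden, b_rows)
        # Summing the partial outputs stands in for the all-reduce
        out = [[o + p for o, p in zip(orow, prow)] for orow, prow in zip(out, partial)]
    return out
```

Splitting the first matrix by columns means the nonlinearity needs no communication, so in this sketch the only cross-rank step in the forward pass is the final sum.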

The repository is permissively licensed and serves as the de facto reference: most open distributed-training papers either build on Megatron or compare against it. The NeMo framework (also NVIDIA) wraps Megatron in a higher-level API for model customization.
