CS336 Notes: Lecture 8 - Parallelism 2
Hands-on distributed training: implementing collectives with PyTorch and NCCL, data/tensor/pipeline parallelism in practice, and understanding the compute-memory-communication tradeoff.
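The summary above mentions implementing collectives with PyTorch and NCCL. As a minimal sketch of that setup (assuming a `torchrun` launch with one process per GPU; the script name and tensor values are illustrative, not from the lecture), here is an all-reduce across ranks:

```python
# Minimal all-reduce sketch with torch.distributed on the NCCL backend.
# Launch (illustrative): torchrun --nproc_per_node=2 allreduce_sketch.py
import os

import torch
import torch.distributed as dist


def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes a tensor filled with its own rank id.
    x = torch.full((4,), float(dist.get_rank()), device="cuda")

    # In-place sum across all ranks; every rank ends up with the same result.
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    print(f"rank {dist.get_rank()}: {x.tolist()}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

With two ranks, both print `[1.0, 1.0, 1.0, 1.0]` (0 + 1 on every element), which is the same pattern data parallelism uses to average gradients across workers.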