CS336 Notes: Lecture 8 - Parallelism 2
Hands-on distributed training: implementing collectives with PyTorch and NCCL, data/tensor/pipeline parallelism in practice, and understanding the compute-memory-communication tradeoff.
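Since the lecture centers on implementing collectives with PyTorch and NCCL, here is a minimal sketch of an all-reduce, the workhorse collective behind data parallelism. This is an illustrative example, not code from the lecture; the file name and launch command are assumptions, and it expects to be started with torchrun so that RANK, LOCAL_RANK, and WORLD_SIZE are set.

```python
# Minimal all-reduce sketch (assumed example, not from the lecture).
# Launch with: torchrun --nproc_per_node=<num_gpus> allreduce_demo.py
import os
import torch
import torch.distributed as dist

def main():
    # torchrun populates RANK, LOCAL_RANK, and WORLD_SIZE.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Each rank contributes a tensor filled with its own rank id.
    x = torch.full((4,), float(rank), device="cuda")

    # All-reduce sums the tensors across ranks in place, so every
    # rank ends up holding the same summed result.
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {x.tolist()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

The same pattern (init process group, move work to the local GPU, call a collective) extends to the reduce-scatter and all-gather operations used by tensor and pipeline parallelism discussed in the lecture.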