Blog

Optimization

Tutorials · 6 min read

CS336 Notes: Lecture 6 - Kernels and Triton

Writing efficient GPU kernels with Triton: profiling, benchmarking, kernel fusion, and when to hand-optimize versus rely on torch.compile.

Tutorials · 10 min read

CS336 Notes: Lecture 2 - PyTorch and Resource Accounting

Resource accounting for LLM training: compute estimates, memory budgets, dtypes, tensors, and mixed precision.
