CS336 Notes: Lecture 11 - Scaling Laws 2
Practical scaling: muP for hyperparameter transfer, WSD learning rate schedules, case studies from Cerebras-GPT, MiniCPM, and DeepSeek on compute-optimal training.
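The summary above mentions WSD (warmup–stable–decay) learning-rate schedules, one of the practical-scaling tools this lecture covers. As a quick orientation, here is a minimal sketch of such a schedule in Python; the step counts, peak/minimum learning rates, and the cosine shape of the decay phase are illustrative assumptions, not values from the lecture.

```python
import math

def wsd_lr(step, max_lr=3e-4, warmup_steps=1_000,
           stable_steps=50_000, decay_steps=5_000, min_lr=3e-5):
    """Warmup-Stable-Decay schedule sketch (all defaults are hypothetical)."""
    if step < warmup_steps:
        # Warmup: linear ramp from 0 up to the peak learning rate.
        return max_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak learning rate for the bulk of training.
        return max_lr
    # Decay: anneal from the peak down to min_lr over a short final phase
    # (cosine here; a linear or exponential decay would also fit the WSD shape).
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

The key property compared to a single cosine schedule is the long constant plateau: training can be stopped (or branched for a decay run) at many points along the plateau, which is what makes WSD convenient for the scaling-law experiments discussed in the lecture.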