CS336 Notes: Lecture 15 - Alignment, SFT and RLHF
Post-training for helpful assistants: supervised fine-tuning on instructions, safety tuning, RLHF with preference data, PPO vs DPO, and the challenges of learning from human feedback.
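The PPO vs DPO contrast mentioned above hinges on DPO replacing the RL loop with a closed-form loss on preference pairs. A minimal sketch of the per-pair DPO objective, assuming the log-probabilities of the chosen and rejected responses are already computed (function and argument names here are illustrative, not from the lecture):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_w, logp_l         : policy log-probs of the chosen / rejected response
    ref_logp_w, ref_logp_l : frozen reference-model log-probs of the same responses
    beta                   : strength of the implicit KL constraint to the reference
    """
    # Implicit reward margin: how much more the policy prefers y_w over y_l,
    # measured relative to the reference model.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin (Bradley-Terry preference likelihood).
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference, the margin is zero and the loss is log 2; as the policy shifts probability mass toward the chosen response relative to the reference, the loss falls, which is the sense in which DPO optimizes the same preference objective as RLHF without a separate reward model or PPO rollouts.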