CS336 Notes: Lecture 15 - Alignment, SFT and RLHF
Tutorials · January 17, 2026 · 10 min read

Post-training for helpful assistants: supervised fine-tuning on instructions, safety tuning, RLHF with preference data, PPO vs DPO, and the challenges of learning from human feedback.

Tags: machine-learning, alignment, stanford-cs336, rlhf