CS336 Notes: Course Overview
Most LLM tutorials teach you to call an API. This course teaches you to build what's behind it.
Stanford CS336 walks through the full pipeline: collect data, tokenize it, build a transformer, train it, evaluate it, and prepare it for real use. No shortcuts. No hand-waving over the hard parts.
- Playlist: Stanford CS336 Language Modeling From Scratch I (2025)
- Course website: https://stanford-cs336.github.io/
The Constraint
You won't train GPT-4 in this course. The models are small. But the process is the same: the choices that matter at 7B parameters are the choices that matter at 70B. The scale changes. The thinking doesn't.
What You Build
- A tokenizer that converts text to integers and back (a minimal sketch follows this list)
- A transformer model from scratch
- A training loop with proper resource accounting (see the second sketch after this list)
- Evaluation to know if your model actually works
- Alignment to make a base model useful
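To make the first item concrete, here is a minimal byte-level round trip in Python. This is an illustrative sketch of the text-to-integers-and-back contract, not the course's tokenizer (the class builds something more capable); the `ByteTokenizer` name and its methods are hypothetical.

```python
# Minimal sketch of the contract a tokenizer must satisfy:
# text -> integer IDs -> the same text. A byte-level scheme is the
# simplest possible version; ByteTokenizer is a hypothetical name.
class ByteTokenizer:
    def encode(self, text: str) -> list[int]:
        # UTF-8 bytes are already integers in [0, 255].
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        return bytes(ids).decode("utf-8", errors="replace")


tok = ByteTokenizer()
ids = tok.encode("language models")
assert tok.decode(ids) == "language models"  # lossless round trip
```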
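And a sketch of what "resource accounting" can mean in a training loop: counting tokens processed and estimating compute with the standard rule of thumb that dense-transformer training costs roughly 6 × params × tokens FLOPs. The model, batch, and loss here are stand-ins, not the course's assignment code.

```python
import torch

# Stand-in model and data; a real run would use a transformer and a
# tokenized corpus. The 6 * N * D FLOPs estimate is a standard rule
# of thumb, with N = parameters and D = tokens processed.
model = torch.nn.Linear(256, 256)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
n_params = sum(p.numel() for p in model.parameters())

tokens_seen = 0
for step in range(100):
    batch = torch.randn(32, 256)       # pretend batch: 32 "tokens"
    loss = model(batch).pow(2).mean()  # pretend loss
    opt.zero_grad()
    loss.backward()
    opt.step()

    tokens_seen += batch.shape[0]
    train_flops = 6 * n_params * tokens_seen  # ~FLOPs spent so far

print(f"params={n_params:,} tokens={tokens_seen:,} flops≈{train_flops:.2e}")
```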
How to Use These Notes
Watch the lecture once at normal speed. Read the notes right after. If something doesn't click, rewatch and reread until it does.
The goal is not to memorize. The goal is to understand well enough that you could rebuild it without the notes.