CS336 Notes: Course Overview
Most LLM tutorials teach you to call an API. This course teaches you to build what's behind it.
Stanford CS336 walks through the full pipeline: collect data, tokenize it, build a transformer, train it, evaluate it, and prepare it for real use. No shortcuts. No hand-waving over the hard parts.
- Playlist: Stanford CS336 Language Modeling From Scratch I (2025)
- Course website: https://stanford-cs336.github.io/
The Constraint
You won't train GPT-4 in this course. The models are small. But the process is the same: the choices that matter at 7B parameters are the choices that matter at 70B. The scale changes. The thinking doesn't.
What You Build
- A tokenizer that converts text to integers and back (a minimal sketch follows this list)
- A transformer model from scratch
- A training loop with proper resource accounting (see the second sketch after this list)
- Evaluation to know if your model actually works
- Alignment to make a base model useful
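To make the first item concrete, here is a minimal byte-level round trip in Python. This is an illustrative sketch of the text-to-integers-and-back contract, not the course's tokenizer (the class builds something more capable); the `ByteTokenizer` name and its methods are hypothetical.

```python
# Minimal sketch of the contract a tokenizer must satisfy:
# text -> integer IDs -> the same text. A byte-level scheme is the
# simplest possible version; ByteTokenizer is a hypothetical name.
class ByteTokenizer:
    def encode(self, text: str) -> list[int]:
        # UTF-8 bytes are already integers in [0, 255].
        return list(text.encode("utf-8"))

    def decode(self, ids: list[int]) -> str:
        return bytes(ids).decode("utf-8", errors="replace")


tok = ByteTokenizer()
ids = tok.encode("language models")
assert tok.decode(ids) == "language models"  # lossless round trip
```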
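And a sketch of what "resource accounting" can mean in a training loop: counting tokens processed and estimating compute with the standard rule of thumb that dense-transformer training costs roughly 6 × params × tokens FLOPs. The model, batch, and loss here are stand-ins, not the course's assignment code.

```python
import torch

# Stand-in model and data; a real run would use a transformer and a
# tokenized corpus. The 6 * N * D FLOPs estimate is a standard rule
# of thumb, with N = parameters and D = tokens processed.
model = torch.nn.Linear(256, 256)
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
n_params = sum(p.numel() for p in model.parameters())

tokens_seen = 0
for step in range(100):
    batch = torch.randn(32, 256)       # pretend batch: 32 "tokens"
    loss = model(batch).pow(2).mean()  # pretend loss
    opt.zero_grad()
    loss.backward()
    opt.step()

    tokens_seen += batch.shape[0]
    train_flops = 6 * n_params * tokens_seen  # ~FLOPs spent so far

print(f"params={n_params:,} tokens={tokens_seen:,} flops≈{train_flops:.2e}")
```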
How to Use These Notes
Watch the lecture once at normal speed. Read the notes right after. If something doesn't click, rewatch and reread until it does.
The goal is not to memorize. The goal is to understand well enough that you could rebuild it without the notes.