# Reading List

## Priority 1: Must Read 🔴

### Papers

| Paper | Year | Topic | Why |
| --- | --- | --- | --- |
| Attention Is All You Need | 2017 | Transformers | The foundation; everything else builds on this |
| LoRA: Low-Rank Adaptation | 2021 | Fine-tuning | Standard method for efficient fine-tuning (sketch below) |
| InstructGPT (Ouyang et al.) | 2022 | RLHF | Defined the SFT → RM → PPO pipeline |
| DPO: Direct Preference Optimization | 2023 | Alignment | Simplified RLHF without a reward model (loss sketched below) |
| DeepSeek-R1 | 2025 | RL/Reasoning | GRPO, on-policy RL, reasoning emergence |
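To make the LoRA row concrete, here is a minimal sketch of the idea: a frozen pretrained linear layer plus a trainable low-rank update. The class name and hyperparameter defaults are illustrative, not from the paper's code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update
    (alpha / r) * B @ A, per the LoRA paper."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)            # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

And for the DPO row: the paper reduces preference tuning to a logistic loss on log-prob margins against a frozen reference model, with no explicit reward model. A sketch (the function name and input conventions here are my own):

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Each input is a summed per-sequence log-prob under the policy (pi_*)
    or the frozen reference (ref_*); beta scales the implicit reward."""
    margins = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(margins).mean()
```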

### Tech Blogs

| Blog | Topic | Why |
| --- | --- | --- |
| Scaling LLM Post-Training at Netflix | Post-training infra | Direct insight into Netflix's stack |
| vLLM: Easy, Fast, and Cheap LLM Serving | Inference | The dominant serving engine (quickstart sketch below) |
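To pair with the vLLM post, a minimal offline-inference sketch following the pattern in vLLM's own quickstart. The model name is a placeholder; any Hugging Face model ID works.

```python
from vllm import LLM, SamplingParams  # pip install vllm

prompts = ["Explain paged attention in one sentence."]
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # placeholder; swap in your model
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

The same engine also ships an OpenAI-compatible HTTP server, which is the more common way to run it in production.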

## Priority 2: Important 🟡

### Papers

| Paper | Year | Topic |
| --- | --- | --- |
| FlashAttention | 2022 | Efficient attention |
| QLoRA | 2023 | Quantized LoRA |
| Llama 2 | 2023 | Open-weight LLMs, RLHF details |
| Llama 3 | 2024 | Scaling, post-training at scale |
| Mixtral / MoE | 2024 | Mixture of Experts (routing sketch below) |
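A toy sketch of the top-k routing idea behind Mixtral-style MoE layers. The class, the single-linear experts, and the per-expert loop are illustrative only; real implementations use gated MLP experts and batch tokens per expert.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: a learned router picks k experts per
    token and mixes their outputs by renormalized router probabilities."""
    def __init__(self, dim, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.k = k

    def forward(self, x):                        # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the k winners
        out = torch.zeros_like(x)
        for slot in range(self.k):               # each token's slot-th chosen expert
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```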

### Tech Blogs

| Blog | Topic |
| --- | --- |
| Netflix Tech Blog (all ML posts) | Recsys, personalization, infrastructure |
| Spotify Engineering Blog | Recsys, Spark, ML platform |
| Meta AI Blog | Llama, open-source ML |

## Priority 3: Deep Dives 🟢

### Papers

| Paper | Year | Topic |
| --- | --- | --- |
| FSDP (Zhao et al.) | 2023 | Distributed training |
| RoPE (Su et al.) | 2021 | Positional encoding (rotation sketch below) |
| Constitutional AI | 2022 | Anthropic's alignment approach |
| Verl | 2024 | RL training framework |
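For the RoPE paper, a minimal sketch of the core operation: each consecutive feature pair is rotated by an angle that grows with position. The function name and layout are mine; real implementations apply this to queries and keys inside attention, typically with cached cos/sin tables.

```python
import torch

def apply_rope(x, base=10000.0):
    """Rotary position embedding for x of shape (seq_len, dim), dim even:
    the pair (2i, 2i+1) at position p is rotated by p * base**(-2i/dim)."""
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=x.dtype) * 2 / dim)   # (half,)
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * inv_freq   # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]          # the two halves of each pair
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin       # standard 2-D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```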

### Courses

| Course | Platform | Topic |
| --- | --- | --- |
| Stanford CS229 (Machine Learning) | YouTube | ML foundations |
| Stanford CS224N (NLP with Deep Learning) | YouTube | Transformers, attention |
| Stanford CS336 (LLMs from Scratch) | YouTube | Building LLMs |
| Hugging Face Course | huggingface.co | Practical fine-tuning |
| Fast.ai | fast.ai | Practical deep learning |

### Books

| Book | Author | Topic |
| --- | --- | --- |
| Designing Machine Learning Systems | Chip Huyen | MLOps, production ML |
| Deep Learning | Goodfellow et al. | Theory foundations |
| Reinforcement Learning: An Introduction | Sutton & Barto | RL foundations |