# Reading List

## Priority 1: Must Read 🔴

### Papers

| Paper | Year | Topic | Why |
| --- | --- | --- | --- |
| Attention Is All You Need | 2017 | Transformers | The foundation; everything else builds on this |
| LoRA: Low-Rank Adaptation | 2021 | Fine-tuning | Standard method for efficient fine-tuning (sketch below) |
| InstructGPT (Ouyang et al.) | 2022 | RLHF | Defined the SFT → RM → PPO pipeline |
| DPO: Direct Preference Optimization | 2023 | Alignment | Simplified RLHF without a reward model (loss sketched below) |
| DeepSeek-R1 | 2025 | RL/Reasoning | GRPO, on-policy RL, reasoning emergence |
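To make the LoRA row concrete, here is a minimal sketch of the idea: a frozen pretrained linear layer plus a trainable low-rank update. The class name and hyperparameter defaults are illustrative, not from the paper's code.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update
    (alpha / r) * B @ A, per the LoRA paper."""
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)            # pretrained weights stay frozen
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_features, r))  # zero init: update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

And for the DPO row: the paper reduces preference tuning to a logistic loss on log-prob margins against a frozen reference model, with no explicit reward model. A sketch (the function name and input conventions here are my own):

```python
import torch.nn.functional as F

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Each input is a summed per-sequence log-prob under the policy (pi_*)
    or the frozen reference (ref_*); beta scales the implicit reward."""
    margins = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -F.logsigmoid(margins).mean()
```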

### Tech Blogs

| Blog | Topic | Why |
| --- | --- | --- |
| Scaling LLM Post-Training at Netflix | Post-training infra | Direct insight into Netflix's stack |
| vLLM: Easy, Fast, and Cheap LLM Serving | Inference | The dominant serving engine (quickstart sketch below) |
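To pair with the vLLM post, a minimal offline-inference sketch following the pattern in vLLM's own quickstart. The model name is a placeholder; any Hugging Face model ID works.

```python
from vllm import LLM, SamplingParams  # pip install vllm

prompts = ["Explain paged attention in one sentence."]
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")  # placeholder; swap in your model
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```

The same engine also ships an OpenAI-compatible HTTP server, which is the more common way to run it in production.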

## Priority 2: Important 🟡

### Papers

| Paper | Year | Topic |
| --- | --- | --- |
| FlashAttention | 2022 | Efficient attention |
| QLoRA | 2023 | Quantized LoRA |
| Llama 2 | 2023 | Open-weight LLMs, RLHF details |
| Llama 3 | 2024 | Scaling, post-training at scale |
| Mixtral / MoE | 2024 | Mixture of Experts (routing sketch below) |
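A toy sketch of the top-k routing idea behind Mixtral-style MoE layers. The class, the single-linear experts, and the per-expert loop are illustrative only; real implementations use gated MLP experts and batch tokens per expert.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: a learned router picks k experts per
    token and mixes their outputs by renormalized router probabilities."""
    def __init__(self, dim, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.k = k

    def forward(self, x):                        # x: (tokens, dim)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the k winners
        out = torch.zeros_like(x)
        for slot in range(self.k):               # each token's slot-th chosen expert
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e         # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```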

### Tech Blogs

| Blog | Topic |
| --- | --- |
| Netflix Tech Blog (all ML posts) | Recsys, personalization, infrastructure |
| Spotify Engineering Blog | Recsys, Spark, ML platform |
| Meta AI Blog | Llama, open-source ML |

## Priority 3: Deep Dives 🟢

### Papers

| Paper | Year | Topic |
| --- | --- | --- |
| FSDP (Zhao et al.) | 2023 | Distributed training |
| RoPE (Su et al.) | 2021 | Positional encoding (rotation sketch below) |
| Constitutional AI | 2022 | Anthropic's alignment approach |
| Verl | 2024 | RL training framework |
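For the RoPE paper, a minimal sketch of the core operation: each consecutive feature pair is rotated by an angle that grows with position. The function name and layout are mine; real implementations apply this to queries and keys inside attention, typically with cached cos/sin tables.

```python
import torch

def apply_rope(x, base=10000.0):
    """Rotary position embedding for x of shape (seq_len, dim), dim even:
    the pair (2i, 2i+1) at position p is rotated by p * base**(-2i/dim)."""
    seq_len, dim = x.shape
    half = dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=x.dtype) * 2 / dim)   # (half,)
    angles = torch.arange(seq_len, dtype=x.dtype)[:, None] * inv_freq   # (seq_len, half)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]          # the two halves of each pair
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin       # standard 2-D rotation
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```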

### Courses

| Course | Platform | Topic |
| --- | --- | --- |
| Stanford CS229 (Machine Learning) | YouTube | ML foundations |
| Stanford CS224N (NLP with Deep Learning) | YouTube | Transformers, attention |
| Stanford CS336 (LLMs from Scratch) | YouTube | Building LLMs |
| Hugging Face Course | huggingface.co | Practical fine-tuning |
| Fast.ai | fast.ai | Practical deep learning |

### Books

| Book | Author | Topic |
| --- | --- | --- |
| Designing Machine Learning Systems | Chip Huyen | MLOps, production ML |
| Deep Learning | Goodfellow et al. | Theory foundations |
| Reinforcement Learning: An Introduction | Sutton & Barto | RL foundations |