# 7. RL for LLMs
## Why This Matters
Reinforcement learning (RL) is how we align LLMs beyond what supervised fine-tuning (SFT) achieves. DeepSeek-R1 showed that on-policy RL with GRPO (Group Relative Policy Optimization) can produce reasoning capabilities that SFT alone cannot. Netflix is building RL infrastructure as a core part of its post-training stack.