# 7. RL for LLMs

## Why This Matters

Reinforcement learning (RL) is how we align and improve LLMs beyond what supervised fine-tuning (SFT) achieves. DeepSeek-R1 showed that on-policy RL with GRPO (Group Relative Policy Optimization) can produce reasoning capabilities that SFT alone does not. Netflix is building RL infrastructure as a core part of its post-training stack.

## Topics