ML Systems Engineering — Field Guide

# 7. RL for LLMs

## Why This Matters

Reinforcement learning (RL) is how we align LLMs beyond supervised fine-tuning (SFT). DeepSeek-R1 showed that on-policy RL with GRPO (Group Relative Policy Optimization) can elicit reasoning capabilities that SFT alone did not produce. Netflix is building RL infrastructure as a core part of its post-training stack.
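The idea that most distinguishes GRPO from PPO fits in a few lines. GRPO samples a group of completions for each prompt and scores each completion relative to its siblings. A completion's advantage is its reward minus the group mean, divided by the group standard deviation, so no learned value network (critic) is needed. The sketch below computes only this outcome-reward advantage. The function name `group_relative_advantages`, the use of the population standard deviation, and the `eps` stabilizer are illustrative assumptions, since implementations differ on these details. The clipped policy-gradient loss and the KL penalty are omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: normalize each completion's reward against its group.

    `rewards` holds the scalar rewards for G completions sampled from the SAME prompt.
    Assumption: population std (pstdev) and an additive eps; real implementations vary.
    """
    mu = mean(rewards)       # group mean acts as the shared baseline (replaces a critic)
    sigma = pstdev(rewards)  # spread of rewards within the group
    # eps only guards against division by zero when all rewards are identical.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, scored by a binary correctness reward.
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

With a binary correctness reward, correct completions receive a positive advantage and incorrect ones a negative advantage. If every completion in a group earns the same reward, all advantages are zero, so that prompt contributes no gradient signal.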

## Topics