ML Systems Engineering — Field Guide
# vLLM & Serving Engines
Coming soon: PagedAttention, continuous batching, and a comparison with SGLang.
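Until the full write-up lands, the core scheduling idea behind continuous batching can be sketched as a toy, framework-free simulation: finished sequences leave the batch and waiting requests join at every decode step, instead of the whole batch draining before refill. All names here (`continuous_batching`, `static_batching`, `Request`) are illustrative, not vLLM's actual API.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    tokens_left: int  # decode steps this request still needs

def continuous_batching(requests, max_batch_size):
    """Step-level scheduler: admit/evict every step. Returns total decode steps."""
    waiting = deque(requests)
    running = []
    total_steps = 0
    while waiting or running:
        # Admit waiting requests into any free slots at every step.
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        # One decode step for every running sequence.
        for r in running:
            r.tokens_left -= 1
        total_steps += 1
        # Finished sequences exit immediately, freeing their slots.
        running = [r for r in running if r.tokens_left > 0]
    return total_steps

def static_batching(requests, max_batch_size):
    """Classic batching: each batch runs until its longest member finishes."""
    reqs = list(requests)
    return sum(
        max(r.tokens_left for r in reqs[i:i + max_batch_size])
        for i in range(0, len(reqs), max_batch_size)
    )
```

With requests needing 4, 1, 1, and 1 decode steps and a batch size of 2, the step-level scheduler finishes in 4 total steps, while static batching takes 5, since short sequences in the first batch sit idle until the longest one completes. vLLM's real scheduler layers PagedAttention on top of this so that admitting a sequence only requires free KV-cache blocks, not a fixed slot.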