# Direct Preference Optimization (DPO)

Coming soon: preference pairs, the Bradley-Terry model, and offline optimization.