# Direct Preference Optimization (DPO)

Coming soon: preference pairs, the Bradley-Terry model, and offline optimization.