bhuv's notebook

#ML

Some technical intuition on RLHF and Direct Preference Optimisation

Treatise · 6 min