bhuv's notebook

#ML

Some technical intuition on RLHF and Direct Preference Optimisation

Some technical intuition on RLHF and Direct Preference Optimisation