bhuv's notebook

#RL

Some technical intuition on RLHF and Direct Preference Optimisation
