bhuv's notebook

#RL

Some technical intuition on RLHF and Direct Preference Optimisation

Treatise · 6 min