Dec 26, 2023 – Dec 26, 2023
An interesting and now widespread method for improving LLMs to better match human preferences using reinforcement learning (RL).