Anne Wu recommended on 1/2/2024

A survey paper on open challenges in RLHF, organized into three categories: human feedback, the reward model, and the policy. The authors also distinguish between tractable and fundamental limitations, and discuss safety, governance, and transparency. Sometimes I wonder about how the problem itself is specified in RLHF.