RecNet

Generalized Preference Optimization: A Unified Approach to Offline Alignment

Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot

Feb, 2024

Anne Wu recommended on 2/13/2024

A formulation for offline alignment; the authors treat recent work (DPO, IPO, SLiC, etc.) as special cases that use different losses. They frame reward modeling as a binary classification problem and analyze the connection between offline regularization and the KL regularization used in RLHF.
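A minimal sketch of the "special cases using different losses" idea, under my reading of the paper: each method applies a different convex function to the scaled difference of policy-vs-reference log-ratios for the preferred and dispreferred responses. The function names, argument names, and the exact scaling are illustrative assumptions, not the authors' code.

```python
import math

def logistic_loss(x):
    # Logistic loss log(1 + exp(-x)), written in a numerically stable form.
    # Assumed here to correspond to DPO.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

def squared_loss(x):
    # Squared loss (x - 1)^2. Assumed here to correspond to IPO.
    return (x - 1.0) ** 2

def hinge_loss(x):
    # Hinge loss max(0, 1 - x). Assumed here to correspond to SLiC.
    return max(0.0, 1.0 - x)

def offline_preference_loss(f, beta, logp_w, logp_l, ref_logp_w, ref_logp_l):
    # Hypothetical helper: log-ratio of the preferred response (w) minus
    # log-ratio of the dispreferred response (l), scaled by beta, passed to f.
    rho = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return f(beta * rho)

# Same preference pair, three different losses.
pair = dict(beta=0.5, logp_w=-1.0, logp_l=-2.0, ref_logp_w=-1.5, ref_logp_l=-1.5)
for f in (logistic_loss, squared_loss, hinge_loss):
    print(f.__name__, offline_preference_loss(f, **pair))
```

Swapping `f` is the only change between the three variants in this sketch, which is the sense in which they can be read as one family.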

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Stephen Casper, Xander Davies et al.

Sep, 2023

Anne Wu recommended on 1/2/2024

A survey of challenges in RLHF, organized by human feedback, reward model, and policy. The authors also distinguish tractable from fundamental limitations and discuss safety, governance, and transparency. Sometimes I wonder about the problem specification in RLHF.

LLF-Bench: Benchmark for Interactive Learning from Language Feedback

Ching-An Cheng, Andrey Kolobov, Dipendra Misra, Allen Nie, Adith Swaminathan

Dec, 2023

Anne Wu recommended on 12/19/2023

The benchmark includes several sequential decision-making tasks (recommendation, navigation, etc.), which the agent must solve using natural-language instructions plus synthetic language feedback. They use paraphrasing to avoid overfitting. Some design choices could be relevant, but no baseline is provided.

Prediction-Oriented Bayesian Active Learning

Freddie Bickford Smith, Andreas Kirsch, Sebastian Farquhar, Yarin Gal, Adam Foster, Tom Rainforth

Apr, 2023

Anne Wu recommended on 11/28/2023

Many information-theoretic approaches to active learning optimize the BALD score. This work argues that BALD can be suboptimal and proposes the expected predictive information gain (EPIG), which measures information gain in prediction space rather than parameter space.
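For reference, a rough sketch of the BALD score mentioned above, using the standard form: the entropy of the averaged prediction minus the average entropy of each posterior sample's prediction. It assumes a small set of posterior samples (for example, ensemble members), each giving a class distribution for one candidate input. The names `entropy` and `bald_score` are illustrative, and EPIG itself is not implemented here.

```python
import math

def entropy(p):
    # Shannon entropy in nats, skipping zero-probability classes.
    return -sum(q * math.log(q) for q in p if q > 0.0)

def bald_score(member_probs):
    # member_probs: one class distribution per posterior sample, all for
    # the same candidate input (an assumed, simplified setup).
    k = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / k for c in range(n_classes)]
    # Mutual information between the label and the parameters:
    # entropy of the mean prediction minus mean entropy of the members.
    return entropy(mean) - sum(entropy(m) for m in member_probs) / k

# Members that agree give zero; confident members that disagree score high.
print(bald_score([[0.5, 0.5], [0.5, 0.5]]))
print(bald_score([[1.0, 0.0], [0.0, 1.0]]))
```

The score is large only when individual samples are confident but disagree with each other, which is disagreement in parameter space; the blurb's point is that this need not translate into better predictions on the inputs of interest.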

Vision Transformers Need Registers

Timothee Darcet, Maxime Oquab, Julien Mairal & Piotr Bojanowski

2023

Anne Wu recommended on 11/21/2023

Artifacts are identified in both supervised and self-supervised ViTs: redundant tokens with high norms appear during inference, primarily in low-information background areas of images, suggesting that those tokens are repurposed to store global information. The work proposes adding register tokens to the input sequence.

Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?

Ahmed Alajrami, Katerina Margatina, Nikolaos Aletras

2023

Anne Wu recommended on 11/14/2023

Interesting paper probing how information loss in input token characters affects the performance of pretrained LMs. Pretraining with subsets of token characters, e.g. just 1 character per token (the extreme case), could still retain about 90% of the full-token model's performance on SuperGLUE, and some subsets may be more informative than others.