Selecting the link function g for generalized BTL/PL preference models
Identify a positive, increasing link function g for the generalized Bradley–Terry–Luce/Plackett–Luce preference model that best captures human preference distributions in RLHF settings, and evaluate how the corresponding preference-matching regularizer impacts alignment performance.
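For concreteness, a minimal sketch of the pairwise and listwise choice probabilities this question concerns, assuming rewards r(x, y) have already been computed; setting g = exp recovers the standard Bradley–Terry–Luce/Plackett–Luce model, while the other candidate links below are illustrative examples, not proposals from the paper:

```python
import math

def generalized_btl_prob(r1, r2, g):
    """P(y1 preferred over y2) under a generalized BTL model with link g.

    g must be positive and increasing on the reward range; g = exp
    recovers the standard Bradley-Terry-Luce model.
    """
    a, b = g(r1), g(r2)
    return a / (a + b)

def generalized_pl_prob(rewards_in_rank_order, g):
    """Probability of a full ranking under the generalized Plackett-Luce model:
    the top item is chosen with probability g(r_i) / sum of g over the remaining
    items, then removed, and the process repeats down the ranking."""
    scores = [g(r) for r in rewards_in_rank_order]
    prob = 1.0
    for i in range(len(scores)):
        prob *= scores[i] / sum(scores[i:])
    return prob

# Illustrative candidate links (these choices are assumptions for the sketch):
links = {
    "exp (standard BTL)": math.exp,
    "softplus": lambda r: math.log1p(math.exp(r)),
    "shifted identity": lambda r: r + 5.0,  # positive only while rewards stay above -5
}

r_chosen, r_rejected = 1.0, 0.0
for name, g in links.items():
    p = generalized_btl_prob(r_chosen, r_rejected, g)
    print(f"{name:>20s}: P(chosen over rejected) = {p:.3f}")
```

Different links spread the same reward gap into different preference probabilities, which is why the choice of g, and the preference-matching regularizer derived from it, can change alignment behavior.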
References
As for what function g better captures human preference, we leave this question for future investigation.
— On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
(arXiv:2405.16455, Xiao et al., 26 May 2024), Section 3 (Extension to Generalized Preference Models)