Selecting the link function g for generalized BTL/PL preference models
Identify a positive, increasing link function g for the generalized Bradley–Terry–Luce/Plackett–Luce preference model that best captures human preference distributions in RLHF settings, and evaluate how the corresponding preference-matching regularizer impacts alignment performance.
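For concreteness, a minimal sketch of the pairwise and listwise choice probabilities this question concerns, assuming rewards r(x, y) have already been computed; setting g = exp recovers the standard Bradley–Terry–Luce/Plackett–Luce model, while the other candidate links below are illustrative examples, not proposals from the paper:

```python
import math

def generalized_btl_prob(r1, r2, g):
    """P(y1 preferred over y2) under a generalized BTL model with link g.

    g must be positive and increasing on the reward range; g = exp
    recovers the standard Bradley-Terry-Luce model.
    """
    a, b = g(r1), g(r2)
    return a / (a + b)

def generalized_pl_prob(rewards_in_rank_order, g):
    """Probability of a full ranking under the generalized Plackett-Luce model:
    the top item is chosen with probability g(r_i) / sum of g over the remaining
    items, then removed, and the process repeats down the ranking."""
    scores = [g(r) for r in rewards_in_rank_order]
    prob = 1.0
    for i in range(len(scores)):
        prob *= scores[i] / sum(scores[i:])
    return prob

# Illustrative candidate links (these choices are assumptions for the sketch):
links = {
    "exp (standard BTL)": math.exp,
    "softplus": lambda r: math.log1p(math.exp(r)),
    "shifted identity": lambda r: r + 5.0,  # positive only while rewards stay above -5
}

r_chosen, r_rejected = 1.0, 0.0
for name, g in links.items():
    p = generalized_btl_prob(r_chosen, r_rejected, g)
    print(f"{name:>20s}: P(chosen over rejected) = {p:.3f}")
```

Different links spread the same reward gap into different preference probabilities, which is why the choice of g, and the preference-matching regularizer derived from it, can change alignment behavior.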
References
As for what function g better captures human preference, we leave this question for future investigation.
— On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
(arXiv:2405.16455, Xiao et al., 26 May 2024), Section 3 (Extension to Generalized Preference Models)