Papers
Topics
Authors
Recent
Search
2000 character limit reached

What Does Preference Learning Recover from Pairwise Comparison Data?

Published 10 Feb 2026 in cs.LG | (2602.10286v1)

Abstract: Pairwise preference learning is central to machine learning, with recent applications in aligning LLMs with human preferences. A typical dataset consists of triplets $(x, y+, y-)$, where response $y+$ is preferred over response $y-$ for context $x$. The Bradley--Terry (BT) model is the predominant approach, modeling preference probabilities as a function of latent score differences. Standard practice assumes data follows this model and learns the latent scores accordingly. However, real data may violate this assumption, and it remains unclear what BT learning recovers in such cases. Starting from triplet comparison data, we formalize the preference information it encodes through the conditional preference distribution (CPRD). We give precise conditions for when BT is appropriate for modeling the CPRD, and identify factors governing sample efficiency -- namely, margin and connectivity. Together, these results offer a data-centric foundation for understanding what preference learning actually recovers.

Summary

No one has generated a summary of this paper yet.

Paper to Video (Beta)

No one has generated a video about this paper yet.

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.