- The paper identifies preference leakage as a bias arising when LLMs used for data generation also serve as evaluators, leading to preferential treatment of related models.
- It demonstrates through experiments on benchmarks such as Arena-Hard and AlpacaEval 2.0 that closer model relatedness intensifies the bias, and that supervised fine-tuning and larger student models amplify it further.
- The study advocates for contamination-resistant evaluation methods and diversified data sources to address the ethical and practical challenges of bias in AI assessments.
The paper "Preference Leakage: A Contamination Problem in LLM-as-a-judge" addresses a subtle yet critical bias issue, referred to as preference leakage, which surfaces in scenarios where LLMs are employed both as data generators and evaluators. The paper identifies the potential for LLM-based synthetic data generation and evaluation to result in systematic bias due to the relatedness between data-generator LLMs and judge LLMs. This bias is coined as preference leakage and remains challenging to detect.
Key Contributions and Findings
- Problem Definition: Preference leakage arises when the LLM used for data generation and the LLM used for evaluation are related, causing the judge to favor the student models trained on the related generator's data. The paper specifies three kinds of relatedness:
- Identical models: When the generating and judging LLMs are the same.
- Inheritance relationship: One model is derived from the other, for example through fine-tuning or distillation.
- Same model family: Both models belong to the same series, such as the GPT or Gemini families.
- Research Questions: The paper articulates three core research questions:
- RQ1: Does preference leakage introduce systematic bias into LLM-as-a-judge evaluations?
- RQ2: What is the severity of preference leakage across different scenarios?
- RQ3: What mechanisms underlie preference leakage?
- Experimental Analysis: Experiments on widely used LLM-as-a-judge benchmarks, Arena-Hard and AlpacaEval 2.0, reveal significant bias in favor of related student models; the bias grows with the closeness of the generator-judge relationship and with the proportion of synthetic data in the student's training mix (a minimal measurement sketch follows this list).
- Findings:
- Bias Prevalence: Judges exhibit a clear preference for their related student models, indicating widespread preference leakage across various LLMs.
- Impact of Relatedness: The degree of relatedness strongly influences bias severity; the closer the generator-judge relationship, the stronger the bias.
- Influence of Tuning and Model Size: Supervised fine-tuning (SFT) exacerbates preference leakage more than other training approaches. Larger student models also amplify the bias, likely because their greater capacity lets them absorb more of the generator's patterns.
- Recognition and Challenges: Investigating whether the bias stems from judges recognizing their own model's outputs, the authors find that LLM judges do not reliably identify outputs from their related student models. This suggests preference leakage is more insidious than previously documented egocentric biases.
- Categorical Bias Analysis: Questions with more subjective answers, such as writing and coding tasks, and more subjective judgment dimensions are the most susceptible to the bias, which further complicates detection and mitigation.
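The findings above rest on comparing how a student fares under a related judge versus unrelated judges. Below is a hedged sketch of one way to quantify that gap from pairwise verdicts; the score used here is an illustrative simplification, not the paper's exact preference leakage metric, and the verdict lists are toy values.

```python
# Quantifying preference leakage (simplified): compare a student's win rate under its
# related judge with its win rate under an unrelated judge.

def win_rate(verdicts: list[str], student: str) -> float:
    """Fraction of pairwise comparisons won by the given student."""
    return sum(v == student for v in verdicts) / len(verdicts)

def leakage_score(related_verdicts: list[str], unrelated_verdicts: list[str], student: str) -> float:
    """Positive values mean the related judge favors its own student more than the unrelated judge does."""
    return win_rate(related_verdicts, student) - win_rate(unrelated_verdicts, student)

# Toy verdicts: each entry names the winner of one pairwise comparison.
related_judge   = ["student_A", "student_A", "student_A", "student_B"]  # judge related to student_A
unrelated_judge = ["student_A", "student_B", "student_B", "student_B"]  # independent judge

print(leakage_score(related_judge, unrelated_judge, "student_A"))  # 0.5 -> strong apparent leakage
```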
Implications and Future Directions
The paper calls attention to preference leakage as an underappreciated issue in LLM-based automatic evaluation and stresses the need for more robust evaluation methodologies that counteract this bias. It proposes exploring diversified data sources, contamination-resistant benchmarks, and alternative evaluation strategies. The research also underscores the ethical ramifications of biased evaluations: hidden preference patterns can adversely affect downstream applications such as AI alignment and decision-making in critical domains.
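As one concrete illustration of an evaluation strategy in this spirit, the sketch below aggregates pairwise verdicts from judges drawn from different model families so that no single judge's relatedness to a student dominates the outcome. The judge names and the majority-vote rule are assumptions for illustration, not a method prescribed by the paper.

```python
# Cross-family judge ensemble: majority vote over verdicts from unrelated judge families.
from collections import Counter

def ensemble_verdict(verdicts_by_judge: dict[str, str]) -> str:
    """Majority vote across judges; ambiguous splits fall back to 'tie'."""
    counts = Counter(verdicts_by_judge.values())
    winner, top = counts.most_common(1)[0]
    return winner if list(counts.values()).count(top) == 1 else "tie"

verdicts = {
    "judge_family_X": "student_A",  # possibly related to student_A
    "judge_family_Y": "student_B",
    "judge_family_Z": "student_B",
}
print(ensemble_verdict(verdicts))  # 'student_B' -- the related judge alone cannot decide
```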
Overall, the research highlights the subtlety of preference leakage and lays the groundwork for future work on understanding and mitigating the biases inherent in the LLM-as-a-judge paradigm.