Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement (2402.11436v2)

Published 18 Feb 2024 in cs.CL and cs.AI

Abstract: Recent studies show that LLMs improve their performance through self-feedback on certain tasks while degrading on others. We discovered that this divergence stems from the LLM's bias in evaluating its own output. In this paper, we formally define LLM self-bias - the tendency to favor its own generation - using two statistics. We analyze six LLMs (GPT-4, GPT-3.5, Gemini, LLaMA2, Mixtral and DeepSeek) on translation, constrained text generation, and mathematical reasoning tasks. We find that self-bias is prevalent in all examined LLMs across multiple languages and tasks. Our analysis reveals that while the self-refine pipeline improves the fluency and understandability of model outputs, it further amplifies self-bias. To mitigate such biases, we find that larger model size and external feedback with accurate assessment can significantly reduce bias in the self-refine pipeline, leading to actual performance improvement in downstream tasks. The code and data are released at https://github.com/xu1998hz/LLM_self_bias.

Unveiling Self-Bias in LLMs Across Diverse Tasks

Introduction to Self-Bias in LLMs

In the evolving landscape of LLMs, self-bias, the tendency of a model to prefer its own generations, presents a nuanced challenge. The paper examines this issue through a comprehensive analysis of six diverse LLMs on translation, constrained text generation, and mathematical reasoning tasks. The analysis finds self-bias in every model studied and traces its implications for model performance and output quality.

Quantification of Self-Bias

The paper introduces two statistics to quantify self-bias in LLMs: bias estimation and distance skewness. These metrics capture the discrepancy between an LLM's self-evaluation and its actual performance, and they reveal a consistent amplification of self-bias across successive iterations of self-refinement. The findings show that, although self-refinement improves fluency and understandability, it does not necessarily deliver the intended gains, such as higher quality or broader concept coverage.
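As a concrete illustration of the two statistics, the sketch below computes a bias estimate (the mean gap between a model's self-assigned scores and gold quality scores) and the distance skewness of those gaps. This is a minimal sketch under assumed scoring conventions, not the paper's released implementation; the score scales and data layout are placeholders.

```python
import numpy as np

def bias_estimation(self_scores, gold_scores):
    """Mean gap between a model's self-assigned scores and gold quality
    scores; positive values indicate the model overrates its own output."""
    self_scores = np.asarray(self_scores, dtype=float)
    gold_scores = np.asarray(gold_scores, dtype=float)
    return float(np.mean(self_scores - gold_scores))

def distance_skewness(self_scores, gold_scores):
    """Sample distance skewness of the gaps d_i = self_i - gold_i:
    dSkew = 1 - sum_ij |d_i - d_j| / sum_ij |d_i + d_j|.
    0 means the gaps are symmetric around zero (no systematic bias);
    values near 1 indicate a strongly one-sided, biased distribution."""
    d = np.asarray(self_scores, dtype=float) - np.asarray(gold_scores, dtype=float)
    numer = np.abs(d[:, None] - d[None, :]).sum()
    denom = np.abs(d[:, None] + d[None, :]).sum()
    return float(1.0 - numer / denom) if denom > 0 else 0.0

# Toy usage: the model consistently rates itself ~10 points above the gold metric.
self_scores = [85, 90, 88, 92]
gold_scores = [75, 78, 80, 81]
print(bias_estimation(self_scores, gold_scores))    # 10.25
print(distance_skewness(self_scores, gold_scores))  # close to 1
```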

Analysis Across Tasks

Translation

Investigations into translation tasks reveal that self-bias not only persists but also intensifies with iterative self-refinement. Notably, open-source LLMs and certain versions of commercially available models display higher self-bias levels. This amplification suggests a misalignment between perceived and actual performance improvements, with models favoring their generative style over substantive quality enhancements.
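To make the iterative setup concrete, here is a minimal sketch of a self-refinement loop that records the model's self-score alongside an external quality score at each step, so the self-bias gap can be tracked as refinement proceeds. The `generate`, `self_evaluate`, and `external_metric` callables are hypothetical placeholders for whatever model API and evaluation metric are used; they are not functions from the paper's codebase.

```python
from typing import Callable, List, Tuple

def self_refine(prompt: str,
                generate: Callable[[str], str],
                self_evaluate: Callable[[str, str], Tuple[float, str]],
                external_metric: Callable[[str, str], float],
                iterations: int = 3) -> List[dict]:
    """Run an iterative self-refinement loop and record, per iteration,
    the model's self-score next to an external quality score so the gap
    (self-bias) can be inspected as refinement proceeds."""
    history = []
    output = generate(prompt)
    for step in range(iterations):
        self_score, feedback = self_evaluate(prompt, output)
        gold_score = external_metric(prompt, output)
        history.append({"step": step, "output": output,
                        "self_score": self_score, "gold_score": gold_score,
                        "gap": self_score - gold_score})
        # Ask the model to revise its own output based on its own feedback.
        output = generate(f"{prompt}\n\nPrevious attempt:\n{output}\n"
                          f"Feedback:\n{feedback}\nPlease revise.")
    return history
```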

Constrained Text Generation

For constrained text generation, the paper highlights a similar trend of escalating self-bias. The analysis indicates that models may optimize for false positives — improvements that are not genuinely beneficial — leading to a cycle of unproductive optimization and reduced diversity in text generation.

Mathematical Reasoning

In tasks involving mathematical reasoning, the presence of self-bias underscores the challenges LLMs face in self-correction. Despite engaging in iterative refinement, models tend to favor certain reasoning paths, which may not lead to correct solutions, further evidencing the pervasive nature of self-bias across different domains.

Addressing Self-Bias

To mitigate self-bias, the paper proposes two primary interventions: increasing the model size and integrating external feedback. Larger models demonstrate reduced self-bias, possibly due to their enhanced evaluative and corrective capacities. Moreover, external feedback, characterized by accurate assessment, significantly diminishes bias, guiding models towards more accurate self-corrections and genuine performance improvements.
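The sketch below shows one way the external-feedback mitigation could look in practice: the self-feedback step is replaced by an outside evaluator, for example a reference-based metric or a stronger judge model. The function and argument names are illustrative assumptions rather than the paper's API; combined with the bias statistics above, such a step lets one check whether the self-gold gap actually shrinks across iterations.

```python
def refine_with_external_feedback(prompt, output, generate, external_evaluator):
    """One refinement step driven by external feedback instead of
    self-feedback: an outside evaluator supplies the score and critique
    that guide the revision."""
    score, critique = external_evaluator(prompt, output)
    revision_prompt = (f"{prompt}\n\nPrevious attempt:\n{output}\n"
                       f"External feedback (score {score}):\n{critique}\n"
                       f"Please revise to address the feedback.")
    return generate(revision_prompt)
```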

Theoretical and Practical Implications

The research provides a foundational perspective on the mechanisms of self-bias in LLMs, contributing to our understanding of model behaviors in self-refinement and self-rewarding pipelines. Practically, the findings emphasize the need for incorporating mechanisms — such as external feedback and adjusting model sizes — to counterbalance self-bias and enhance the reliability of LLMs across tasks.

Speculating on Future Developments

Looking forward, the paper speculates on the evolution of methodologies to detect, quantify, and mitigate self-bias in LLMs. It calls for further exploration into the dynamics of self-bias across different model architectures, tasks, and languages, underscoring the importance of developing more nuanced and effective strategies to ensure the integrity and applicability of LLMs in diverse real-world scenarios.

Conclusion

The exploration of self-bias in LLMs highlights a critical challenge in the field of AI and machine learning. By systematically analyzing and addressing this issue, the research contributes valuable insights towards the development of more robust, accurate, and unbiased LLMs, paving the way for advancements that align closely with human evaluative standards and expectations.

Authors (6)
  1. Wenda Xu (19 papers)
  2. Guanglei Zhu (4 papers)
  3. Xuandong Zhao (47 papers)
  4. Liangming Pan (59 papers)
  5. Lei Li (1293 papers)
  6. William Yang Wang (254 papers)
Citations (26)