- The paper introduces the Pairwise Cringe Loss, which extends a binary-feedback training method (the Cringe Loss) so that LLMs can be optimized directly on pairwise preference data.
- It gates the loss with a soft margin on the probability gap between the preferred and the rejected response, and in experiments it reduces repetition and improves generation quality, outperforming strong baselines on the AlpacaFarm benchmark.
- An iterative training procedure with a reward model continually refreshes the preference data, yielding a simple, efficient recipe for LLM alignment.
Introduction to Pairwise Cringe Loss
The field of LLM alignment has produced a range of methods for optimizing models against different kinds of feedback data. One established technique for handling binary feedback, in which individual model responses are labeled as good or bad, has now been extended to pairwise preferences, where one model response is chosen over another for the same input. This extension is the Pairwise Cringe Loss, a method that builds on the earlier binary-feedback strategy known as the Cringe Loss.
Binary Feedback and Its Extension
The original Cringe Loss was designed for binary feedback. It applies the standard likelihood training loss to positive examples and a contrastive loss to negative examples, lowering the likelihood of each rejected token by contrasting it against higher-scoring alternatives from the model's own predictions. Performance improves further with iterative training, in which the model labels new data that is folded back into the training set. Because pairwise preference data is far more common than binary labels for training LLMs, the method needed to be adapted. The result is the Pairwise Cringe Loss, which wraps the binary loss in a soft margin: a gate switches the loss on when the model does not yet prefer the chosen response over the rejected one by a sufficient probability margin, and switches it off once it does. The resulting loss therefore operates both at the level of whole sequences, through the margin gate, and at the level of individual token probabilities, through the Cringe term.
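To make the mechanics concrete, here is a minimal PyTorch sketch of a loss in this spirit. It is an illustration rather than the authors' implementation: the function names, the top-k contrastive sampling, the margin and temperature hyperparameters, and the exact form of the sigmoid gate are all assumptions made for this example.

```python
import torch
import torch.nn.functional as F

def sequence_logprob(logits, tokens):
    """Sum of per-token log-probabilities of `tokens` under `logits`.

    logits: (T, V) per-position scores; tokens: (T,) target token ids.
    """
    logprobs = F.log_softmax(logits, dim=-1)
    return logprobs.gather(-1, tokens.unsqueeze(-1)).squeeze(-1).sum()

def cringe_loss(neg_logits, neg_tokens, top_k=5):
    """Token-level contrastive penalty on a rejected sequence (sketch).

    For each rejected token, pick a competing token from the model's
    current top-k (excluding the rejected token itself) and push the
    competitor's score above the rejected token's score.
    """
    losses = []
    for t, y in enumerate(neg_tokens):
        scores = neg_logits[t]
        topk = torch.topk(scores, top_k + 1).indices
        candidates = topk[topk != y][:top_k]                   # drop the rejected token if present
        s = candidates[torch.randint(len(candidates), (1,))]   # sampled competitor
        pair = torch.stack([scores[s].squeeze(), scores[y]])
        # two-way cross-entropy over the pair of logits; label 0 = competitor should win
        losses.append(F.cross_entropy(pair.unsqueeze(0), torch.tensor([0])))
    return torch.stack(losses).mean()

def pairwise_cringe_loss(pos_logits, pos_tokens, neg_logits, neg_tokens,
                         margin=1.0, temperature=10.0):
    """Gate (CE on chosen response + Cringe on rejected response) by a soft margin.

    The sigmoid gate is close to 1 while the model does not yet prefer the
    chosen response by at least `margin` in log-probability, and fades
    toward 0 once it does.
    """
    ce_pos = F.cross_entropy(pos_logits, pos_tokens)    # standard loss on the chosen response
    cringe_neg = cringe_loss(neg_logits, neg_tokens)    # contrastive penalty on the rejected one
    gap = sequence_logprob(pos_logits, pos_tokens) - sequence_logprob(neg_logits, neg_tokens)
    gate = torch.sigmoid(temperature * (margin - gap)).detach()
    return gate * (ce_pos + cringe_neg)
```

In this sketch the gate is detached so it only scales the loss without receiving gradients, which keeps the example simple; the exact gating details in the paper may differ.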
In experiments, the Pairwise Cringe Loss was compared with the original binary Cringe Loss and with standard preference-optimization methods such as PPO and DPO. It was better at reducing repetition, a well-known failure mode of LLMs, and produced higher-quality generations. On the AlpacaFarm benchmark for instruction following, it surpassed several state-of-the-art methods. A key observation is that the method improves with iterative training: a reward model scores newly generated responses, the scores are used to build updated preference pairs, and the model is retrained on this data in subsequent iterations.
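The iterative procedure can be sketched as a simple loop. The helper names here (model.generate, reward_model.score, train_with_pairwise_cringe) and the best-versus-worst pairing rule are hypothetical placeholders, used only to illustrate how data flows between generation, reward scoring, and retraining.

```python
def iterative_training(model, reward_model, prompts, num_iterations=2,
                       samples_per_prompt=4):
    """Sketch of iterative preference-data refresh with a reward model."""
    for _ in range(num_iterations):
        pairs = []
        for prompt in prompts:
            # Sample several candidate responses from the current model.
            responses = [model.generate(prompt) for _ in range(samples_per_prompt)]
            scored = sorted(responses, key=lambda r: reward_model.score(prompt, r))
            # Highest-scoring response becomes the "chosen" example,
            # lowest-scoring one the "rejected" example.
            pairs.append((prompt, scored[-1], scored[0]))
        # Fine-tune on the newly constructed preference pairs.
        model = train_with_pairwise_cringe(model, pairs)
    return model
```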
The primary takeaway is that the Pairwise Cringe Loss is a meaningful advance for training LLMs to follow instructions. The method is simple and efficient, yet performs robustly against leading alternatives. It is also adaptable: the binary Cringe Loss and the Pairwise Cringe Loss can be combined so that binary and pairwise feedback are used together on diverse data. The Pairwise Cringe Loss thus stands as a compelling candidate for future LLM training and alignment work.