- The paper demonstrates that natural feedback from human-model interactions can significantly enhance language model alignment, lessening reliance on costly manual annotations.
- It introduces a comprehensive feedback taxonomy and an automated extraction method to categorize feedback from over 334,000 conversations.
- Experimental results show that fine-tuning with extracted positive feedback improves performance, with up to 78% of test cases favoring the enhanced model.
Learning from Naturally Occurring Feedback: A Summary
The paper "Learning from Naturally Occurring Feedback" by Shachar Don-Yehiya, Leshem Choshen, and Omri Abend addresses a significant challenge in LLM (LM) training: the cost and scalability constraints of manually collected human feedback. The authors propose leveraging naturally occurring feedback, inherently present during human-model interactions, as a scalable and cost-effective alternative for improving model alignment to human preferences.
Background and Motivation
In contemporary LM training pipelines, the alignment phase typically relies on reinforcement learning from manually annotated human preference data. However, acquiring such data is resource-intensive and does not scale, limiting the potential for continual model improvement. The authors are motivated by empirical evidence that naturally occurring feedback offers qualitative advantages over artificially generated feedback, such as fewer hallucinations and reduced bias.
Feedback Taxonomy and Manual Annotation
A primary contribution of the paper is the definition of a feedback taxonomy to categorize natural feedback in human-model conversations. The taxonomy covers five categories:
- Repeat or Rephrase: The user repeats or rephrases their inquiry.
- Make Aware with Correction: The user identifies an error and provides corrective information.
- Make Aware without Correction: The user identifies an error without offering corrective advice.
- Ask for Clarification: The user requests additional information.
- Positive Feedback: The user explicitly appreciates or confirms the model’s accurate response.
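As a concrete illustration, this taxonomy maps naturally onto a small data structure for holding extracted feedback. The sketch below is illustrative only; the class and field names (`FeedbackCategory`, `FeedbackSpan`, etc.) are assumptions, not identifiers from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class FeedbackCategory(Enum):
    """The five feedback categories defined in the paper's taxonomy."""
    REPEAT_OR_REPHRASE = "repeat_or_rephrase"
    MAKE_AWARE_WITH_CORRECTION = "make_aware_with_correction"
    MAKE_AWARE_WITHOUT_CORRECTION = "make_aware_without_correction"
    ASK_FOR_CLARIFICATION = "ask_for_clarification"
    POSITIVE_FEEDBACK = "positive_feedback"


@dataclass
class FeedbackSpan:
    """A span of user text that gives feedback on the preceding model response."""
    conversation_id: str
    turn_index: int           # index of the user turn containing the feedback
    text: str                 # the extracted feedback span itself
    category: FeedbackCategory
```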
To gauge the prevalence of naturally occurring feedback, the authors manually annotated a sample of conversations and found that approximately 30% contain explicit feedback. This manual effort laid the groundwork for the automated extraction techniques described next.
Automated Feedback Extraction
The authors developed an automated method to extract feedback at scale using LLMs (a minimal illustrative sketch follows the list below). The method involves:
- Constructing a detailed prompt to guide the LLM in recognizing and extracting feedback spans within a conversation.
- Parsing the LLM outputs to ensure they align with the defined categories.
- Validating the extraction method both quantitatively and qualitatively.
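The sketch below illustrates what such a pipeline could look like under these assumptions; the prompt wording, the `call_llm` helper, and the expected JSON output format are hypothetical stand-ins, not the paper's actual prompt or parsing code.

```python
import json

VALID_CATEGORIES = {
    "Repeat or Rephrase",
    "Make Aware with Correction",
    "Make Aware without Correction",
    "Ask for Clarification",
    "Positive Feedback",
}

EXTRACTION_PROMPT = """You are given a human-model conversation.
For every user turn that reacts to the previous model response, return a JSON list
of objects with the fields "turn", "category" (one of: {categories}) and "span"
(the exact feedback text). Return an empty list if the conversation contains no feedback.

Conversation:
{conversation}
"""


def extract_feedback(conversation: str, call_llm) -> list[dict]:
    """Prompt an LLM to extract feedback spans and keep only well-formed results.

    `call_llm` is a hypothetical function that sends a prompt to some LLM
    and returns its raw text completion.
    """
    prompt = EXTRACTION_PROMPT.format(
        categories=", ".join(sorted(VALID_CATEGORIES)),
        conversation=conversation,
    )
    raw = call_llm(prompt)
    try:
        candidates = json.loads(raw)
    except json.JSONDecodeError:
        return []  # discard outputs that are not valid JSON
    # Keep only entries whose category matches the defined taxonomy.
    return [c for c in candidates if c.get("category") in VALID_CATEGORIES]
```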
Using this approach, the paper reports the extraction of over 170,000 feedback samples from more than 334,000 conversations, resulting in a substantial dataset for further training purposes.
Experimental Results and Practical Implications
Training on the extracted feedback yielded significant improvements in model alignment. For example, models fine-tuned on roughly 8,000 positive-feedback examples outperformed their pretrained counterparts in up to 78% of test cases, as judged by GPT-4. The authors also explored Kahneman-Tversky Optimization (KTO) to make use of the more nuanced feedback categories, showing that even negative feedback can improve model quality.
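As a rough illustration of how such training data could be prepared, positive feedback identifies responses suitable for supervised fine-tuning, while both positive and negative categories map onto the binary desirable/undesirable labels that KTO-style training expects. The record layout and field names below are assumptions for illustration, not the paper's data format.

```python
def to_training_examples(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split extracted feedback into SFT examples and KTO-style labeled examples.

    Each input record is assumed (hypothetically) to hold the user prompt,
    the model response the feedback refers to, and the feedback category.
    """
    sft_examples, kto_examples = [], []
    for rec in records:
        prompt, response, category = rec["prompt"], rec["response"], rec["category"]
        if category == "Positive Feedback":
            # Responses the user explicitly confirmed: usable directly for fine-tuning.
            sft_examples.append({"prompt": prompt, "completion": response})
            kto_examples.append({"prompt": prompt, "completion": response, "desirable": True})
        else:
            # Responses that drew corrective or negative feedback: undesirable for KTO.
            kto_examples.append({"prompt": prompt, "completion": response, "desirable": False})
    return sft_examples, kto_examples
```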
The promising results underscore several implications:
- Data Availability and Scalability: Continuous human-model interactive data provides an ever-growing resource, allowing models to evolve with new and better-aligned datasets.
- Reduction in Manual Annotation Costs: Naturally occurring feedback mitigates the need for expensive human annotation, proving to be a scalable alternative.
- Domain Adaptability: The method can be applied to specific domains, improving model performance wherever domain-specific feedback is abundant.
Potential Future Developments
The paper alludes to several areas for future research, including:
- Improvement of Extraction Techniques: Enhanced models and prompts can refine the precision and recall of feedback extraction.
- Voice Assistant Feedback: Incorporating multimodal feedback (e.g., voice, gestures) could further enrich the dataset, emulating more natural human interactions.
- Real-time Feedback Integration: Exploring interactive reinforcement learning or other real-time methodologies to integrate feedback continuously could revolutionize human-model interactions, making feedback more immediate and beneficial for users.
Conclusion
"Learning from Naturally Occurring Feedback" proposes a novel approach to model training, leveraging organic feedback from human conversations for enhancing LM performance. The paper demonstrates that such naturally occurring feedback is abundant and valuable, offering a practical, scalable, and effective alternative to conventional methods. This paradigm shift towards harnessing implicit human feedback heralds new possibilities for the continuous evolution of conversational artificial intelligence.