- The paper introduces a novel approach that integrates LLM-generated feedback into imitation learning, significantly improving task performance across benchmarks.
- It outlines a process where environment observations are verbalized and critiqued to refine policy actions, enhancing sample efficiency during training.
- Results indicate a 3.5-12.0% performance boost, with LFMs demonstrating strong adaptability and providing interpretable, cost-effective training feedback.
Enhancing Imitation Learning with Language Feedback Models
Overview
Recent advancements in AI policy learning have introduced an approach that integrates LLM feedback into the training process. This paper explores Language Feedback Models (LFMs), which are designed to identify productive behavior for imitation learning in instruction-following tasks. Demonstrating considerable improvements in task completion rates across several language grounding environments, LFMs not only surpass conventional behavioral cloning baselines but also outperform direct action prediction by LLMs. The method uses an LLM to produce an initial batch of targeted feedback, which is distilled into a compact and efficient LFM. This trained model is then used to improve policy performance by identifying desirable actions to imitate. The research contributes to the field by showing LFMs' adaptability to novel environments, their capacity for online policy refinement without additional LLM calls, and their potential to offer interpretable feedback for human analysis.
Methodology
The technique proceeds in several steps: an initial policy is rolled out in the environment, and an LLM critiques the policy's actions based on how well they advance the task objective. These critiques are used to train a Language Feedback Model that predicts whether an action is productive given the instruction. The LFM then identifies beneficial behaviors, which are incorporated back into the policy through imitation learning. A minimal sketch of the feedback-distillation step appears below.
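The sketch below illustrates the distillation step under stated assumptions: `query_llm_critic` is a stub standing in for a real LLM call, and a bag-of-words logistic regression stands in for the small language model the paper fine-tunes as its LFM. The names and interfaces are illustrative, not the authors' implementation.

```python
# Sketch: distilling LLM critiques into a small Language Feedback Model (LFM).
# query_llm_critic is a stub for a real LLM call; the classifier below is a
# stand-in for the compact language model the paper trains as its LFM.
from dataclasses import dataclass
from typing import List

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


@dataclass
class Step:
    instruction: str   # task instruction given to the agent
    observation: str   # verbalized environment observation
    action: str        # action the base policy took


def query_llm_critic(step: Step) -> int:
    """Stub for an LLM call that judges whether an action is productive.

    A real implementation would prompt an LLM with the instruction,
    verbalized observation, and action, then parse a yes/no answer.
    """
    return int("goal" in step.observation and "go to" in step.action)


def train_feedback_model(rollout: List[Step]):
    """Label a batch of policy steps with the LLM critic, then fit the LFM."""
    texts = [f"{s.instruction} [OBS] {s.observation} [ACT] {s.action}" for s in rollout]
    labels = [query_llm_critic(s) for s in rollout]
    lfm = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    lfm.fit(texts, labels)
    return lfm


if __name__ == "__main__":
    demo = [
        Step("find the mug", "a counter with a mug near the goal", "go to counter"),
        Step("find the mug", "an empty hallway", "wait"),
    ]
    model = train_feedback_model(demo)
    print(model.predict(["find the mug [OBS] a shelf near the goal [ACT] go to shelf"]))
```

Once trained, the LFM replaces further LLM calls: labeling new rollouts only requires running this small model, which is where the claimed sample-efficiency and cost savings come from.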
Key elements of the approach include:
- Verbalization of Observations: Transforming environment observations into language descriptions to leverage LLM world knowledge.
- Efficient Learning from LLM Feedback: Distilling LLM feedback into a Language Feedback Model to improve sample efficiency and reduce costs.
- Policy Improvement through Imitation: Utilizing the feedback model to discern and imitate productive actions, fostering policy enhancement (see the sketch after this list).
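The following sketch ties the three elements together for one round of policy improvement. `ToyEnv`, `ToyPolicy`, `ToyLFM`, and `verbalize` are hypothetical stand-ins used only to make the data flow concrete; the paper's environments, policy architecture, and verbalization procedure differ, and the update here is plain behavioral cloning on LFM-approved steps.

```python
# Sketch: verbalization plus one round of LFM-guided policy improvement.
# All classes below are illustrative stand-ins, not the paper's interfaces.
from typing import Dict, List, Tuple


def verbalize(raw_obs: Dict[str, int]) -> str:
    """Turn a structured observation into a language description (stub)."""
    return ", ".join(f"{k} is {v}" for k, v in raw_obs.items())


class ToyEnv:
    """Minimal environment: the agent must reach position 3."""
    def reset(self) -> Dict[str, int]:
        self.pos = 0
        return {"position": self.pos, "goal": 3}

    def step(self, action: str) -> Tuple[Dict[str, int], bool]:
        if action == "move right":
            self.pos += 1
        return {"position": self.pos, "goal": 3}, self.pos >= 3


class ToyPolicy:
    """Minimal policy with a supervised update hook for imitation learning."""
    def act(self, instruction: str, obs_text: str) -> str:
        return "move right"

    def update(self, instruction: str, obs_text: str, action: str) -> None:
        pass  # a real policy would take a gradient step toward this action


class ToyLFM:
    """Stand-in for a trained Language Feedback Model: approves every step."""
    def predict(self, queries: List[str]) -> List[int]:
        return [1] * len(queries)


def improve_policy(env, policy, lfm, instruction: str, horizon: int = 10):
    """Roll out the policy, keep LFM-approved steps, and imitate them."""
    productive: List[Tuple[str, str]] = []
    raw_obs = env.reset()
    for _ in range(horizon):
        obs_text = verbalize(raw_obs)
        action = policy.act(instruction, obs_text)
        query = f"{instruction} [OBS] {obs_text} [ACT] {action}"
        if lfm.predict([query])[0] == 1:          # LFM judges the step productive
            productive.append((obs_text, action))
        raw_obs, done = env.step(action)
        if done:
            break
    for obs_text, action in productive:           # behavioral cloning on kept steps
        policy.update(instruction, obs_text, action)
    return policy, productive


if __name__ == "__main__":
    _, kept = improve_policy(ToyEnv(), ToyPolicy(), ToyLFM(), "walk to the goal")
    print(f"imitating {len(kept)} LFM-approved steps")
```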
Results and Implications
The authors report consistent performance gains across three language grounding benchmarks: Touchdown, ScienceWorld, and ALFWorld, highlighting LFMs' advantage over behavioral cloning and direct LLM action prediction. Notably, LFMs generalize well, improving task completion rates in unseen environments by 3.5-12.0% after a single round of adaptation.
Moreover, LFMs extend beyond policy enhancement, offering a mechanism for generating human-interpretable feedback. Such a feature could revolutionize imitation learning by facilitating human verification of desirable behavior, potentially leading to more trustworthy and comprehensible AI systems.
Speculations on Future Directions
This research opens several avenues for future work. Testing LFMs in more complex or dynamically changing environments could reveal further gains in adaptability and learning efficiency. Another promising direction is feedback models that provide richer critiques: models that not only pinpoint productive actions but also explain why they are productive, offering deeper insight into how instructions shape behavior.
Additionally, the applicability of LFMs in real-world scenarios, such as robotics or interactive systems, warrants exploration. In these contexts, the ability to quickly adapt to new tasks and environments while providing human-understandable feedback could significantly impact artificial intelligence's practicality and reliability.
Conclusion
The introduction of Language Feedback Models marks a significant leap forward in the field of instruction-following agents, showcasing the profound impact of leveraging LLMs for feedback generation. By enhancing sample efficiency, reducing reliance on expensive LLM calls, and offering a pathway to interpretable AI actions, LFMs set a new standard in the domain of imitation learning. As this research progresses, it could herald a new era of AI systems capable of more versatile, efficient, and transparent operations, thereby broadening the horizon for AI application across diverse sectors.