Insights on "Learning to Learn Faster from Human Feedback with Language Model Predictive Control"
The paper under discussion presents an approach for making LLMs more adaptable when people teach robots new tasks through natural language, centered on Language Model Predictive Control (LMPC). The core idea of LMPC is to combine in-context learning with fine-tuning so that the LLM becomes easier to teach, translating human instructions into robot actions with less friction. The work sits at the rapidly growing intersection of LLMs and robotics.
At the heart of this endeavor is a recognition of the limits of in-context learning as typically applied to LLMs: feedback persists only as long as it fits in the model's context window, so corrections are forgotten between sessions. As a remedy, the authors fine-tune the model on its own in-context interactions so that lessons from past teaching sessions carry over. Specifically, they frame human-robot interaction as a partially observable Markov decision process (POMDP) in which human language inputs are observations and robot code outputs are actions, which lets them cast teaching as a predictive control problem: the model learns to predict how the rest of the interaction will unfold and to steer toward completions that reach success quickly.
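To make that framing concrete, here is a minimal sketch of the rollout-style inference it implies. This is not the authors' code: `llm_complete`, the transcript format, and the `<success>` marker are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's implementation).
# Human utterances are treated as observations and robot code as the action;
# an MPC-like step samples several imagined futures of the chat and keeps the
# continuation whose predicted path to success is shortest.

def llm_complete(transcript: str, n_samples: int = 4) -> list[str]:
    """Hypothetical stand-in for sampling continuations from the fine-tuned LLM."""
    return [
        transcript
        + "\nRobot: <code>"
        + "\nHuman: a bit higher\nRobot: <code>" * i
        + "\n<success>"
        for i in range(n_samples)
    ]

def turns_to_success(rollout: str) -> float:
    """Predicted number of human turns before the success marker appears."""
    if "<success>" not in rollout:
        return float("inf")
    return rollout.split("<success>")[0].count("Human:")

def mpc_style_step(transcript: str) -> str:
    """Receding-horizon choice: keep the sampled future that succeeds soonest."""
    return min(llm_complete(transcript), key=turns_to_success)

if __name__ == "__main__":
    print(mpc_style_step("Human: make the robot wave its left arm"))
```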
The experiments fine-tune Google's PaLM 2 with LMPC across 78 tasks spanning five robot embodiments, in both simulated and real-world settings, which gives a reasonably broad test of the method. The results show a 26.9% improvement in teaching success rate on unseen tasks, together with a drop in the average number of human corrections per task from 2.4 to 1.9. These metrics reflect the paper's dual objectives: raising task success while reducing the corrective effort required, with the ultimate aim of generalizing to new tasks.
LMPC also functions as a meta-learner: it improves teaching success by 31.5% on robot embodiments and APIs not seen during training. These gains are reinforced by top-user conditioning, in which the LLM is conditioned to emulate the interactions of the most successful human teachers. Combining this kind of user-behavior analysis with model improvements offers a usefully holistic view of human-robot interactive learning.
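A rough sketch of how such conditioning could look in the fine-tuning data, under my own assumptions about the record format and tagging scheme (the paper's exact scheme may differ):

```python
# Sketch (assumed data format): rank users by teaching success and number of
# corrections, then tag each training chat so the deployed model can later be
# prompted with a top-user tag to imitate high-performing teachers.
from statistics import mean

def rank_users(sessions):
    """sessions: list of dicts with keys 'user', 'success' (bool),
    'num_corrections' (int), 'chat' (str), 'robot_code' (str)."""
    by_user = {}
    for s in sessions:
        by_user.setdefault(s["user"], []).append(s)
    # Higher success rate first, then fewer corrections.
    return sorted(
        by_user,
        key=lambda u: (-mean(float(s["success"]) for s in by_user[u]),
                       mean(s["num_corrections"] for s in by_user[u])),
    )

def tag_examples(sessions, top_users):
    """Prefix each chat with its user tag; top users get a dedicated marker."""
    examples = []
    for s in sessions:
        tag = "[top-user]" if s["user"] in top_users else f"[user:{s['user']}]"
        examples.append({"prompt": f"{tag}\n{s['chat']}", "target": s["robot_code"]})
    return examples
```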
A noteworthy aspect of the research is its two-speed approach to adaptation: online in-context learning and offline model fine-tuning. The model adapts rapidly within a user session (fast adaptation) and improves across rounds of offline fine-tuning (slow adaptation), much like a continual learning loop, sketched below. Language feedback from the user fits naturally into this setup, since human instructions are simply treated as observations in the POMDP.
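A compressed sketch of that two-speed loop, with all session handling stubbed out; the function names, feedback signal, and data handling are placeholders, not the paper's API.

```python
# Sketch of the fast/slow loop: in-context corrections adapt behaviour within a
# session (fast), periodic fine-tuning on collected sessions updates the model
# itself (slow). Every function here is an illustrative stub.

def run_session(generate, task, max_turns=3):
    """Fast adaptation: the user corrects the model's code proposals in context."""
    transcript = f"Human: {task}"
    success = False
    for _ in range(max_turns):
        code = generate(transcript)             # propose robot code
        feedback, success = "looks good", True  # stand-in for real user feedback
        transcript += f"\nRobot: {code}\nHuman: {feedback}"
        if success:
            break
    return transcript, success

def improvement_round(generate, finetune, tasks):
    """Slow adaptation: fine-tune on the successful chats gathered this round and
    return an improved generation function for the next deployment."""
    sessions = [run_session(generate, t) for t in tasks]
    successful_chats = [chat for chat, ok in sessions if ok]
    return finetune(successful_chats)
```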
The paper's contributions extend beyond practical gains to conceptual ones. By applying model predictive control principles to LLM-driven interaction, it bridges ideas from control theory and machine learning, which is particularly relevant for problems whose interaction data follows a long-tail distribution.
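One schematic way to write that analogy (my notation, not the paper's) is as a receding-horizon choice over predicted dialogue futures:

$$
a_t \;=\; \arg\min_{a}\;
\mathbb{E}_{\,\hat{o}_{t+1:T},\,\hat{a}_{t+1:T}\,\sim\,\pi_\theta\left(\,\cdot\mid o_{1:t},\,a_{1:t-1},\,a\right)}
\bigl[\,T_{\mathrm{success}} - t\,\bigr]
$$

where the $o$'s are human utterances (observations), the $a$'s are robot code outputs (actions), $\pi_\theta$ is the fine-tuned LLM serving as both dynamics model and policy, and $T_{\mathrm{success}}$ is the predicted turn at which the user signals completion.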
Looking ahead, future work could expand the range of feedback modalities, for example adding visual or auditory input, to enrich the feedback loop in human-robot interaction. Investigating more efficient fine-tuning methods could also lower the computational barrier and broaden access to this kind of model adaptation.
In conclusion, this paper not only advances the state of the art in using LLMs for robotic task learning but also sets a foundation for future research in accelerating human-to-robot knowledge transfer through enriched language interactions. Its methodology and findings are valuable for researchers in AI and robotics who are intent on refining interactive learning systems where nuanced human feedback plays a pivotal role.