
Pre-Trained Language Models for Interactive Decision-Making (2202.01771v4)

Published 3 Feb 2022 in cs.LG and cs.CL

Abstract: Language model (LM) pre-training is useful in many language processing tasks. But can pre-trained LMs be further leveraged for more general machine learning problems? We propose an approach for using LMs to scaffold learning and generalization in general sequential decision-making problems. In this approach, goals and observations are represented as a sequence of embeddings, and a policy network initialized with a pre-trained LM predicts the next action. We demonstrate that this framework enables effective combinatorial generalization across different environments and supervisory modalities. We begin by assuming access to a set of expert demonstrations, and show that initializing policies with LMs and fine-tuning them via behavior cloning improves task completion rates by 43.6% in the VirtualHome environment. Next, we integrate an active data gathering procedure in which agents iteratively interact with the environment, relabel past "failed" experiences with new goals, and update their policies in a self-supervised loop. Active data gathering further improves combinatorial generalization, outperforming the best baseline by 25.1%. Finally, we explain these results by investigating three possible factors underlying the effectiveness of the LM-based policy. We find that sequential input representations (vs. fixed-dimensional feature vectors) and LM-based weight initialization are both important for generalization. Surprisingly, however, the format of the policy inputs encoding (e.g. as a natural language string vs. an arbitrary sequential encoding) has little influence. Together, these results suggest that language modeling induces representations that are useful for modeling not just language, but also goals and plans; these representations can aid learning and generalization even outside of language processing.

Authors (14)
  1. Shuang Li (203 papers)
  2. Xavier Puig (14 papers)
  3. Chris Paxton (59 papers)
  4. Yilun Du (113 papers)
  5. Clinton Wang (4 papers)
  6. Linxi Fan (33 papers)
  7. Tao Chen (397 papers)
  8. De-An Huang (45 papers)
  9. Ekin Akyürek (25 papers)
  10. Anima Anandkumar (236 papers)
  11. Jacob Andreas (116 papers)
  12. Igor Mordatch (66 papers)
  13. Antonio Torralba (178 papers)
  14. Yuke Zhu (134 papers)
Citations (219)

Summary

Pre-Trained Language Models for Interactive Decision-Making: A Summary

The presented paper explores the application of pre-trained language models (LMs) beyond traditional natural language processing tasks, extending their utility to general sequential decision-making environments, with a particular focus on embodied decision-making. The authors propose a framework in which goals, observations, and actions in various environments are encoded as sequences and then processed by policy networks initialized with pre-trained LMs. This strategy aims to harness the combinatorial generalization capabilities of LMs, even in non-linguistic tasks.
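To make the framework concrete, the following is a minimal sketch of an LM-as-policy in PyTorch, assuming a GPT-2 backbone from the Hugging Face transformers library; the last-token pooling scheme and linear action head are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of an LM-as-policy: goal and observation tokens form one
# input sequence, a pre-trained GPT-2 backbone processes it, and a linear
# head over the final hidden state predicts the next action.
# Assumptions: discrete action space; last-token pooling; gpt2 checkpoint.
import torch
import torch.nn as nn
from transformers import GPT2Model

class LMPolicy(nn.Module):
    def __init__(self, n_actions: int, model_name: str = "gpt2"):
        super().__init__()
        self.backbone = GPT2Model.from_pretrained(model_name)
        hidden = self.backbone.config.n_embd  # 768 for the base gpt2
        self.action_head = nn.Linear(hidden, n_actions)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask)
        # Pool the hidden state at the last non-padded position and map it
        # to action logits for the next step.
        last = attention_mask.sum(dim=1) - 1
        pooled = out.last_hidden_state[torch.arange(input_ids.size(0)), last]
        return self.action_head(pooled)
```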

Methodology

The authors' approach begins by using expert demonstrations to guide training. A policy network is initialized with the parameters of a pre-trained LM, such as GPT-2, and then fine-tuned on demonstration data via behavior cloning. This initialization leverages the inductive biases instilled during language-model pre-training.
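A behavior-cloning update in this setting might look like the sketch below, assuming the hypothetical LMPolicy above and batches of tokenized (goal + observation history, expert action) pairs; the batch keys and learning rate are placeholders.

```python
import torch
import torch.nn.functional as F

def behavior_cloning_step(policy, batch, optimizer):
    # Predict action logits from the tokenized goal/observation sequence
    # and regress onto the expert's action with cross-entropy.
    logits = policy(batch["input_ids"], batch["attention_mask"])
    loss = F.cross_entropy(logits, batch["expert_action"])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage (illustrative):
# optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)
# for batch in demo_loader:
#     behavior_cloning_step(policy, batch, optimizer)
```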

Following the imitation learning phase, an active data gathering (ADG) procedure is introduced. The agent iteratively interacts with the environment to collect new trajectories, relabels "failed" experiences with sub-goals the trajectory actually achieved, and updates its policy in a self-supervised loop. ADG further improves combinatorial generalization, outperforming the best baseline by 25.1%.
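The loop below sketches this procedure under stated assumptions: the Episode container and the injected rollout/retrain callables are hypothetical stand-ins for environment- and task-specific code, and relabeling with the last achieved goal is one plausible rule rather than the paper's exact heuristic.

```python
from dataclasses import dataclass, replace
from typing import Callable, List

@dataclass
class Episode:
    goal: str
    actions: List[int]
    achieved: List[str]  # goals satisfied at some point in the trajectory
    success: bool

def active_data_gathering(
    policy,
    rollout: Callable[..., Episode],  # interacts with the environment
    retrain: Callable[..., None],     # supervised update on the buffer
    n_rounds: int,
) -> List[Episode]:
    buffer: List[Episode] = []
    for _ in range(n_rounds):
        ep = rollout(policy)
        if not ep.success and ep.achieved:
            # Hindsight relabeling: rewrite the goal to one the trajectory
            # actually reached, turning a "failed" rollout into usable data.
            ep = replace(ep, goal=ep.achieved[-1], success=True)
        buffer.append(ep)
        retrain(policy, buffer)
    return buffer
```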

Results and Evaluation

Extensive evaluation is conducted in two environments: VirtualHome, a realistic 3D simulator with partial observability and a large action space, and BabyAI, a 2D grid world focused on instruction following. In both settings, LM-initialized policies significantly improve task completion on seen and novel tasks. In VirtualHome, for example, they raise task success on novel task configurations by 43.6% over the baselines. Improvements hold across a range of evaluation metrics, especially in scenarios requiring out-of-distribution generalization, which is central to building versatile AI agents.

Analysis

The paper also explores why and how the proposed framework promotes generalization:

  1. Input Encoding Flexibility: Transforming policy inputs into sequences (not necessarily natural-language strings) exploits the LM's capacity to handle diverse data representations; this matters for tasks whose inputs are not naturally expressed as text (see the sketch after this list).
  2. Transformer Architecture Utilization: Sequential input representations processed by the transformer outperform fixed-dimensional feature vectors, underscoring the role of the LM architecture in combinatorial generalization.
  3. Weight Initialization from Pre-Training: Initializing policies with pre-trained LM weights provides inductive biases for sequence processing that contribute to improved performance across varied input formats.
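To illustrate the first point, the sketch below serializes the same observation as a natural-language string and as an arbitrary but consistent token sequence; the exact formats are assumptions for illustration. Per the paper's ablation, the choice between such formats has little influence, whereas sequential structure and pre-trained weights matter.

```python
def encode_natural_language(goal: str, objects: list) -> str:
    # Human-readable serialization of goal and visible objects.
    return f"goal: {goal}. visible: " + ", ".join(objects) + "."

def encode_arbitrary_sequence(goal: str, objects: list) -> str:
    # Same information as index-tagged tokens with no linguistic phrasing.
    return " ".join(["G", goal.replace(" ", "_")]
                    + [f"O{i}_{obj}" for i, obj in enumerate(objects)])

print(encode_natural_language("put plate on table", ["plate", "table", "cup"]))
print(encode_arbitrary_sequence("put plate on table", ["plate", "table", "cup"]))
```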

The conclusions recognize the broader applicability of LMs in structured decision-making tasks, emphasizing their utility as foundation models for AI applications that demand generalization across non-trivial task variations. The paper positions pre-trained LMs as effective policy initializers and points to future work in domains beyond language, including vision, robotics, and autonomous systems: assessing biases in model outputs, ensuring equitable decision-making, and reducing the reliance of active data gathering on hand-crafted relabeling rules.