Learning with Language-Guided State Abstractions (2402.18759v2)
Abstract: We describe a framework for using natural language to design state abstractions for imitation learning. Generalizable policy learning in high-dimensional observation spaces is facilitated by well-designed state representations, which can surface important features of an environment and hide irrelevant ones. These state representations are typically specified manually, or derived from other labor-intensive labeling procedures. Our method, LGA (language-guided abstraction), uses a combination of natural language supervision and background knowledge from language models (LMs) to automatically build state representations tailored to unseen tasks. In LGA, a user first provides a (possibly incomplete) description of a target task in natural language; next, a pre-trained LM translates this task description into a state abstraction function that masks out irrelevant features; finally, an imitation policy is trained using a small number of demonstrations and LGA-generated abstract states. Experiments on simulated robotic tasks show that LGA yields state abstractions similar to those designed by humans, but in a fraction of the time, and that these abstractions improve generalization and robustness in the presence of spurious correlations and ambiguous specifications. We illustrate the utility of the learned abstractions on mobile manipulation tasks with a Spot robot.
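The pipeline described in the abstract (task description → LM-selected features → masking abstraction → abstract states for imitation) can be sketched roughly as below. This is a minimal illustration, not the paper's implementation: the `select_relevant_features` function stands in for the actual pre-trained LM query, using a naive keyword match, and the feature names and states are invented for the example.

```python
import numpy as np

def select_relevant_features(task_description, feature_names):
    """Stand-in for the LM call in LGA: decide which named features the
    task description refers to. A real system would prompt a pre-trained
    LM; here we use a trivial keyword match for illustration."""
    words = set(task_description.lower().split())
    return [name for name in feature_names if name.lower() in words]

def make_abstraction(feature_names, relevant_features):
    """Build a state abstraction function that zeros out (masks) every
    feature judged irrelevant to the task."""
    mask = np.array([1.0 if n in relevant_features else 0.0
                     for n in feature_names])
    def abstract(state):
        return np.asarray(state, dtype=float) * mask
    return abstract

# Hypothetical scene features and a task description.
feature_names = ["cup", "table", "lamp"]
relevant = select_relevant_features("pick up the cup on the table",
                                    feature_names)
phi = make_abstraction(feature_names, relevant)

# The "lamp" feature is irrelevant to the task and gets masked to 0;
# the imitation policy would then be trained on phi(state), not state.
abstract_state = phi([0.3, 0.7, 0.9])
```

The key design point is that the abstraction is computed once per task from language alone, before any demonstrations are collected, so the downstream policy never sees the distracting features at all.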
- Andi Peng
- Ilia Sucholutsky
- Belinda Z. Li
- Theodore R. Sumers
- Thomas L. Griffiths
- Jacob Andreas
- Julie A. Shah