Building Open-Ended Embodied Agent via Language-Policy Bidirectional Adaptation (2401.00006v3)
Abstract: Building embodied agents by integrating LLMs and Reinforcement Learning (RL) has revolutionized human-AI interaction: researchers can now leverage language instructions to plan decision-making for open-ended tasks. However, existing research struggles to meet the requirement of open-endedness: it typically trains either the LLM or the RL policy to adapt to a fixed counterpart, which limits the exploration of novel skills and hinders the efficacy of human-AI interaction. To this end, we present OpenPAL, a co-training framework comprising two stages: (1) fine-tuning a pre-trained LLM to translate human instructions into goals for planning, and training a goal-conditioned policy for decision-making; (2) co-training to align the LLM and the policy, achieving instruction open-endedness. We conducted experiments using Contra, an open-ended FPS game, demonstrating that an agent trained with OpenPAL not only comprehends arbitrary instructions but also executes them efficiently. These results suggest that OpenPAL holds the potential to construct open-ended embodied agents in practical scenarios.
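The abstract describes a two-stage pipeline: the LLM first learns to map instructions to goals while a goal-conditioned policy learns to execute them, and the two are then co-trained so that each adapts to the other. The sketch below illustrates only that control flow under assumed interfaces; `InstructionTranslator`, `GoalConditionedPolicy`, `stage_one`, and `stage_two` are hypothetical stand-ins and do not reflect the paper's actual implementation.

```python
# Illustrative sketch of a two-stage LLM/policy co-training loop.
# All names here are hypothetical stand-ins, not the authors' code.
import random


class InstructionTranslator:
    """Stands in for the fine-tuned LLM that maps instructions to goals."""

    def to_goal(self, instruction: str) -> str:
        # A real system would decode a structured goal; here we just tag the text.
        return f"goal({instruction})"

    def update(self, ret: float) -> None:
        # Placeholder for aligning the LLM toward goals the policy can achieve.
        pass


class GoalConditionedPolicy:
    """Stands in for the RL policy conditioned on a goal."""

    def rollout(self, goal: str) -> float:
        # A real rollout would interact with the environment and return a score;
        # here we return a random placeholder return.
        return random.random()

    def update(self, goal: str, ret: float) -> None:
        # Placeholder for a goal-conditioned RL update.
        pass


def stage_one(translator, policy, instructions):
    """Stage 1: translator and policy are trained against each other's fixed outputs."""
    for instr in instructions:
        goal = translator.to_goal(instr)
        policy.update(goal, policy.rollout(goal))


def stage_two(translator, policy, instructions, rounds=3):
    """Stage 2: co-training, where both sides adapt in the same loop."""
    for _ in range(rounds):
        for instr in instructions:
            goal = translator.to_goal(instr)
            ret = policy.rollout(goal)
            policy.update(goal, ret)   # policy adapts to the LLM's goals
            translator.update(ret)     # LLM adapts to the policy's capability


if __name__ == "__main__":
    translator, policy = InstructionTranslator(), GoalConditionedPolicy()
    instructions = ["flank the enemy", "hold the high ground"]
    stage_one(translator, policy, instructions)
    stage_two(translator, policy, instructions)
```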
Authors: Shaopeng Zhai, Jie Wang, Tianyi Zhang, Fuxian Huang, Qi Zhang, Ming Zhou, Jing Hou, Yu Liu, Yu Qiao