Toward Open-ended Embodied Tasks Solving (2312.05822v1)
Abstract: Empowering embodied agents, such as robots, with AI has become increasingly important in recent years. A major challenge is task open-endedness. In practice, robots often need to perform tasks with novel goals that are multifaceted, dynamic, lack a definitive "end-state", and were not encountered during training. To tackle this problem, this paper introduces Diffusion for Open-ended Goals (DOG), a novel framework designed to enable embodied AI to plan and act flexibly and dynamically for open-ended task goals. DOG synergizes the generative prowess of diffusion models with state-of-the-art, training-free guidance techniques to adaptively perform online planning and control. Our evaluations demonstrate that DOG can handle various kinds of novel task goals not seen during training, in both maze navigation and robot control problems. Our work sheds light on enhancing embodied AI's adaptability and competency in tackling open-ended goals.
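The core mechanism the abstract names, steering a pretrained diffusion sampler toward a goal at inference time instead of retraining it, can be sketched compactly. The snippet below is a minimal illustration of that training-free-guidance pattern under stated assumptions, not the paper's implementation: the names `eps_model` and `goal_cost`, the DDIM-style update, and the toy noise schedule are all illustrative.

```python
import torch

def guided_reverse_step(eps_model, x_t, t, alphas_cumprod, goal_cost, scale=1.0):
    """One deterministic (DDIM-style) reverse-diffusion step with
    training-free goal guidance applied to the denoised estimate.

    eps_model(x, t) -> predicted noise; goal_cost(x) -> per-sample cost.
    Because the goal enters only through a gradient at sampling time,
    novel goals require no retraining of the diffusion model.
    (Illustrative sketch; not the DOG paper's actual API.)
    """
    a_bar = alphas_cumprod[t]
    a_bar_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
    eps = eps_model(x_t, t)

    # Estimate the clean sample x_0 from the noisy one.
    x0_hat = (x_t - torch.sqrt(1.0 - a_bar) * eps) / torch.sqrt(a_bar)

    # Training-free guidance: one gradient step downhill on the goal cost.
    with torch.enable_grad():
        x0_hat = x0_hat.detach().requires_grad_(True)
        grad = torch.autograd.grad(goal_cost(x0_hat).sum(), x0_hat)[0]
    x0_hat = (x0_hat - scale * grad).detach()

    # Re-noise the guided estimate to the previous noise level.
    return torch.sqrt(a_bar_prev) * x0_hat + torch.sqrt(1.0 - a_bar_prev) * eps

# Toy usage: a stand-in noise model and a "goal" that pulls states to zero.
eps_model = lambda x, t: torch.zeros_like(x)
alphas_cumprod = torch.linspace(0.9999, 0.01, 100)  # toy noise schedule
x = torch.randn(4, 16)  # a batch of 4 candidate trajectories of length 16
for t in reversed(range(1, 100)):
    x = guided_reverse_step(eps_model, x, t, alphas_cumprod,
                            goal_cost=lambda z: (z ** 2).mean(-1), scale=0.1)
```

The design point this illustrates is that any differentiable `goal_cost` can be swapped in at sampling time, which is what makes this style of guidance suited to goals never seen during training.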