BBSEA: An Exploration of Brain-Body Synchronization for Embodied Agents (2402.08212v1)

Published 13 Feb 2024 in cs.RO

Abstract: Embodied agents capable of complex physical skills can improve productivity, elevate life quality, and reshape human-machine collaboration. We aim at autonomous training of embodied agents for various tasks, mainly by leveraging large foundation models. It is believed that these models could act as a brain for embodied agents; however, existing methods heavily rely on humans for task proposal and scene customization, limiting the learning autonomy, training efficiency, and generalization of the learned policies. In contrast, we introduce a brain-body synchronization (BBSEA) scheme to promote embodied learning in unknown environments without human involvement. The proposed scheme combines the wisdom of foundation models ("brain") with the physical capabilities of embodied agents ("body"). Specifically, it leverages the "brain" to propose learnable physical tasks and success metrics, enabling the "body" to automatically acquire various skills by continuously interacting with the scene. We carry out an exploration of the proposed autonomous learning scheme in a table-top setting, and we demonstrate that the proposed synchronization can generate diverse tasks and develop multi-task policies with promising adaptability to new tasks and configurations. We will release our data, code, and trained models to facilitate future studies in building autonomously learning agents with large foundation models in more complex scenarios. More visualizations are available at https://bbsea-embodied-ai.github.io
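
As a rough illustration of the scheme described in the abstract, the following is a minimal Python sketch of a brain-body synchronization loop: a stand-in for the foundation-model "brain" proposes tasks with programmatic success metrics, the "body" attempts them, and successful rollouts are kept as training data for a multi-task policy. All class and function names here are hypothetical placeholders invented for this sketch, not the paper's released code or APIs.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical types standing in for the paper's components; names are
# illustrative, not taken from the BBSEA codebase.

@dataclass
class SceneState:
    """Placeholder for a perceived table-top scene (e.g. detected object names)."""
    objects: List[str]

@dataclass
class Task:
    description: str                               # natural-language task from the "brain"
    success_metric: Callable[[SceneState], bool]   # programmatic success check

@dataclass
class Trajectory:
    task: Task
    actions: List[str]
    succeeded: bool

def propose_tasks(scene: SceneState) -> List[Task]:
    """Stand-in for the foundation-model 'brain': given a scene description, it
    would propose learnable tasks and matching success metrics (e.g. via an LLM
    prompt). Here we return a fixed toy example."""
    return [
        Task(
            description=f"Pick up the {scene.objects[0]} and place it on the {scene.objects[1]}",
            success_metric=lambda s: True,  # a real metric would check object poses
        )
    ]

def execute(task: Task, scene: SceneState) -> Trajectory:
    """Stand-in for the embodied 'body': roll out a policy or scripted skill for
    the proposed task and record the interaction."""
    actions = [f"reach({scene.objects[0]})", "grasp()", f"place({scene.objects[1]})"]
    return Trajectory(task=task, actions=actions, succeeded=task.success_metric(scene))

def brain_body_sync(scene: SceneState, rounds: int = 3) -> List[Trajectory]:
    """Autonomous loop: propose tasks, attempt them, and keep successful rollouts
    as training data for a multi-task policy."""
    dataset: List[Trajectory] = []
    for _ in range(rounds):
        for task in propose_tasks(scene):
            traj = execute(task, scene)
            if traj.succeeded:
                dataset.append(traj)  # later distilled into the multi-task policy
    return dataset

if __name__ == "__main__":
    scene = SceneState(objects=["red block", "blue plate"])
    data = brain_body_sync(scene)
    print(f"Collected {len(data)} successful trajectories")
```

In an actual system, `propose_tasks` would query a large language/vision-language model with a scene description, the success metric would be generated code that inspects object poses, and `execute` would drive a real or simulated manipulator; this sketch only shows how those pieces fit together.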
