From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3" (2312.06571v1)
Abstract: We report the development of Alter3, a humanoid robot capable of generating spontaneous motion using a large language model (LLM), specifically GPT-4. We achieved this by integrating GPT-4 into our proprietary android, Alter3, thereby grounding the LLM in Alter3's bodily movement. Typically, low-level robot control is hardware-dependent and falls outside the scope of LLM corpora, which makes direct LLM-based robot control difficult. For a humanoid robot like Alter3, however, direct control is feasible: linguistic expressions of human actions are mapped onto the robot's body through program code. This approach enables Alter3 to adopt various poses, such as a 'selfie' stance or 'pretending to be a ghost,' and to generate sequences of actions over time without explicit programming for each body part, demonstrating the robot's zero-shot learning capabilities. Additionally, verbal feedback can adjust poses, obviating the need for fine-tuning. A video of Alter3's generated motions is available at https://tnoinkwms.github.io/ALTER-LLM/
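The idea of mapping a described action onto the body "through program code" can be illustrated with a minimal sketch. It is not the authors' implementation: the `set_axis(axis, value)` command format, the axis count `NUM_AXES`, the 0 to 255 value range, and all function names below are assumptions made for illustration, and the LLM output is a hard-coded string rather than a real GPT-4 response.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical body description: the abstract gives neither the number of
# axes nor their value range, so these constants are illustrative only.
NUM_AXES = 8
VALUE_MIN, VALUE_MAX = 0, 255

# Matches lines such as "set_axis(3, 200)". The command name is an assumption.
_COMMAND = re.compile(r"^\s*set_axis\(\s*(\d+)\s*,\s*(-?\d+)\s*\)")


def parse_motion_code(generated: str) -> List[Tuple[int, int]]:
    """Extract (axis, value) commands from generated text, clamping values."""
    commands: List[Tuple[int, int]] = []
    for line in generated.splitlines():
        match = _COMMAND.match(line)
        if match is None:
            continue  # skip comments and anything that is not a command
        axis, value = int(match.group(1)), int(match.group(2))
        if not 0 <= axis < NUM_AXES:
            continue  # ignore axes this hypothetical body does not have
        commands.append((axis, max(VALUE_MIN, min(VALUE_MAX, value))))
    return commands


def apply_commands(pose: Dict[int, int],
                   commands: List[Tuple[int, int]]) -> Dict[int, int]:
    """Return a new pose with each commanded axis overwritten."""
    new_pose = dict(pose)
    for axis, value in commands:
        new_pose[axis] = value
    return new_pose


# Stand-in for text an LLM might return for "take a selfie" (hard-coded here).
generated = """# take a selfie
set_axis(1, 200)
set_axis(2, 300)
set_axis(99, 10)
"""

neutral = {axis: 128 for axis in range(NUM_AXES)}
commands = parse_motion_code(generated)
pose = apply_commands(neutral, commands)

# Verbal feedback such as "raise the arm higher" would produce a second
# round of generated commands that adjusts the existing pose.
adjusted = apply_commands(pose, parse_motion_code("set_axis(1, 230)"))
```

Parsing the generated text into a restricted command list, rather than executing it directly, keeps out-of-range values and unknown axes away from the hardware. That is a design choice of this sketch, not something the abstract describes.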