
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3" (2312.06571v1)

Published 11 Dec 2023 in cs.RO

Abstract: We report the development of Alter3, a humanoid robot capable of generating spontaneous motion using an LLM, specifically GPT-4. This achievement was realized by integrating GPT-4 into our proprietary android, Alter3, thereby effectively grounding the LLM with Alter's bodily movement. Typically, low-level robot control is hardware-dependent and falls outside the scope of LLM corpora, presenting challenges for direct LLM-based robot control. However, in the case of humanoid robots like Alter3, direct control is feasible by mapping the linguistic expressions of human actions onto the robot's body through program code. Remarkably, this approach enables Alter3 to adopt various poses, such as a 'selfie' stance or 'pretending to be a ghost,' and generate sequences of actions over time without explicit programming for each body part. This demonstrates the robot's zero-shot learning capabilities. Additionally, verbal feedback can adjust poses, obviating the need for fine-tuning. A video of Alter3's generated motions is available at https://tnoinkwms.github.io/ALTER-LLM/


Summary

  • The paper introduces a novel integration of GPT-4 with Alter3, enabling zero-shot learning to translate text into varied, spontaneous motions.
  • The paper employs a two-phase prompt methodology that first interprets motion intent and then converts it into executable Python code without specific hardware training.
  • The paper demonstrates effective human-robot interaction through dynamic user feedback and multi-agent dialogue, paving the way for adaptable, expressive robotics.

From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"

The paper "From Text to Motion: Grounding GPT-4 in a Humanoid Robot 'Alter3'" presents a significant advancement in the field of humanoid robotics by integrating OpenAI’s GPT-4, a robust LLM, with the humanoid robot Alter3. This research illustrates the application of LLMs outside conventional textual domains, translating linguistic expressions into robotic motions, thereby adding a physical dimension to machine learning.

The researchers integrated GPT-4 into Alter3, their proprietary android, enabling it to generate spontaneous and varied motions by mapping textual instructions onto the robot's physical architecture. Notably, this translation requires no explicit action programming for each body part, demonstrating Alter3's zero-shot capabilities. The robot can, for example, assume postures such as mimicking taking a selfie or portraying a ghost. This development points toward human-robot interactions in which complex tasks can be performed without traditional task-specific training regimens.

Methodology and Implementation

The pivotal innovation is the coupling of the LLM with the robot's control architecture. Using a two-phase prompt system grounded in Chain-of-Thought (CoT) prompting, the LLM converts textual commands into comprehensive motion plans. Prompt 1 first generates a detailed description of the intended motion, and Prompt 2 translates that description into executable Python code that drives the robot's joints. This procedural framework eliminates the conventional need for hardware-specific training data, a significant step toward general LLM applicability across different robot platforms.
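As a concrete illustration, a minimal sketch of this two-phase pipeline might look as follows, assuming the OpenAI chat-completions API. The prompt wording and the `set_axis` joint interface are illustrative assumptions, not the authors' exact prompts; the paper describes Alter3 as having 43 controllable axes.

```python
import openai  # assumes the openai>=1.0 Python client

client = openai.OpenAI()

# Phase 1 (hypothetical wording): expand the instruction into a step-by-step
# motion description, mirroring the CoT-style first prompt described above.
DESCRIBE_PROMPT = (
    "Describe, step by step, how a humanoid robot would perform this action "
    "with its head, arms, torso, and facial expression: {instruction}"
)

# Phase 2 (hypothetical wording): turn that description into Python calls to
# a low-level joint API. set_axis() is an illustrative stand-in for Alter3's
# real control interface, which the paper does not fully specify.
CODE_PROMPT = (
    "Convert the following motion description into Python code that uses "
    "only set_axis(axis_number, value), where axis_number selects one of "
    "the robot's 43 joints:\n{description}"
)

def text_to_motion(instruction: str) -> str:
    """Two-phase translation: instruction -> motion description -> code."""
    description = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": DESCRIBE_PROMPT.format(instruction=instruction)}],
    ).choices[0].message.content
    return client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": CODE_PROMPT.format(description=description)}],
    ).choices[0].message.content

print(text_to_motion("take a selfie"))
```

Keeping the two phases separate is what lets the description stage stay hardware-agnostic while only the code stage needs to know the joint API.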

Evaluation and Results

The paper reports an evaluation in which generated animations such as 'taking a selfie' and 'pretending to be a ghost' were produced and assessed by third-party observers, whose ratings indicated substantial human-like expressiveness and coherence. The system also accepts verbal corrections, allowing users to fine-tune the robot's actions dynamically; corrected motion code is stored in memory and reused, improving future responses. This evolving memory gives the robot a notable degree of adaptability akin to human behavioral adjustment, and it is sketched in code below.
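A hedged sketch of that feedback loop, reusing `client` and `text_to_motion` from the previous snippet; the dictionary-based memory and the revision prompt are assumptions, since the paper does not publish its storage mechanism.

```python
# description -> refined motion code; a stand-in for the paper's memory store
motion_memory: dict[str, str] = {}

def run_with_feedback(instruction: str) -> str:
    """Generate (or recall) motion code, then refine it from verbal feedback."""
    code = motion_memory.get(instruction) or text_to_motion(instruction)
    feedback = input(f"Feedback on '{instruction}' (blank if satisfied): ")
    while feedback:
        # Ask the LLM to revise the code in light of the verbal correction,
        # e.g. "raise your arm a bit higher".
        code = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user",
                       "content": (f"Revise this motion code according to "
                                   f"the feedback '{feedback}':\n{code}")}],
        ).choices[0].message.content
        feedback = input("Further feedback (blank if satisfied): ")
    motion_memory[instruction] = code  # reuse the refined motion next time
    return code
```

The key property is that corrections persist: once a motion has been refined, later requests for the same instruction skip regeneration and start from the improved version, which is what lets verbal feedback substitute for fine-tuning.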

Multi-agent Interaction and Dialogue

The project also simulated social interactions among pseudo-personalities governed by GPT-4, exploring dialogues that mix autonomous dynamics with human participation. Six distinct agents were instantiated to model intrapersonal and interpersonal communication, reflecting the modular character of human conversation. Trajectory analysis of these interactions revealed pitfalls such as the so-called 'good-bye attractor,' in which the exchange collapses into repeated farewells and requires human intervention to progress meaningfully.
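The paper does not publish the agents' prompts, but the setup can be sketched roughly as below (again reusing `client`); the persona labels and the farewell-detection heuristic are invented for illustration.

```python
# Hypothetical persona labels; the paper's six agents are not named here.
PERSONAS = ["optimist", "skeptic", "planner", "critic", "narrator", "mediator"]

def in_goodbye_attractor(history: list[str], window: int = 4) -> bool:
    """Crude heuristic: the last few turns all read as farewells."""
    tail = history[-window:]
    return len(tail) == window and all(
        any(word in turn.lower() for word in ("goodbye", "bye", "farewell"))
        for turn in tail
    )

def dialogue_round(history: list[str]) -> None:
    """One round in which each persona speaks once, with human rescue."""
    for persona in PERSONAS:
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "system", "content": f"You are the {persona}."},
                      {"role": "user", "content": "\n".join(history[-10:])}],
        ).choices[0].message.content
        history.append(f"{persona}: {reply}")
        if in_goodbye_attractor(history):
            # The conversation has collapsed into farewells; let a human
            # inject a new topic, as the paper's analysis suggests is needed.
            history.append("human: " + input("Human intervention: "))
```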

Implications and Further Research

This paper exemplifies a burgeoning domain in which LLM-driven robotics can move from textual to physical realms, opening new avenues for AI applications in real-world environments. Interfacing an LLM with a humanoid form like Alter3 challenges traditional paradigms of grounding symbolic AI in embodied systems. While the engineered motions impress in their expressiveness, they also raise philosophical questions about the true nature and locus of consciousness.

Future research trajectories may focus on refining multi-agent architectures to avert conversational stagnation and applying these systems to interactive platforms, enhancing their capacity for personalized, social, and empathetic engagements. Furthermore, exploring deeper integrations of real-time sensory feedback for dynamic context awareness could enhance these systems' viability in complex, adaptive human environments, thus broadening the spectrum of human-AI cooperation.
