
Digital Life Project: Autonomous 3D Characters with Social Intelligence (2312.04547v1)

Published 7 Dec 2023 in cs.CV, cs.AI, cs.GR, and cs.HC

Abstract: In this work, we present Digital Life Project, a framework that uses language as the universal medium to build autonomous 3D characters capable of engaging in social interactions and expressing themselves through articulated body motions, thereby simulating life in a digital environment. Our framework comprises two primary components: 1) SocioMind: a meticulously crafted digital brain that models personalities with systematic few-shot exemplars, incorporates a reflection process based on psychology principles, and emulates autonomy by initiating dialogue topics; 2) MoMat-MoGen: a text-driven motion synthesis paradigm for controlling the character's digital body. It integrates motion matching, a proven industry technique for ensuring motion quality, with cutting-edge advancements in motion generation for diversity. Extensive experiments demonstrate that each module achieves state-of-the-art performance in its respective domain. Collectively, they enable virtual characters to initiate and sustain dialogues autonomously while evolving their socio-psychological states. Concurrently, these characters can perform contextually relevant bodily movements. Additionally, a motion captioning module allows the virtual character to recognize and appropriately respond to human players' actions. Homepage: https://digital-life-project.com/
