GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment (2403.11075v2)

Published 17 Mar 2024 in cs.HC, cs.AI, and cs.MA

Abstract: Verbal communication plays a crucial role in human cooperation, particularly when the partners have only incomplete information about the task, the environment, and each other's mental state. In this paper, we propose a novel cooperative communication framework, Goal-Oriented Mental Alignment (GOMA). GOMA formulates verbal communication as a planning problem that minimizes the misalignment between the goal-relevant parts of agents' mental states. This approach enables an embodied assistant to reason about when and how to proactively initiate verbal communication with humans in natural language to achieve better cooperation. We evaluate our approach against strong baselines in two challenging environments, Overcooked (a multiplayer game) and VirtualHome (a household simulator). Our experimental results demonstrate that LLMs struggle with generating meaningful communication that is grounded in the social and physical context. In contrast, our approach successfully generates concise verbal communication that enables the embodied assistant to effectively boost both cooperation performance and human users' perception of the assistant.
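
The abstract's core idea, communication as planning over goal-relevant mental-state misalignment, can be made concrete with a small decision rule: speak only when the best candidate utterance's expected reduction in misalignment outweighs the cost of interrupting. The Python sketch below is a minimal illustration of that rule; all names, the fact-by-fact misalignment metric, and the fixed speaking cost are assumptions for exposition, not the paper's actual formulation.

```python
# Hypothetical sketch of GOMA-style utterance selection as cost-benefit
# planning over mental-state misalignment. The names, metric, and cost
# model are illustrative assumptions, not the paper's implementation.

from typing import Dict, Optional

GoalRelevantState = Dict[str, object]  # e.g. {"salt_location": "cabinet"}


def misalignment(assistant_belief: GoalRelevantState,
                 inferred_human_belief: GoalRelevantState) -> float:
    """Count the goal-relevant facts on which the two mental states disagree."""
    keys = set(assistant_belief) | set(inferred_human_belief)
    return sum(
        assistant_belief.get(k) != inferred_human_belief.get(k) for k in keys
    )


def select_utterance(assistant_belief: GoalRelevantState,
                     inferred_human_belief: GoalRelevantState,
                     candidate_utterances: Dict[str, GoalRelevantState],
                     speaking_cost: float = 0.5) -> Optional[str]:
    """Pick the utterance whose predicted belief update most reduces
    misalignment, or stay silent if no utterance is worth its cost.

    candidate_utterances maps each utterance string to the human belief
    state the assistant predicts it would induce.
    """
    baseline = misalignment(assistant_belief, inferred_human_belief)
    best_utterance, best_gain = None, 0.0
    for utterance, predicted_belief in candidate_utterances.items():
        net_gain = (baseline
                    - misalignment(assistant_belief, predicted_belief)
                    - speaking_cost)
        if net_gain > best_gain:
            best_utterance, best_gain = utterance, net_gain
    return best_utterance  # None means silence is the best choice


if __name__ == "__main__":
    assistant = {"salt_location": "cabinet", "stove_on": True}
    human = {"salt_location": "counter", "stove_on": True}
    candidates = {
        "The salt is in the cabinet.": {"salt_location": "cabinet",
                                        "stove_on": True},
        "The stove is on.": {"salt_location": "counter", "stove_on": True},
    }
    print(select_utterance(assistant, human, candidates))
    # -> "The salt is in the cabinet." (corrects the one misaligned fact)
```

In this toy setup the assistant stays silent (returns None) whenever no candidate utterance corrects enough goal-relevant beliefs to justify the interruption, which mirrors the when-and-how decision about proactive communication that the abstract describes.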
