Pragmatic Instruction Following and Goal Assistance via Cooperative Language-Guided Inverse Planning (2402.17930v1)

Published 27 Feb 2024 in cs.AI, cs.CL, and cs.LG

Abstract: People often give instructions whose meaning is ambiguous without further context, expecting that their actions or goals will disambiguate their intentions. How can we build assistive agents that follow such instructions in a flexible, context-sensitive manner? This paper introduces cooperative language-guided inverse plan search (CLIPS), a Bayesian agent architecture for pragmatic instruction following and goal assistance. Our agent assists a human by modeling them as a cooperative planner who communicates joint plans to the assistant, then performs multimodal Bayesian inference over the human's goal from actions and language, using LLMs to evaluate the likelihood of an instruction given a hypothesized plan. Given this posterior, our assistant acts to minimize expected goal achievement cost, enabling it to pragmatically follow ambiguous instructions and provide effective assistance even when uncertain about the goal. We evaluate these capabilities in two cooperative planning domains (Doors, Keys & Gems and VirtualHome), finding that CLIPS significantly outperforms GPT-4V, LLM-based literal instruction following, and unimodal inverse planning in both accuracy and helpfulness, while closely matching the inferences and assistive judgments provided by human raters.
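
To make the architecture concrete, here is a minimal sketch of the inference-then-action loop the abstract describes: weigh each candidate goal by the likelihood of the human's observed actions under a plan toward that goal and by an LLM-scored likelihood of the instruction given that plan, then pick the assistant action that minimizes expected goal-achievement cost under the resulting posterior. Every helper name here (`goal_prior`, `action_likelihood`, `llm_instruction_likelihood`, `expected_cost`) is a hypothetical placeholder, and the brute-force enumeration stands in for the paper's actual inference machinery.

```python
def goal_posterior(goals, observed_actions, utterance,
                   goal_prior, action_likelihood, llm_instruction_likelihood):
    """P(g | actions, utterance) ∝ P(g) · P(actions | g) · P(utterance | plan(g)).

    All three likelihood callables are hypothetical stand-ins for the paper's
    components (inverse planning for actions, LLM scoring for language).
    """
    weights = {
        g: goal_prior(g)
           * action_likelihood(observed_actions, g)    # fit of actions to a plan for g
           * llm_instruction_likelihood(utterance, g)  # fit of instruction to that plan
        for g in goals
    }
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def assistive_action(candidate_actions, posterior, expected_cost):
    """Act to minimize expected goal-achievement cost under goal uncertainty."""
    return min(
        candidate_actions,
        key=lambda a: sum(p * expected_cost(a, g) for g, p in posterior.items()),
    )
```

Because the posterior fuses both modalities, an ambiguous instruction combined with even a short prefix of observed actions can concentrate probability on one goal, which is what lets the assistant act helpfully before the goal is certain.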

Authors (4)
  1. Tan Zhi-Xuan
  2. Lance Ying
  3. Vikash Mansinghka
  4. Joshua B. Tenenbaum
Citations (11)
