
Toward Grounded Commonsense Reasoning (2306.08651v2)

Published 14 Jun 2023 in cs.RO and cs.AI

Abstract: Consider a robot tasked with tidying a desk with a meticulously constructed Lego sports car. A human may recognize that it is not appropriate to disassemble the sports car and put it away as part of the "tidying." How can a robot reach that conclusion? Although LLMs have recently been used to enable commonsense reasoning, grounding this reasoning in the real world has been challenging. To reason in the real world, robots must go beyond passively querying LLMs and actively gather the information from the environment that is required to make the right decision. For instance, after detecting that there is an occluded car, the robot may need to actively perceive the car to know whether it is an advanced model car made out of Legos or a toy car built by a toddler. We propose an approach that leverages an LLM and a vision language model (VLM) to help a robot actively perceive its environment and perform grounded commonsense reasoning. To evaluate our framework at scale, we release the MessySurfaces dataset, which contains images of 70 real-world surfaces that need to be cleaned. We additionally illustrate our approach with a robot on 2 carefully designed surfaces. We find an average 12.9% improvement on the MessySurfaces benchmark and an average 15% improvement in the robot experiments over baselines that do not use active perception. The dataset, code, and videos of our approach can be found at https://minaek.github.io/grounded_commonsense_reasoning.
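The active-perception loop the abstract describes can be pictured as an LLM proposing clarifying questions about an object, a VLM answering them from close-up observations, and the LLM then making a grounded decision. The sketch below is a minimal illustration of that loop, not the paper's implementation: the LLM and VLM calls are stubbed with canned responses, and all function names are hypothetical.

```python
# Illustrative sketch of an LLM/VLM active-perception loop (assumed
# structure, not the paper's code). Model calls are stubbed with canned
# answers so the example is self-contained and runnable.

def llm_propose_question(obj, known_facts):
    """Stand-in for an LLM asking a clarifying question about an object."""
    if "material" not in known_facts:
        return "What is the object made of?"
    return None  # enough information has been gathered

def vlm_answer(question, close_up_image):
    """Stand-in for a VLM answering from a close-up image of the object."""
    canned = {"What is the object made of?": ("material", "Lego bricks")}
    return canned[question]

def llm_decide_action(obj, known_facts):
    """Stand-in for commonsense reasoning once the object is grounded."""
    if known_facts.get("material") == "Lego bricks":
        return f"leave the {obj} assembled"
    return f"put the {obj} away"

def active_perception(obj, take_close_up):
    """Gather facts by querying the VLM until the LLM has no more questions."""
    facts = {}
    while (question := llm_propose_question(obj, facts)) is not None:
        key, value = vlm_answer(question, take_close_up(obj))
        facts[key] = value
    return llm_decide_action(obj, facts)

decision = active_perception("sports car", take_close_up=lambda o: f"image_of_{o}")
print(decision)  # → "leave the sports car assembled"
```

Here the loop terminates because the stubbed LLM stops asking once the material is known; a real system would instead stop when the model reports sufficient confidence in its decision.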

Authors (6)
  1. Minae Kwon (10 papers)
  2. Hengyuan Hu (22 papers)
  3. Vivek Myers (16 papers)
  4. Siddharth Karamcheti (26 papers)
  5. Anca Dragan (62 papers)
  6. Dorsa Sadigh (162 papers)
Citations (5)

