Learning to Look: Seeking Information for Decision Making via Policy Factorization (2410.18964v1)

Published 24 Oct 2024 in cs.RO and cs.LG

Abstract: Many robot manipulation tasks require active or interactive exploration behavior in order to be performed successfully. Such tasks are ubiquitous in embodied domains, where agents must actively search for the information necessary for each stage of a task, e.g., moving the head of the robot to find information relevant to manipulation, or in multi-robot domains, where one scout robot may search for the information that another robot needs to make informed decisions. We identify these tasks with a new type of problem, factorized Contextual Markov Decision Processes, and propose DISaM, a dual-policy solution composed of an information-seeking policy that explores the environment to find the relevant contextual information and an information-receiving policy that exploits the context to achieve the manipulation goal. This factorization allows us to train both policies separately, using the information-receiving one to provide reward to train the information-seeking policy. At test time, the dual agent balances exploration and exploitation based on the uncertainty the manipulation policy has on what the next best action is. We demonstrate the capabilities of our dual policy solution in five manipulation tasks that require information-seeking behaviors, both in simulation and in the real world, where DISaM significantly outperforms existing methods. More information at https://robin-lab.cs.utexas.edu/learning2look/.
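
The abstract describes DISaM's test-time behavior: the information-seeking policy explores until the information-receiving (manipulation) policy is sufficiently certain about its next best action, at which point the manipulation action is executed. The sketch below shows one way such a switching loop could look. The policy interfaces (`seek_policy`, `manip_policy`), the Gym-style `env.step` signature, and the entropy-threshold switching rule are all illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of an uncertainty-gated dual-policy control loop, assuming
# discrete manipulation actions and a Gym-style environment API. All class and
# method names here are hypothetical placeholders.

import numpy as np


def action_entropy(action_probs: np.ndarray) -> float:
    """Shannon entropy of the manipulation policy's action distribution."""
    p = np.clip(action_probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())


def dual_policy_episode(env, seek_policy, manip_policy,
                        entropy_threshold: float = 0.5,
                        max_steps: int = 200):
    """Alternate between information seeking and manipulation based on how
    uncertain the manipulation policy is about its next best action."""
    obs = env.reset()
    context = []  # observations gathered so far by the information-seeking policy
    for _ in range(max_steps):
        probs = manip_policy.action_distribution(obs, context)
        if action_entropy(probs) > entropy_threshold:
            # Too uncertain: let the information-seeking policy act
            # (e.g., move the robot's head/camera) to gather more context.
            look_action = seek_policy.act(obs, context)
            obs, _, done, info = env.step(look_action)
            context.append(info.get("observation_summary", obs))
        else:
            # Confident enough: execute the greedy manipulation action.
            manip_action = int(np.argmax(probs))
            obs, _, done, _ = env.step(manip_action)
        if done:
            break
    return obs
```

Entropy over a discrete action head is only one proxy for "uncertainty about the next best action"; ensemble disagreement or the spread of a continuous action distribution could play the same gating role in this loop.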

