Practice Makes Perfect: Planning to Learn Skill Parameter Policies (2402.15025v2)

Published 22 Feb 2024 in cs.RO and cs.LG

Abstract: One promising approach towards effective robot decision making in complex, long-horizon tasks is to sequence together parameterized skills. We consider a setting where a robot is initially equipped with (1) a library of parameterized skills, (2) an AI planner for sequencing together the skills given a goal, and (3) a very general prior distribution for selecting skill parameters. Once deployed, the robot should rapidly and autonomously learn to improve its performance by specializing its skill parameter selection policy to the particular objects, goals, and constraints in its environment. In this work, we focus on the active learning problem of choosing which skills to practice to maximize expected future task success. We propose that the robot should estimate the competence of each skill, extrapolate the competence (asking: "how much would the competence improve through practice?"), and situate the skill in the task distribution through competence-aware planning. This approach is implemented within a fully autonomous system where the robot repeatedly plans, practices, and learns without any environment resets. Through experiments in simulation, we find that our approach learns effective parameter policies more sample-efficiently than several baselines. Experiments in the real world demonstrate our approach's ability to handle noise from perception and control and improve the robot's ability to solve two long-horizon mobile-manipulation tasks after a few hours of autonomous practice. Project website: http://ees.csail.mit.edu
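The estimate-extrapolate-situate loop described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the `Skill` class, the Beta-posterior competence estimate, the optimistic extrapolation rule, and the `task_relevance` weights are all assumptions introduced for this example.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """Tracks practice outcomes for one parameterized skill."""
    name: str
    successes: int = 0
    attempts: int = 0

    def competence(self) -> float:
        # Estimate: posterior mean success rate under a Beta(1, 1) prior.
        return (self.successes + 1) / (self.attempts + 2)

    def extrapolated_competence(self, extra_practice: int = 10) -> float:
        # Extrapolate: optimistically assume the next few practice attempts
        # succeed, asking "how much would competence improve through practice?"
        return (self.successes + extra_practice + 1) / (
            self.attempts + extra_practice + 2
        )

def choose_skill_to_practice(skills, task_relevance):
    """Situate: weight each skill's expected competence gain by how often
    competence-aware planning expects to invoke it in the task distribution,
    then practice the skill with the largest expected payoff."""
    def expected_gain(s: Skill) -> float:
        return task_relevance[s.name] * (
            s.extrapolated_competence() - s.competence()
        )
    return max(skills, key=expected_gain)

# Toy usage: "pick" is task-relevant but unreliable, so it gets practiced.
skills = [
    Skill("pick", successes=2, attempts=10),
    Skill("place", successes=9, attempts=10),
]
relevance = {"pick": 0.9, "place": 0.8}
print(choose_skill_to_practice(skills, relevance).name)  # -> pick
```

In this sketch the already-competent `place` skill offers little room for improvement, so the selection rule favors practicing `pick`, mirroring the paper's idea of directing autonomous practice toward skills whose improvement most benefits the task distribution.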

Authors (8)
  1. Nishanth Kumar (23 papers)
  2. Tom Silver (31 papers)
  3. Willie McClinton (6 papers)
  4. Linfeng Zhao (17 papers)
  5. Stephen Proulx (2 papers)
  6. Tomás Lozano-Pérez (85 papers)
  7. Leslie Pack Kaelbling (94 papers)
  8. Jennifer Barry (2 papers)
Citations (11)