
Logic-Skill Programming: An Optimization-based Approach to Sequential Skill Planning

Published 7 May 2024 in cs.RO (arXiv:2405.04082v3)

Abstract: Recent advances in robot skill learning have unlocked the potential to construct task-agnostic skill libraries, facilitating the seamless sequencing of multiple simple manipulation primitives (aka. skills) to tackle significantly more complex tasks. Nevertheless, determining the optimal sequence for independently learned skills remains an open problem, particularly when the objective is given solely in terms of the final geometric configuration rather than a symbolic goal. To address this challenge, we propose Logic-Skill Programming (LSP), an optimization-based approach that sequences independently learned skills to solve long-horizon tasks. We formulate a first-order extension of a mathematical program to optimize the overall cumulative reward of all skills within a plan, abstracted by the sum of value functions. To solve such programs, we leverage the use of tensor train factorization to construct the value function space, and rely on alternations between symbolic search and skill value optimization to find the appropriate skill skeleton and optimal subgoal sequence. Experimental results indicate that the obtained value functions provide a superior approximation of cumulative rewards compared to state-of-the-art reinforcement learning methods. Furthermore, we validate LSP in three manipulation domains, encompassing both prehensile and non-prehensile primitives. The results demonstrate its capability to identify the optimal solution over the full logic and geometric path. The real-robot experiments showcase the effectiveness of our approach to cope with contact uncertainty and external disturbances in the real world.


Summary

  • The paper introduces LSP, a novel approach that combines symbolic search with skill value optimization to sequence robot manipulation skills.
  • It employs Monte Carlo Tree Search and Tensor Train-based Approximate Dynamic Programming to efficiently optimize pathways to geometric goals.
  • Extensive experiments in non-prehensile, partly-prehensile, and prehensile domains demonstrate LSP’s superior cumulative rewards and robustness compared to baselines.

Logic-Skill Programming: An Optimization-based Approach to Sequential Skill Planning

This paper addresses the difficulty of sequencing independently learned robot skills to execute long, intricate manipulation tasks when the objective is given only as a final geometric configuration. The proposed method, Logic-Skill Programming (LSP), casts sequential skill planning as an optimization problem and does not rely on predefined symbolic goals.

Sequential Skill Planning Problem

In robotics, composing a sequence of skills (such as pushing, pulling, or pivoting) to accomplish a complex task remains a challenging problem. Traditional methods typically generate plans toward symbolic goals, which is ill-suited to tasks defined by reaching a specific geometric state. LSP removes the need for symbolic goal specification and determines optimal skill sequences by integrating symbolic planning with numerical optimization.

Figure 1: Overview of the proposed approach: given the evaluation function $\Psi$ of the final configuration, along with the initial symbolic state $s_0$ and geometric state $\overline{\bm{x}}_0$, the objective of LSP is to find a solution that accomplishes the task with minimal control cost.

Methodology: Logic-Skill Programming

LSP involves an alternating structure between symbolic logic-based searches and skill value function optimizations:

Symbolic Search

This phase uses the Planning Domain Definition Language (PDDL) to represent task domains and Monte Carlo Tree Search (MCTS) to traverse candidate skill sequences. MCTS expands symbolic transitions according to skill preconditions and effects, and balances exploration against exploitation via Upper Confidence Bound selection. Crucially, this allows diverse exploration of sequence lengths and configurations without an explicit symbolic target goal.
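To make the selection rule concrete, below is a minimal sketch of UCB1-based skill selection over a toy symbolic domain. The skill names, preconditions, and effects are invented for illustration and are not the paper's actual domain definitions:

```python
import math

# Hypothetical skill library: skill -> (symbolic precondition, symbolic effect).
SKILLS = {
    "push":  ("free",      "near_edge"),
    "pivot": ("near_edge", "upright"),
    "grasp": ("upright",   "held"),
}

def applicable(state):
    """Skills whose symbolic precondition matches the current state."""
    return [s for s, (pre, _) in SKILLS.items() if pre == state]

def ucb1(total_reward, visits, parent_visits, c=1.4):
    """UCB1 score: exploit mean reward, explore rarely tried children."""
    if visits == 0:
        return float("inf")  # force each child to be expanded at least once
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_skill(state, stats, parent_visits):
    """Pick the next skill to expand from `state` by the UCB1 rule.
    `stats` maps skill -> (total_reward, visits) from earlier rollouts."""
    return max(applicable(state),
               key=lambda s: ucb1(*stats.get(s, (0.0, 0)), parent_visits))
```

In a full MCTS loop, such selections alternate with rollouts and backpropagation of accumulated skill values; only the selection step is shown here.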

Skill Value Optimization

The optimization level checks whether candidate sequences satisfy path and switch constraints, reducing the problem to maximizing cumulative reward. To approximate value functions accurately over the entire state space, the paper uses Tensor Train-based Approximate Dynamic Programming. Leveraging Tensor Train (TT) decomposition, this stage sidesteps the complexity of mixed-integer programming and applies the Cross-Entropy Method (CEM) to obtain the optimal subgoal sequence for the task.
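As an illustration of the Cross-Entropy Method used at this stage, the sketch below maximizes a stand-in one-dimensional value function; the Gaussian sampler, population sizes, and the value function itself are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def value_fn(x):
    """Stand-in for a learned skill value function; its peak at x = 0.7
    plays the role of the most rewarding subgoal (purely illustrative)."""
    return -(x - 0.7) ** 2

def cem(value_fn, mu=0.0, sigma=1.0, pop=200, n_elite=20, iters=30, seed=0):
    """Cross-Entropy Method: sample candidates, keep the top scorers,
    refit the sampling distribution to them, and repeat."""
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        samples = rng.normal(mu, sigma, pop)
        elite = samples[np.argsort(value_fn(samples))[-n_elite:]]
        mu, sigma = elite.mean(), elite.std() + 1e-6
    return mu

subgoal = cem(value_fn)  # converges near the maximizer x = 0.7
```

The same sample-score-refit loop extends to vector-valued subgoals by fitting a multivariate Gaussian over the elite set.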

Experimental Validation

The robustness of LSP is validated through extensive tests in simulation domains encompassing:

  • Non-Prehensile Manipulation (NPM): Involves manipulation of non-graspable objects via contacts and planar primitives.
  • Partly-Prehensile Manipulation (PPM): Encompasses mixed strategy for objects graspable in limited orientations.
  • Prehensile Manipulation (PM): Objects requiring direct grasp strategies, tackling beyond-reach tasks.

The experiments show how LSP plans skill trajectories efficiently, optimizing paths for maximal cumulative reward while reaching the final configuration across diverse manipulation scenarios.

Figure 2: Non-Prehensile Manipulation domain.
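The skill value functions behind these plans are stored in tensor-train format, which is what keeps evaluating them over large state spaces tractable. A minimal sketch of evaluating a TT-format tensor at a grid index, with hand-built rank-1 cores as a toy example (not the paper's code):

```python
import numpy as np

def tt_eval(cores, idx):
    """Evaluate a tensor stored in tensor-train format at a multi-index.
    Each core has shape (r_{k-1}, n_k, r_k); the value is the product of the
    matrix slices selected by the index, so d small cores stand in for the
    O(n^d) entries of the full tensor."""
    out = cores[0][:, idx[0], :]
    for core, i in zip(cores[1:], idx[1:]):
        out = out @ core[:, i, :]
    return out.item()  # boundary ranks r_0 = r_d = 1 give a 1x1 result

# Toy rank-1 example: f(i, j) = a[i] * b[j].
a, b = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])
cores = [a.reshape(1, 3, 1), b.reshape(1, 2, 1)]
print(tt_eval(cores, (2, 1)))  # 3.0 * 5.0 = 15.0
```

Higher TT ranks capture interactions between dimensions; cross-approximation algorithms build such cores from function samples rather than from the full tensor.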

Comparison with Baselines

Comparative analyses against frameworks such as STAP show that, although STAP is faster, it depends on symbolic goal descriptions. LSP, by contrast, optimizes over the full logic-geometric path and yields solutions with higher cumulative reward, an advantage in scenarios marked by environmental uncertainty and complex dynamics.

Figure 3: Non-Prehensile Manipulation.

Real-World Applications

Real-world robot experiments confirm the reliability of LSP in handling contact-rich tasks under real-time constraints using a pre-trained skill library. The robot reactively completed task sequences despite external disturbances, demonstrating LSP's practical robustness.

Conclusion

Logic-Skill Programming (LSP) advances sequential skill planning by replacing dependence on symbolic goals with direct optimization of cumulative reward. Its alternation between symbolic search and skill value optimization provides both versatility and optimization depth for adaptive problem-solving in robotics. Future directions include scaling the tensor-train components and integrating the planner with symbolic learning for better real-world adaptation.
