Bi-KVIL: Keypoints-based Visual Imitation Learning of Bimanual Manipulation Tasks (2403.03270v2)

Published 5 Mar 2024 in cs.RO

Abstract: Visual imitation learning has achieved impressive progress in learning unimanual manipulation tasks from a small set of visual observations, thanks to the latest advances in computer vision. However, learning bimanual coordination strategies and complex object relations from bimanual visual demonstrations, as well as generalizing them to categorical objects in novel cluttered scenes, remain unsolved challenges. In this paper, we extend our previous work on keypoints-based visual imitation learning (K-VIL) [1] to bimanual manipulation tasks. The proposed Bi-KVIL jointly extracts so-called Hybrid Master-Slave Relationships (HMSR) among objects and hands, bimanual coordination strategies, and sub-symbolic task representations. Our bimanual task representation is object-centric, embodiment-independent, and viewpoint-invariant, thus generalizing well to categorical objects in novel scenes. We evaluate our approach in various real-world applications, showcasing its ability to learn fine-grained bimanual manipulation tasks from a small number of human demonstration videos. Videos and source code are available at https://sites.google.com/view/bi-kvil.

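The abstract describes the Bi-KVIL task representation as object-centric, embodiment-independent, and viewpoint-invariant, but the page does not reproduce the construction itself. As a minimal illustrative sketch (not the Bi-KVIL implementation), the Python snippet below shows one generic way such viewpoint invariance can be obtained: express one object's 3D keypoints in a local frame anchored to another object's keypoints, so that the relation is unchanged under rigid viewpoint changes. The function names, the Gram-Schmidt frame construction, and the toy data are assumptions made for illustration only.

```python
# Illustrative sketch only: a generic object-centric, viewpoint-invariant keypoint
# relation. This is NOT the Bi-KVIL implementation; names and data are hypothetical.
import numpy as np

def local_frame_from_keypoints(kpts):
    """Build a right-handed orthonormal frame (origin, 3x3 rotation) from >= 3
    non-collinear keypoints via Gram-Schmidt on the first three points."""
    origin = kpts[0]
    x = kpts[1] - kpts[0]
    x = x / np.linalg.norm(x)
    v = kpts[2] - kpts[0]
    z = np.cross(x, v)
    z = z / np.linalg.norm(z)
    y = np.cross(z, x)
    rot = np.stack([x, y, z], axis=1)  # columns are the frame axes in world coordinates
    return origin, rot

def object_centric_relation(master_kpts, slave_kpts):
    """Express the slave object's keypoints in the master object's local frame."""
    origin, rot = local_frame_from_keypoints(master_kpts)
    return (slave_kpts - origin) @ rot

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    master = rng.normal(size=(5, 3))   # hypothetical 3D keypoints of the master object
    slave = rng.normal(size=(4, 3))    # hypothetical 3D keypoints of the slave object

    # Apply an arbitrary rigid viewpoint change (rotation + translation) to the scene.
    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
    t = np.array([0.3, -1.2, 0.5])

    rel_before = object_centric_relation(master, slave)
    rel_after = object_centric_relation(master @ R.T + t, slave @ R.T + t)
    print(np.allclose(rel_before, rel_after))  # True: the relation is viewpoint-invariant
```

The final check confirms that the relation is identical before and after an arbitrary rigid transform of the whole scene; the master-slave role assignment itself (the HMSR extraction described in the paper) is not modeled here.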
References (64)
  1. J. Gao, Z. Tao, N. Jaquier, and T. Asfour, “K-VIL: Keypoints-Based Visual Imitation Learning,” IEEE Transactions on Robotics, vol. 39, no. 5, pp. 3888–3908, 2023.
  2. M. Muhlig, M. Gienger, J. J. Steil, and C. Goerick, “Automatic selection of task spaces for imitation learning,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2009, pp. 4996–5002.
  3. M. Muhlig, M. Gienger, S. Hellbach, J. J. Steil, and C. Goerick, “Task-level imitation learning using variance-based movement optimization,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2009, pp. 1177–1184.
  4. Y. Guiard, “Asymmetric Division of Labor in Human Skilled Bimanual Action: The Kinematic Chain as a Model,” Journal of motor behavior, vol. 19, no. 4, pp. 486–517, 1987.
  5. M. Kimmerle, C. L. Ferre, K. A. Kotwica, and G. F. Michel, “Development of role-differentiated bimanual manipulation during the infant’s first year,” Developmental Psychobiology: The Journal of the International Society for Developmental Psychobiology, vol. 52, no. 2, pp. 168–180, 2010.
  6. Y. Zhou, M. Do, and T. Asfour, “Coordinate change dynamic movement primitives - a leader-follower approach,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2016, pp. 5481–5488.
  7. J. Liu, Y. Chen, Z. Dong, S. Wang, S. Calinon, M. Li, and F. Chen, “Robot Cooking with Stir-fry: Bimanual Non-prehensile Manipulation of Semi-fluid Objects,” IEEE Robotics and Automation Letters, vol. 7, pp. 5159–5166, 2022.
  8. L. P. Ureche and A. Billard, “Constraints extraction from asymmetrical bimanual tasks and their use in coordinated behavior,” Robotics and Autonomous Systems, vol. 103, pp. 222–235, 2018.
  9. F. Krebs and T. Asfour, “A Bimanual Manipulation Taxonomy,” IEEE Robotics and Automation Letters, vol. 7, pp. 11031–11038, 2022.
  10. Y. Qin, Y.-H. Wu, S. Liu, H. Jiang, R. Yang, Y. Fu, and X. Wang, “DexMV: Imitation Learning for Dexterous Manipulation from Human Videos,” in Euro. Conf. on Computer Vision (ECCV), 2022, pp. 570–587.
  11. A. Patel, A. Wang, I. Radosavovic, and J. Malik, “Learning to Imitate Object Interactions from Internet Videos,” arXiv:2211.13225, 2022.
  12. P. Sundaresan, J. Grannen, B. Thananjeyan, A. Balakrishna, M. Laskey, K. Stone, J. E. Gonzalez, and K. Goldberg, “Learning Rope Manipulation Policies Using Dense Object Descriptors Trained on Synthetic Depth Data,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2020, pp. 9411–9418.
  13. P. Florence, L. Manuelli, and R. Tedrake, “Dense Object Nets: Learning dense visual object descriptors by and for robotic manipulation,” in Conference on Robot Learning (CoRL), 2018, pp. 373–385.
  14. U. Deekshith, N. Gajjar, M. Schwarz, and S. Behnke, “Visual Descriptor Learning from Monocular Video,” in Proceedings of the 15th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Valletta, Malta, 2020, pp. 444–451.
  15. P. Florence, L. Manuelli, and R. Tedrake, “Self-supervised correspondence in visuomotor policy learning,” IEEE Robotics and Automation Letters, vol. 5, pp. 492–499, 2020.
  16. D. Hadjivelichkov, S. Zwane, L. Agapito, M. P. Deisenroth, and D. Kanoulas, “One-Shot Transfer of Affordance Regions? AffCorrs!” in Conference on Robot Learning (CoRL), 2022, pp. 550–560.
  17. S. Amir, Y. Gandelsman, S. Bagon, and T. Dekel, “Deep ViT Features as Dense Visual Descriptors,” in ECCVW What is Motion For?, 2022.
  18. Y. Liu, Z. Shen, Z. Lin, S. Peng, H. Bao, and X. Zhou, “GIFT: Learning Transformation-Invariant Dense Visual Descriptors via Group CNNs,” in Neural Information Processing Systems (NeurIPS), 2019, pp. 6990–7001.
  19. L. Yen-Chen, P. Florence, J. T. Barron, T.-Y. Lin, A. Rodriguez, and P. Isola, “NeRF-Supervision: Learning dense object descriptors from neural radiance fields,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2022, pp. 6496–6503.
  20. A. Simeonov, Y. Du, A. Tagliasacchi, J. B. Tenenbaum, A. Rodriguez, P. Agrawal, and V. Sitzmann, “Neural Descriptor Fields: SE(3)-equivariant object representations for manipulation,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2022, pp. 6394–6400.
  21. A. Simeonov, Y. Du, L. Yen-Chen, A. Rodriguez, L. P. Kaelbling, T. Lozano-Perez, and P. Agrawal, “SE(3)-Equivariant Relational Rearrangement with Neural Descriptor Fields,” in Conference on Robot Learning (CoRL), 2022, pp. 835–846.
  22. E. Chun, Y. Du, A. Simeonov, T. Lozano-Perez, and L. Kaelbling, “Local Neural Descriptor Fields: Locally Conditioned Object Representations for Manipulation,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2023, pp. 1830–1836.
  23. X. Zhao, R. Hu, P. Guerrero, N. Mitra, and T. Komura, “Relationship templates for creating scene variations,” ACM Transactions on Graphics, vol. 35, pp. 1–13, 2016.
  24. Z. Huang, J. Xu, S. Dai, K. Xu, H. Zhang, H. Huang, and R. Hu, “NIFT: Neural Interaction Field and Template for Object Manipulation,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2023, pp. 1875–1881.
  25. P. Sundaresan, S. Belkhale, D. Sadigh, and J. Bohg, “KITE: Keypoint-Conditioned Policies for Semantic Manipulation,” arXiv:2306.16605, 2023.
  26. W. Gao and R. Tedrake, “kPAM 2.0: Feedback control for category-level robotic manipulation,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 2962–2969, 2021.
  27. J. Jin, L. Petrich, M. Dehghan, and M. Jagersand, “A Geometric Perspective on Visual Imitation Learning,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2020, pp. 5194–5200.
  28. A. Ajoudani, N. G. Tsagarakis, J. Lee, M. Gabiccini, and A. Bicchi, “Natural redundancy resolution in dual-arm manipulation using configuration dependent stiffness (CDS) control,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2014, pp. 1480–1486.
  29. S. Savic, M. Rakovic, B. Borovac, and M. Nikolic, “Hybrid motion control of humanoid robot for leader-follower cooperative tasks,” Thermal Science, vol. 20, pp. 549–561, 2016.
  30. D. Almeida and Y. Karayiannidis, “A Lyapunov-Based Approach to Exploit Asymmetries in Robotic Dual-Arm Task Resolution,” in IEEE Conference on Decision and Control (CDC), 2019, pp. 4252–4258.
  31. S. S. Mirrazavi Salehian, N. Figueroa, and A. Billard, “A unified framework for coordinated multi-arm motion planning,” The International Journal of Robotics Research, vol. 37, pp. 1205–1232, 2018.
  32. J. Gao, Y. Zhou, and T. Asfour, “Projected Force-Admittance Control for Compliant Bimanual Tasks,” in IEEE/RAS Intl. Conf. on Humanoid Robots (Humanoids), 2018, pp. 607–613.
  33. H. A. Park and C. S. G. Lee, “Extended Cooperative Task Space for manipulation tasks of humanoid robots,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2015, pp. 6088–6093.
  34. J. Lee and P. H. Chang, “Redundancy resolution for dual-arm robots inspired by human asymmetric bimanual action: Formulation and experiments,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2015, pp. 6058–6065.
  35. F. Amadio, A. Colome, and C. Torras, “Exploiting Symmetries in Reinforcement Learning of Bimanual Robotic Tasks,” IEEE Robotics and Automation Letters, vol. 4, pp. 1838–1845, 2019.
  36. È. Pairet, P. Ardón, M. Mistry, and Y. Petillot, “Learning and Composing Primitive Skills for Dual-arm Manipulation,” in Towards Autonomous Robotic Systems - 20th Annual Conference (TAROS), vol. 11649, 2019, pp. 65–77.
  37. G. Franzese, L. d. S. Rosa, T. Verburg, L. Peternel, and J. Kober, “Interactive Imitation Learning of Bimanual Movement Primitives,” IEEE/ASME Transactions on Mechatronics, pp. 1–13, 2023.
  38. Z. Dong, Z. Li, Y. Yan, S. Calinon, and F. Chen, “Passive Bimanual Skills Learning From Demonstration With Motion Graph Attention Networks,” IEEE Robotics and Automation Letters, vol. 7, pp. 4917–4923, 2022.
  39. M. Knaust and D. Koert, “Guided Robot Skill Learning: A User-Study on Learning Probabilistic Movement Primitives with Non-Experts,” in IEEE/RAS Intl. Conf. on Humanoid Robots (Humanoids), 2021, pp. 514–521.
  40. T. Z. Zhao, V. Kumar, S. Levine, and C. Finn, “Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware,” in Robotics: Science and Systems (RSS), 2023.
  41. Z. Fu, T. Z. Zhao, and C. Finn, “Mobile ALOHA: learning bimanual mobile manipulation with low-cost whole-body teleoperation,” arXiv:2401.02117, 2024.
  42. Y. Chen, T. Wu, S. Wang, X. Feng, J. Jiang, S. M. McAleer, H. Dong, Z. Lu, S.-C. Zhu, and Y. Yang, “Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning,” in Neural Information Processing Systems (NeurIPS), vol. 35, 2022, pp. 5150–5163.
  43. S. Kataoka, S. K. S. Ghasemipour, D. Freeman, and I. Mordatch, “Bi-Manual Manipulation and Attachment via Sim-to-Real Reinforcement Learning,” arXiv:2203.08277, 2022.
  44. F. Xie and A. Chowdhury, “Deep Imitation Learning for Bimanual Robotic Manipulation,” in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 2327–2337.
  45. H. Kim, Y. Ohmura, and Y. Kuniyoshi, “Robot peels banana with goal-conditioned dual-action deep imitation learning,” arXiv:2203.09749, 2022.
  46. H. Kim, Y. Ohmura, and Y. Kuniyoshi, “Transformer-based deep imitation learning for dual-arm robot manipulation,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2021, pp. 8965–8972.
  47. C. R. G. Dreher, M. Wächter, and T. Asfour, “Learning object-action relations from bimanual human demonstration using graph networks,” IEEE Robotics and Automation Letters, vol. 5, no. 1, pp. 187–194, 2020.
  48. K. Meng and A. Eloyan, “Principal manifold estimation via model complexity selection,” Journal of the Royal Statistical Society. Series B, Statistical methodology, vol. 83, no. 2, pp. 369–394, 2021.
  49. Y. Zhou, J. Gao, and T. Asfour, “Learning via-point movement primitives with inter- and extrapolation capabilities,” in IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), 2019, pp. 4301–4308.
  50. H. Xu, J. Zhang, J. Cai, H. Rezatofighi, F. Yu, D. Tao, and A. Geiger, “Unifying Flow, Stereo and Depth Estimation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–18, 2023.
  51. T. Müller, A. Evans, C. Schied, and A. Keller, “Instant Neural Graphics Primitives with a Multiresolution Hash Encoding,” ACM Transactions on Graphics, vol. 41, pp. 102:1–102:15, 2022.
  52. J. Ichnowski, J. Kerr, Y. Avigal, and K. Goldberg, “Dex-NeRF: Using a Neural Radiance Field to Grasp Transparent Objects,” in Conference on Robot Learning (CoRL), vol. 164, 2021, pp. 526–536.
  53. Z. Teed and J. Deng, “RAFT: Recurrent All-Pairs Field Transforms for Optical Flow,” in Euro. Conf. on Computer Vision (ECCV), 2020, pp. 402–419.
  54. A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, P. Dollár, and R. Girshick, “Segment Anything,” in Intl. Conf. on Computer Vision (ICCV), 2023, pp. 3992–4003.
  55. Y. Cheng, L. Li, Y. Xu, X. Li, Z. Yang, W. Wang, and Y. Yang, “Segment and Track Anything,” arXiv:2305.06558, 2023.
  56. S. Liu, Z. Zeng, T. Ren, F. Li, H. Zhang, J. Yang, C. Li, J. Yang, H. Su, J. Zhu, and L. Zhang, “Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection,” arXiv:2303.05499, 2023.
  57. C. Lugaresi, J. Tang, H. Nash, C. McClanahan, E. Uboweja, M. Hays, F. Zhang, C.-L. Chang, M. G. Yong, J. Lee et al., “MediaPipe: A framework for building perception pipelines,” arXiv:1906.08172, 2019.
  58. T. Jiang, P. Lu, L. Zhang, N. Ma, R. Han, C. Lyu, Y. Li, and K. Chen, “RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose,” arXiv:2303.07399, 2023.
  59. K. Lin, L. Wang, and Z. Liu, “Mesh Graphormer,” in Intl. Conf. on Computer Vision (ICCV), 2021, pp. 12919–12928.
  60. J. Lin, A. Zeng, H. Wang, L. Zhang, and Y. Li, “One-Stage 3D Whole-Body Mesh Recovery With Component Aware Transformer,” in Conf. on Computer Vision and Pattern Recognition (CVPR), 2023, pp. 21159–21168.
  61. J. Romero, D. Tzionas, and M. J. Black, “Embodied hands: Modeling and capturing hands and bodies together,” ACM Transactions on Graphics, vol. 36, pp. 1–17, 2017.
  62. T. Asfour, M. Wächter, L. Kaul, S. Rader, P. Weiner, S. Ottenhaus, R. Grimm, Y. Zhou, M. Grotz, and F. Paus, “ARMAR-6: A high-performance humanoid for human-robot collaboration in real world scenarios,” IEEE Robotics and Automation Magazine, vol. 26, no. 4, pp. 108–121, 2019.
  63. H.-C. Lin, J. Smith, K. K. Babarahmati, N. Dehio, and M. Mistry, “A projected inverse dynamics approach for multi-arm cartesian impedance control,” in IEEE Intl. Conf. on Robotics and Automation (ICRA), 2018, pp. 5421–5428.
  64. E. Shahriari, S. A. B. Birjandi, and S. Haddadin, “Passivity-based adaptive force-impedance control for modular multi-manual object manipulation,” IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2194–2201, 2022.