GEARS: Local Geometry-aware Hand-object Interaction Synthesis (2404.01758v3)
Abstract: Generating realistic hand motion sequences in interaction with objects has gained increasing attention with the growing interest in digital humans. Prior work has illustrated the effectiveness of employing occupancy-based or distance-based virtual sensors to extract hand-object interaction features. Nonetheless, these methods show limited generalizability across object categories, shapes and sizes. We hypothesize that this is due to two reasons: 1) the limited expressiveness of the employed virtual sensors, and 2) the scarcity of available training data. To tackle this challenge, we introduce a novel joint-centered sensor designed to reason about local object geometry near potential interaction regions. The sensor queries object surface points in the neighbourhood of each hand joint. As an important step towards mitigating the learning complexity, we transform the points from the global frame to the hand template frame and use a shared module to process the sensor features of each individual joint. This is followed by a spatio-temporal transformer network that captures correlations among the joints in both the spatial and temporal dimensions. Moreover, we devise simple heuristic rules to augment the limited training sequences with a large set of static hand grasping samples. This exposes the model to a broader spectrum of grasping types during training, in turn enhancing its generalization capability. We evaluate our method on two public datasets, GRAB and InterCap, where it outperforms baselines both quantitatively and perceptually.
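The central idea in the abstract is a joint-centered sensor: for every hand joint, nearby object surface points are gathered and re-expressed in a joint-local (hand-template) frame so that one shared module can process all joints. The sketch below illustrates that idea under stated assumptions; the fixed sensing radius, the k-point budget, the per-joint rotation matrices standing in for the hand-template canonicalization, and all names are illustrative and not taken from the paper's implementation.

```python
# Minimal sketch (not the authors' code) of a joint-centered sensor:
# per joint, collect nearby object surface points and express them in a
# joint-local frame so a shared per-joint module can consume them.
import numpy as np

def joint_centered_sensor(joints, joint_rotations, surface_points, k=32, radius=0.1):
    """Return up to k surface points per joint, within `radius`, in joint-local frames.

    joints:          (J, 3) joint positions in the global frame
    joint_rotations: (J, 3, 3) global-to-local rotations per joint
                     (illustrative stand-in for the hand-template canonicalization)
    surface_points:  (N, 3) sampled object surface points in the global frame
    Returns:         (J, k, 3) local point sets, zero-padded when fewer than k
                     points fall inside the sensing radius.
    """
    J = joints.shape[0]
    features = np.zeros((J, k, 3))
    for j in range(J):
        offsets = surface_points - joints[j]            # center points on the joint
        dists = np.linalg.norm(offsets, axis=1)
        nearest = np.argsort(dists)[:k]                 # k nearest candidates
        nearest = nearest[dists[nearest] < radius]      # keep those inside the radius
        local = offsets[nearest] @ joint_rotations[j].T # rotate into the joint frame
        features[j, :len(local)] = local
    return features

# Toy usage: 16 hand joints, identity local frames, a random object point cloud.
rng = np.random.default_rng(0)
feats = joint_centered_sensor(
    joints=rng.uniform(-0.05, 0.05, (16, 3)),
    joint_rotations=np.tile(np.eye(3), (16, 1, 1)),
    surface_points=rng.uniform(-0.1, 0.1, (500, 3)),
)
print(feats.shape)  # (16, 32, 3)
```

Because every joint's neighbourhood is expressed in its own canonical frame, the downstream per-joint module sees inputs with a consistent coordinate convention, which is what makes weight sharing across joints reasonable.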
- Antonio Bicchi. On the closure properties of robotic grasping. The International Journal of Robotics Research, 14(4):319–334, 1995.
- Data-driven grasp synthesis—a survey. IEEE Transactions on Robotics, 30(2):289–309, 2013.
- ContactPose: A dataset of grasps with object contact and hand pose. In European Conference on Computer Vision, pages 361–378. Springer, 2020.
- Physically plausible full-body hand-object interaction synthesis. In International Conference on 3D Vision (3DV), 2024.
- ShapeNet: An information-rich 3d model repository. arXiv preprint arXiv:1512.03012, 2015.
- D-Grasp: Physically plausible dynamic grasp synthesis for hand-object interactions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- GanHand: Predicting human grasp affordances in multi-object scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5031–5041, 2020.
- 3d objects grasps synthesis: A survey. In 13th World Congress in Mechanism and Machine Science, pages 573–583, 2011.
- IMoS: Intent-driven full-body motion synthesis for human-object interactions. In Computer Graphics Forum, pages 1–12. Wiley Online Library, 2023.
- ContactOpt: Optimizing contact to improve grasps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 1471–1481, 2021.
- Stochastic scene-aware motion prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11374–11384, 2021.
- Learning joint reconstruction of hands and manipulated objects. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
- Hand-object contact consistency reasoning for human grasps generation. arXiv preprint arXiv:2104.03304, 2021.
- Full-body articulated human-object interaction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9365–9376, 2023.
- Synergies between affordance and geometry: 6-dof grasp detection via implicit representations. Robotics: Science and Systems, 2021.
- Grasping field: Learning implicit representations for human grasps. In 2020 International Conference on 3D Vision (3DV), pages 333–344. IEEE, 2020.
- A skeleton-driven neural occupancy representation for articulated hands. In 2021 International Conference on 3D Vision (3DV), pages 11–21. IEEE, 2021.
- Interaction capture and synthesis. ACM Transactions on Graphics (TOG), 25(3):872–880, 2006.
- OpenGrasp: A toolkit for robot grasping simulation. In International Conference on Simulation, Modeling, and Programming for Autonomous Robots, pages 109–120. Springer, 2010.
- Data-driven grasp synthesis using shape matching and task-based pruning. IEEE Transactions on Visualization and Computer Graphics, 13(4):732–747, 2007.
- C Karen Liu. Dextrous manipulation from a grasping pose. In ACM SIGGRAPH 2009 papers, pages 1–6. 2009.
- ContactGen: Generative contact modeling for grasp generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023.
- GraspIt! A versatile simulator for robotic grasping. IEEE Robotics & Automation Magazine, 11(4):110–122, 2004.
- Generating continual human motion in diverse 3d scenes. In International Conference on 3D Vision (3DV), 2024.
- Contact-invariant optimization for hand manipulation. In Proceedings of the ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pages 137–144, 2012.
- Van-Duc Nguyen. Constructing force-closure grasps. The International Journal of Robotics Research, 7(3):3–16, 1988.
- Embodied hands: Modeling and capturing hands and bodies together. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 36(6), 2017.
- An overview of 3d object grasp synthesis algorithms. Robotics and Autonomous Systems, 60(3):326–336, 2012.
- Karun B Shimoga. Robot grasp synthesis algorithms: A survey. The International Journal of Robotics Research, 15(3):230–266, 1996.
- GRAB: A dataset of whole-body human grasping of objects. In European Conference on Computer Vision, pages 581–600. Springer, 2020.
- GOAL: Generating 4D whole-body motion for hand-object grasping. In Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
- GRIP: Generating interaction poses using latent consistency and spatial cues. 2024.
- FLEX: Full-body grasping without full-body grasps. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21179–21189, 2023.
- Synthesizing long-term 3d human motion and interaction in 3d scenes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9401–9411, 2021.
- Towards diverse and natural scene-aware 3d human motion synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20460–20469, 2022.
- SAGA: Stochastic whole-body grasping with contact. In Proceedings of the European Conference on Computer Vision (ECCV), 2022.
- UniDexGrasp: Universal robotic dexterous grasping via learning diverse proposal generation and goal-conditioned policy. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4737–4746, 2023.
- CPF: Learning a contact potential field to model the hand-object interaction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 11097–11106, 2021.
- Synthesis of detailed hand manipulations using contact sampling. ACM Transactions on Graphics (TOG), 31(4):1–10, 2012.
- ManipNet: Neural manipulation synthesis with a hand-object spatial representation. ACM Transactions on Graphics (TOG), 40(4):1–14, 2021.
- ArtiGrasp: Physically plausible synthesis of bi-manual dexterous grasping and articulation. In International Conference on 3D Vision (3DV), 2024.
- COUCH: Towards controllable human-chair interactions. In European Conference on Computer Vision, pages 518–535. Springer, 2022.
- Compositional human-scene interaction synthesis with semantic control. In European Conference on Computer Vision, pages 311–327. Springer, 2022.
- Robust realtime physics-based motion control for human grasping. ACM Transactions on Graphics (TOG), 32(6):1–12, 2013.
- CAMS: Canonicalized manipulation spaces for category-level functional hand-object manipulation synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 585–594, 2023.
- Coping with the grasping uncertainties in force-closure analysis. The International Journal of Robotics Research, 24(4):311–327, 2005.
- COOP: Decoupling and coupling of whole-body grasping pose generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 2163–2173, 2023.
- TOCH: Spatio-temporal object-to-hand correspondence for motion refinement. In European Conference on Computer Vision (ECCV). Springer, 2022.
- Toward human-like grasp: Dexterous grasping via semantic representation of object-hand. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15741–15751, 2021.
Authors: Keyang Zhou, Bharat Lal Bhatnagar, Jan Eric Lenssen, Gerard Pons-Moll