G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis (2404.12383v1)

Published 18 Apr 2024 in cs.CV

Abstract: We propose G-HOP, a denoising diffusion based generative prior for hand-object interactions that allows modeling both the 3D object and a human hand, conditioned on the object category. To learn a 3D spatial diffusion model that can capture this joint distribution, we represent the human hand via a skeletal distance field to obtain a representation aligned with the (latent) signed distance field for the object. We show that this hand-object prior can then serve as generic guidance to facilitate other tasks like reconstruction from interaction clips and human grasp synthesis. We believe that our model, trained by aggregating seven diverse real-world interaction datasets spanning 155 categories, represents a first approach that allows jointly generating both hand and object. Our empirical evaluations demonstrate the benefit of this joint prior in video-based reconstruction and human grasp synthesis, outperforming current task-specific baselines. Project website: https://judyye.github.io/ghop-www
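To make the hand representation mentioned in the abstract more concrete, here is a minimal sketch of a skeletal distance field. It assumes, for illustration only, that the hand is given as 3D joint positions connected by bones (line segments), and that the field stores one channel per bone holding the Euclidean distance from each voxel center of a cubic grid to that bone. The function names and the parameters `resolution` and `extent` are hypothetical and do not come from the paper; the actual G-HOP representation may differ in channel layout, normalization, or truncation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment from a to b."""
    ab = [b[i] - a[i] for i in range(3)]
    ap = [p[i] - a[i] for i in range(3)]
    denom = sum(c * c for c in ab)
    if denom == 0.0:
        t = 0.0  # degenerate bone: both endpoints coincide
    else:
        # Project p onto the infinite line, then clamp to the segment [0, 1].
        t = sum(ap[i] * ab[i] for i in range(3)) / denom
        t = max(0.0, min(1.0, t))
    closest = [a[i] + t * ab[i] for i in range(3)]
    return math.dist(p, closest)


def skeletal_distance_field(
    joints: Sequence[Point],
    bones: Sequence[Tuple[int, int]],
    resolution: int,
    extent: float,
) -> List[List[List[List[float]]]]:
    """Hypothetical per-bone distance volumes on a cubic grid over [-extent, extent]^3.

    Returns a nested list indexed as field[bone][x][y][z].
    """
    if resolution < 2:
        raise ValueError("resolution must be at least 2")
    step = 2.0 * extent / (resolution - 1)
    coords = [-extent + i * step for i in range(resolution)]
    field = []
    for ja, jb in bones:
        a, b = joints[ja], joints[jb]
        channel = [
            [[point_segment_distance((x, y, z), a, b) for z in coords] for y in coords]
            for x in coords
        ]
        field.append(channel)
    return field


# Toy example: a three-joint "finger" with two bones on a 3x3x3 grid.
joints = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
bones = [(0, 1), (1, 2)]
field = skeletal_distance_field(joints, bones, resolution=3, extent=1.0)
```

Storing the hand as distance volumes on a regular grid is what makes it spatially aligned with a grid-based object signed distance field: in this sketch the hand channels and an object SDF sampled on the same grid could simply be stacked along the channel axis and fed to a 3D diffusion model as one tensor.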

Authors (4)
  1. Yufei Ye (16 papers)
  2. Abhinav Gupta (178 papers)
  3. Kris Kitani (96 papers)
  4. Shubham Tulsiani (71 papers)
Citations (9)