
In-Hand 3D Object Reconstruction from a Monocular RGB Video (2312.16425v1)

Published 27 Dec 2023 in cs.CV

Abstract: Our work aims to reconstruct a 3D object that is held and rotated by a hand in front of a static RGB camera. Previous methods that use implicit neural representations to recover the geometry of a generic hand-held object from multi-view images achieved compelling results in the visible part of the object. However, these methods falter in accurately capturing the shape within the hand-object contact region due to occlusion. In this paper, we propose a novel method that deals with surface reconstruction under occlusion by incorporating priors of 2D occlusion elucidation and physical contact constraints. For the former, we introduce an object amodal completion network to infer the complete 2D mask of objects under occlusion. To ensure the accuracy and view consistency of the predicted 2D amodal masks, we devise a joint optimization method for both amodal mask refinement and 3D reconstruction. For the latter, we impose penetration and attraction constraints on the local geometry in contact regions. We evaluate our approach on the HO3D and HOD datasets and demonstrate that it outperforms state-of-the-art methods in terms of reconstruction surface quality, with an improvement of $52\%$ on HO3D and $20\%$ on HOD. Project webpage: https://east-j.github.io/ihor.
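
The abstract names penetration and attraction constraints in contact regions without giving their form. The following is a minimal sketch of what such terms could look like, not the paper's formulation. It assumes the object is represented by a signed distance function (negative inside, positive outside) and uses an analytic sphere as a stand-in for the learned surface. The function names, the `max_dist` threshold, and the sample points are all hypothetical.

```python
import numpy as np


def sphere_sdf(points, radius=1.0):
    """Signed distance to a sphere at the origin (stand-in for a learned object SDF).

    Negative inside the sphere, positive outside.
    """
    return np.linalg.norm(points, axis=-1) - radius


def penetration_loss(sdf_at_hand):
    """Assumed form: penalize hand points that lie inside the object (negative SDF)."""
    return float(np.mean(np.maximum(-sdf_at_hand, 0.0)))


def attraction_loss(sdf_at_hand, max_dist=0.05):
    """Assumed form: penalize small gaps between the surface and nearby hand points.

    Only points outside the object and closer than max_dist are treated as contacts.
    """
    near = (sdf_at_hand > 0.0) & (sdf_at_hand < max_dist)
    if not np.any(near):
        return 0.0
    return float(np.mean(sdf_at_hand[near]))


# Hypothetical hand points: one inside the object, one hovering just outside, one far away.
hand_points = np.array([
    [0.90, 0.0, 0.0],
    [1.02, 0.0, 0.0],
    [2.00, 0.0, 0.0],
])
sdf_values = sphere_sdf(hand_points)
pen = penetration_loss(sdf_values)
att = attraction_loss(sdf_values)
print(f"penetration={pen:.4f} attraction={att:.4f}")
```

In this sketch the first point contributes only to the penetration term, the second only to the attraction term, and the third to neither, which is the intended division of labor between the two constraints under these assumptions.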

