
Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer (2404.04819v1)

Published 7 Apr 2024 in cs.CV

Abstract: Human-object contact serves as a strong cue for understanding how humans physically interact with objects. Nevertheless, human-object contact information has not been widely exploited for the joint reconstruction of 3D human and object from a single image. In this work, we present CONTHO, a novel joint 3D human-object reconstruction method that effectively exploits contact information between humans and objects. Our system has two core designs: 1) 3D-guided contact estimation and 2) contact-based 3D human and object refinement. First, for accurate human-object contact estimation, CONTHO initially reconstructs 3D humans and objects and uses them as explicit 3D guidance for contact estimation. Second, to refine the initial reconstructions of 3D human and object, we propose a novel contact-based refinement Transformer that effectively aggregates human features and object features based on the estimated human-object contact. The proposed contact-based refinement prevents the learning of erroneous correlations between human and object, which enables accurate 3D reconstruction. As a result, CONTHO achieves state-of-the-art performance in both human-object contact estimation and joint reconstruction of 3D human and object. The code is publicly available at https://github.com/dqj5182/CONTHO_RELEASE.
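The abstract does not describe the internals of the contact-based refinement Transformer. As a hypothetical illustration of the general idea that estimated contact should gate which human and object features get combined, the sketch below implements one residual cross-attention step. A human vertex attends only to object vertices, and only when both the human vertex and the object vertex are predicted to be in contact. The function name `contact_masked_cross_attention`, the 0.5 threshold, and this masking scheme are assumptions made for illustration. This is not CONTHO's implementation; the authors' code is in the repository linked above.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def contact_masked_cross_attention(
    human_feats: Sequence[Sequence[float]],
    object_feats: Sequence[Sequence[float]],
    human_contact: Sequence[float],
    object_contact: Sequence[float],
    threshold: float = 0.5,  # hypothetical contact-probability cutoff
) -> Tuple[List[Vector], List[Vector]]:
    """Hypothetical sketch (not CONTHO's code): residual cross-attention from
    human vertex features to object vertex features, restricted to pairs in
    which BOTH vertices are predicted to be in contact.

    Returns (refined human features, attention weights per human vertex).
    """
    dim = len(human_feats[0])
    scale = 1.0 / math.sqrt(dim)  # standard scaled dot-product attention
    obj_in_contact = [p >= threshold for p in object_contact]

    refined: List[Vector] = []
    weights_all: List[Vector] = []
    for h, p_h in zip(human_feats, human_contact):
        weights = [0.0] * len(object_feats)
        # A human vertex predicted as non-contact attends to nothing.
        allowed = [j for j, ok in enumerate(obj_in_contact) if ok] if p_h >= threshold else []
        if allowed:
            scores = {j: dot(h, object_feats[j]) * scale for j in allowed}
            m = max(scores.values())  # subtract max for numerical stability
            exps = {j: math.exp(s - m) for j, s in scores.items()}
            z = sum(exps.values())
            for j, e in exps.items():
                weights[j] = e / z  # softmax over in-contact object vertices only
        # Weighted sum of object features; all-zero weights leave h unchanged.
        update = [sum(w * o[k] for w, o in zip(weights, object_feats)) for k in range(dim)]
        refined.append([hk + uk for hk, uk in zip(h, update)])
        weights_all.append(weights)
    return refined, weights_all
```

Because the mask removes non-contact pairs before the softmax, a human vertex far from the object cannot pick up object features, which is one plausible way to avoid learning spurious human-object correlations. A real model would use learned query/key/value projections and batched tensors (e.g. in PyTorch); plain Python is used here only to keep the sketch dependency-free.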
