Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
97 tokens/sec
GPT-4o
53 tokens/sec
Gemini 2.5 Pro Pro
44 tokens/sec
o3 Pro
5 tokens/sec
GPT-4.1 Pro
47 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Distribution and Depth-Aware Transformers for 3D Human Mesh Recovery (2403.09063v1)

Published 14 Mar 2024 in cs.CV and cs.AI

Abstract: Precise Human Mesh Recovery (HMR) with in-the-wild data is a formidable challenge and is often hindered by depth ambiguities and reduced precision. Existing works resort to either pose priors or multi-modal data such as multi-view or point cloud information, though their methods often overlook the valuable scene-depth information inherently present in a single image. Moreover, achieving robust HMR for out-of-distribution (OOD) data is exceedingly challenging due to inherent variations in pose, shape and depth. Consequently, understanding the underlying distribution becomes a vital subproblem in modeling human forms. Motivated by the need for unambiguous and robust human modeling, we introduce Distribution and depth-aware human mesh recovery (D2A-HMR), an end-to-end transformer architecture meticulously designed to minimize the disparity between distributions and incorporate scene-depth leveraging prior depth information. Our approach demonstrates superior performance in handling OOD data in certain scenarios while consistently achieving competitive results against state-of-the-art HMR methods on controlled datasets.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (44)
  1. Y. Xiu, J. Yang, X. Cao, D. Tzionas, and M. J. Black, “Econ: Explicit clothed humans optimized via normal integration,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 512–523.
  2. S. Peng, Y. Zhang, Y. Xu, Q. Wang, Q. Shuai, H. Bao, and X. Zhou, “Neural body: Implicit neural representations with structured latent codes for novel view synthesis of dynamic humans,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9054–9063.
  3. H. Yi, C.-H. P. Huang, S. Tripathi, L. Hering, J. Thies, and M. J. Black, “Mime: Human-aware 3d scene generation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 12 965–12 976.
  4. A. Kanazawa, M. J. Black, D. W. Jacobs, and J. Malik, “End-to-end recovery of human shape and pose,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7122–7131.
  5. A. Kanazawa, J. Y. Zhang, P. Felsen, and J. Malik, “Learning 3d human dynamics from video,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 5614–5623.
  6. H. Cho, Y. Cho, J. Ahn, and J. Kim, “Implicit 3d human mesh recovery using consistency with pose and shape from unseen-view,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 21 148–21 158.
  7. H. Choi, G. Moon, and K. M. Lee, “Pose2mesh: Graph convolutional network for 3d human pose and mesh recovery from a 2d human pose,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16.   Springer, 2020, pp. 769–787.
  8. K. Lin, L. Wang, and Z. Liu, “End-to-end human pose and mesh reconstruction with transformers,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 1954–1963.
  9. M. Kocabas, N. Athanasiou, and M. J. Black, “Vibe: Video inference for human body pose and shape estimation,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 5253–5263.
  10. N. Kolotouros, G. Pavlakos, M. J. Black, and K. Daniilidis, “Learning to reconstruct 3d human pose and shape via model-fitting in the loop,” in Proceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 2252–2261.
  11. M. Kocabas, C.-H. P. Huang, O. Hilliges, and M. J. Black, “Pare: Part attention regressor for 3d human body estimation,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 11 127–11 137.
  12. N. Kolotouros, G. Pavlakos, D. Jayaraman, and K. Daniilidis, “Probabilistic modeling for human mesh recovery,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 11 605–11 614.
  13. A. Zanfir, E. G. Bazavan, H. Xu, W. T. Freeman, R. Sukthankar, and C. Sminchisescu, “Weakly supervised 3d human pose and shape reconstruction with normalizing flows,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VI 16.   Springer, 2020, pp. 465–481.
  14. P. Kirichenko, P. Izmailov, and A. G. Wilson, “Why normalizing flows fail to detect out-of-distribution data,” ArXiv, vol. abs/2006.08545, 2020. [Online]. Available: https://api.semanticscholar.org/CorpusID:219687356
  15. F. Bogo, A. Kanazawa, C. Lassner, P. Gehler, J. Romero, and M. J. Black, “Keep it smpl: Automatic estimation of 3d human pose and shape from a single image,” in Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part V 14.   Springer, 2016, pp. 561–578.
  16. M. Loper, N. Mahmood, J. Romero, G. Pons-Moll, and M. J. Black, “Smpl: A skinned multi-person linear model,” in Seminal Graphics Papers: Pushing the Boundaries, Volume 2, 2023, pp. 851–866.
  17. N. Zioulis and J. F. O’Brien, “Kbody: Towards general, robust, and aligned monocular whole-body estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 6214–6224.
  18. H. Cho, J. Ahn, Y. Cho, and J. Kim, “Video inference for human mesh recovery with vision transformer,” in 2023 IEEE 17th International Conference on Automatic Face and Gesture Recognition (FG), 2023, pp. 1–6.
  19. N. Kolotouros, G. Pavlakos, and K. Daniilidis, “Convolutional mesh regression for single-image human shape reconstruction,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 4501–4510.
  20. N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.-G. Jiang, “Pixel2mesh: Generating 3d mesh models from single rgb images,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 52–67.
  21. N. Verma, E. Boyer, and J. Verbeek, “Feastnet: Feature-steered graph convolutions for 3d shape analysis,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 2598–2606.
  22. G. Moon and K. M. Lee, “I2l-meshnet: Image-to-lixel prediction network for accurate 3d human pose and mesh estimation from a single rgb image,” in Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part VII 16.   Springer, 2020, pp. 752–768.
  23. B. Biggs, D. Novotny, S. Ehrhardt, H. Joo, B. Graham, and A. Vedaldi, “3d multi-bodies: Fitting sets of plausible 3d human models to ambiguous image data,” Advances in Neural Information Processing Systems, vol. 33, pp. 20 496–20 507, 2020.
  24. J. Li, S. Bian, A. Zeng, C. Wang, B. Pang, W. Liu, and C. Lu, “Human pose regression with residual log-likelihood estimation,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 11 025–11 034.
  25. Z. Shen, Z. Cen, S. Peng, Q. Shuai, H. Bao, and X. Zhou, “Learning human mesh recovery in 3d scenes,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 17 038–17 047.
  26. J. Li, Z. Yang, X. Wang, J. Ma, C. Zhou, and Y. Yang, “Jotr: 3d joint contrastive learning with transformers for occluded human mesh recovery,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 9110–9121.
  27. Z. Qiu, Q. Yang, J. Wang, H. Feng, J. Han, E. Ding, C. Xu, D. Fu, and J. Wang, “Psvt: End-to-end multi-person 3d pose and shape estimation with progressive video transformers,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 21 254–21 263.
  28. J. Lin, A. Zeng, H. Wang, L. Zhang, and Y. Li, “One-stage 3d whole-body mesh recovery with component aware transformer,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 21 159–21 168.
  29. L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using real NVP,” CoRR, vol. abs/1605.08803, 2016. [Online]. Available: http://arxiv.org/abs/1605.08803
  30. S. Lin, L. Yang, I. Saleemi, and S. Sengupta, “Robust high-resolution video matting with temporal guidance,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 238–247.
  31. J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), J. Burstein, C. Doran, and T. Solorio, Eds.   Minneapolis, Minnesota: Association for Computational Linguistics, Jun. 2019, pp. 4171–4186. [Online]. Available: https://aclanthology.org/N19-1423
  32. D. Pathak, P. Krähenbühl, J. Donahue, T. Darrell, and A. A. Efros, “Context encoders: Feature learning by inpainting,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2536–2544.
  33. H. Choi, G. Moon, J. Y. Chang, and K. M. Lee, “Beyond static features for temporally consistent 3d human pose and shape from a video,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 1964–1973.
  34. M. Kocabas, C.-H. P. Huang, J. Tesch, L. Müller, O. Hilliges, and M. J. Black, “Spec: Seeing people in the wild with an estimated camera,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 11 035–11 045.
  35. H. Zhang, Y. Tian, X. Zhou, W. Ouyang, Y. Liu, L. Wang, and Z. Sun, “Pymaf: 3d human pose and shape regression with pyramidal mesh alignment feedback loop,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 11 446–11 456.
  36. Y. Sun, Q. Bao, W. Liu, Y. Fu, B. Michael J., and T. Mei, “Monocular, One-stage, Regression of Multiple 3D People,” in ICCV, 2021.
  37. H. Joo, N. Neverova, and A. Vedaldi, “Exemplar fine-tuning for 3d human pose fitting towards in-the-wild 3d human pose estimation,” in 3DV, 2020.
  38. T. Von Marcard, R. Henschel, M. J. Black, B. Rosenhahn, and G. Pons-Moll, “Recovering accurate 3d human pose in the wild using imus and a moving camera,” in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 601–617.
  39. C. Ionescu, D. Papava, V. Olaru, and C. Sminchisescu, “Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 36, no. 7, pp. 1325–1339, 2014.
  40. S. Johnson and M. Everingham, “Clustered pose and nonlinear appearance models for human pose estimation.” in bmvc, vol. 2, no. 4.   Aberystwyth, UK, 2010, p. 5.
  41. J. Bright, Y. Chen, and J. Zelek, “Mitigating motion blur for robust 3d baseball player pose modeling for pitch analysis,” 2023.
  42. M. Fani, H. Neher, D. A. Clausi, A. Wong, and J. Zelek, “Hockey action recognition via integrated stacked hourglass network,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 85–93.
  43. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
  44. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13.   Springer, 2014, pp. 740–755.
User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (6)
  1. Jerrin Bright (9 papers)
  2. Bavesh Balaji (5 papers)
  3. Harish Prakash (3 papers)
  4. Yuhao Chen (84 papers)
  5. David A Clausi (26 papers)
  6. John Zelek (31 papers)
Citations (1)

Summary

We haven't generated a summary for this paper yet.