Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose Reconstruction in a Diffusion Framework (2401.09836v1)

Published 18 Jan 2024 in cs.CV

Abstract: Monocular 3D human pose estimation is challenging because of the depth ambiguity inherent in lifting 2D observations back to 3D: many different 3D poses reproject to the same 2D keypoints. Conventional approaches that rely on estimating an over-fit projection matrix struggle with this ambiguity and often produce noisy outputs. Recent diffusion models have shown promise in incorporating structural priors to resolve reprojection ambiguities, but they often overlook the correlation between 2D and 3D joint-level features. In this study, we propose a novel cross-channel embedding framework that explicitly models the correlation between the joint-level features of 3D coordinates and those of their 2D projections. We also introduce a context guidance mechanism that propagates joint graph attention across latent channels during the iterative diffusion process. We evaluate the method on two benchmark datasets, Human3.6M and MPI-INF-3DHP, where it significantly improves reconstruction accuracy over state-of-the-art methods. The code will be made publicly available.
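To make the moving parts of the abstract concrete, the following is a minimal, untrained NumPy sketch of diffusion-based 2D-to-3D lifting. It is not the authors' implementation: every name in it (`ToyCrossChannelDenoiser`, `joint_graph_attention`, `ddim_sample`, `BONES`) is hypothetical, the weights are random, and several pieces are assumptions. The "cross-channel embedding" is approximated by projecting each joint's concatenated noisy 3D coordinates, 2D keypoint, and timestep into shared latent channels; "joint graph attention" is approximated by single-head self-attention masked to skeleton neighbors (assuming the common 17-joint Human3.6M ordering); sampling uses deterministic DDIM (eta = 0) with a linear beta schedule. The context guidance mechanism is not modeled.

```python
import numpy as np

# Assumed 17-joint Human3.6M-style skeleton as (child, parent) pairs; hypothetical here.
BONES = [(1, 0), (2, 1), (3, 2), (4, 0), (5, 4), (6, 5), (7, 0), (8, 7),
         (9, 8), (10, 9), (11, 8), (12, 11), (13, 12), (14, 8), (15, 14), (16, 15)]
J, C = 17, 32  # number of joints, number of latent channels (assumed sizes)


def skeleton_mask(num_joints, bones):
    """Boolean adjacency with self-loops; attention is only allowed along it."""
    mask = np.eye(num_joints, dtype=bool)
    for a, b in bones:
        mask[a, b] = mask[b, a] = True
    return mask


def joint_graph_attention(h, mask, wq, wk, wv):
    """Single-head self-attention over joints, restricted to skeleton neighbors."""
    q, k, v = h @ wq, h @ wk, h @ wv               # each (J, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (J, J) joint-to-joint scores
    scores = np.where(mask, scores, -np.inf)       # drop non-adjacent joints
    scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                             # (J, C)


class ToyCrossChannelDenoiser:
    """Predicts the noise in a noisy 3D pose, conditioned on the 2D pose (untrained)."""

    def __init__(self, rng, scale=0.1):
        self.w_in = rng.normal(0.0, scale, (3 + 2 + 1, C))  # [x_t | 2D | t] -> latent
        self.wq, self.wk, self.wv = (rng.normal(0.0, scale, (C, C)) for _ in range(3))
        self.w_out = rng.normal(0.0, scale, (C, 3))          # latent -> noise estimate
        self.mask = skeleton_mask(J, BONES)

    def __call__(self, x_t, pose_2d, t_frac):
        t_col = np.full((x_t.shape[0], 1), t_frac)
        # Per-joint embedding of 3D and 2D features into shared latent channels.
        h = np.concatenate([x_t, pose_2d, t_col], axis=-1) @ self.w_in
        # Residual graph attention propagates information between linked joints.
        h = h + joint_graph_attention(h, self.mask, self.wq, self.wk, self.wv)
        return h @ self.w_out                                # (J, 3)


def ddim_sample(denoiser, pose_2d, rng, T=1000, steps=50):
    """Deterministic DDIM sampling (eta = 0) with a linear beta schedule."""
    alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))
    ts = np.linspace(T - 1, 0, steps).astype(int)         # decreasing timesteps
    x = rng.standard_normal((pose_2d.shape[0], 3))        # start from pure noise
    for i, t in enumerate(ts):
        eps = denoiser(x, pose_2d, t / T)
        x0 = (x - np.sqrt(1.0 - alpha_bar[t]) * eps) / np.sqrt(alpha_bar[t])
        ab_prev = alpha_bar[ts[i + 1]] if i + 1 < len(ts) else 1.0
        x = np.sqrt(ab_prev) * x0 + np.sqrt(1.0 - ab_prev) * eps
    return x


rng = np.random.default_rng(0)
denoiser = ToyCrossChannelDenoiser(rng)
pose_2d = rng.uniform(-1.0, 1.0, (J, 2))  # stand-in for normalized 2D keypoints
pose_3d = ddim_sample(denoiser, pose_2d, np.random.default_rng(1))
print(pose_3d.shape)  # (17, 3)
```

Because the weights are random, the output is only shape-correct, not a plausible pose; training would fit the denoiser to predict the injected noise from (noisy 3D pose, 2D pose) pairs. Feeding the 2D keypoints into every denoising step is what ties the sampled 3D pose to the observation.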

