Diffusion-based Pose Refinement and Multi-hypothesis Generation for 3D Human Pose Estimation (2401.04921v1)
Abstract: Previous probabilistic models for 3D Human Pose Estimation (3DHPE) aimed to enhance pose accuracy by generating multiple hypotheses. However, most of the generated hypotheses deviate substantially from the true pose. In addition, the excessive uncertainty in probabilistic models leads to weaker single-hypothesis performance than deterministic models achieve. To address these two challenges, we propose DRPose, a diffusion-based refinement framework that refines the output of deterministic models by reverse diffusion and, through multi-step refinement with multiple noise samples, produces multi-hypothesis predictions better suited to current pose benchmarks. To this end, we propose a Scalable Graph Convolution Transformer (SGCT) and a Pose Refinement Module (PRM) for denoising and refining. Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that our method achieves state-of-the-art performance on both single-hypothesis and multi-hypothesis 3DHPE. Code is available at https://github.com/KHB1698/DRPose.
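The refinement idea in the abstract (perturb a deterministic estimate with several noise samples, then run reverse diffusion to obtain several refined hypotheses) can be illustrated with a minimal sketch. This is not the DRPose implementation: the function names `make_alpha_bars`, `toy_denoiser`, and `refine_with_diffusion`, the linear noise schedule, the starting step, and the DDIM-style update are all assumptions made for illustration, and `toy_denoiser` is a hand-written stand-in for a trained denoising network such as SGCT.

```python
import numpy as np

def make_alpha_bars(num_steps=50, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal-retention factors for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def toy_denoiser(x_t, init_pose, alpha_bar_t):
    """Hypothetical stand-in for a trained denoiser.

    Predicts the clean pose as a blend of the deterministic estimate and the
    rescaled noisy sample. A real model would be a learned network.
    """
    return 0.8 * init_pose + 0.2 * x_t / np.sqrt(alpha_bar_t)

def refine_with_diffusion(init_pose, num_hypotheses=5, start_step=20,
                          stride=5, seed=0):
    """Return `num_hypotheses` refined poses, one per noise sample."""
    rng = np.random.default_rng(seed)
    alpha_bars = make_alpha_bars()

    # Forward diffusion: noise the deterministic estimate once per hypothesis.
    ab_start = alpha_bars[start_step]
    noise = rng.standard_normal((num_hypotheses,) + init_pose.shape)
    x = np.sqrt(ab_start) * init_pose + np.sqrt(1.0 - ab_start) * noise

    # Reverse diffusion with a deterministic DDIM-style update (an assumption).
    steps = list(range(start_step, -1, -stride))
    for i, t in enumerate(steps):
        ab_t = alpha_bars[t]
        x0_pred = toy_denoiser(x, init_pose, ab_t)
        eps_pred = (x - np.sqrt(ab_t) * x0_pred) / np.sqrt(1.0 - ab_t)
        # After the last step no noise remains, so alpha_bar is taken as 1.
        ab_prev = alpha_bars[steps[i + 1]] if i + 1 < len(steps) else 1.0
        x = np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps_pred
    return x

# Usage: a placeholder 17-joint 3D pose standing in for a deterministic output.
init_pose = np.zeros((17, 3))
hypotheses = refine_with_diffusion(init_pose)
print(hypotheses.shape)  # (5, 17, 3)
```

Each noise sample yields one hypothesis, so the number of hypotheses is controlled by how many noise samples are drawn, while the number of reverse steps controls how much refinement each one receives. With this toy denoiser the hypotheses stay close to the initial estimate; how a trained model behaves depends on its training.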
Authors:
- Hongbo Kang
- Yong Wang
- Mengyuan Liu
- Doudou Wu
- Peng Liu
- Xinlin Yuan
- Wenming Yang