ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation (2312.06386v2)

Published 11 Dec 2023 in cs.CV, cs.AI, and cs.LG

Abstract: We propose ManiPose, a manifold-constrained multi-hypothesis model for human-pose 2D-to-3D lifting. We provide theoretical and empirical evidence that, due to the depth ambiguity inherent to monocular 3D human pose estimation, traditional regression models suffer from pose-topology consistency issues, which standard evaluation metrics (MPJPE, P-MPJPE and PCK) fail to assess. ManiPose addresses depth ambiguity by proposing multiple candidate 3D poses for each 2D input, each with its estimated plausibility. Unlike previous multi-hypothesis approaches, ManiPose forgoes generative models, greatly facilitating its training and usage. By constraining the outputs to lie on the human pose manifold, ManiPose guarantees the consistency of all hypothetical poses, in contrast to previous works. We showcase the performance of ManiPose on real-world datasets, where it outperforms state-of-the-art models in pose consistency by a large margin while being very competitive on the MPJPE metric.
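The abstract's claim that regression can score well on MPJPE while breaking pose consistency can be illustrated with a toy example. The sketch below is not the authors' code: the two-joint skeleton, the orthographic-projection assumption, and the helper names `mpjpe` and `bone_lengths` are all hypothetical choices made for illustration. It builds two 3D poses that share the same 2D projection but have opposite depth, then shows that their average (the output a squared-error regressor tends toward when both are equally likely) has a shortened bone, even though its MPJPE against the true pose is lower than that of the wrong-depth hypothesis.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean per-joint position error: mean Euclidean distance over joints."""
    return float(np.linalg.norm(pred - target, axis=-1).mean())

def bone_lengths(pose, bones):
    """Length of each (parent, child) segment in a pose of shape (J, 3)."""
    parents, children = zip(*bones)
    return np.linalg.norm(pose[list(children)] - pose[list(parents)], axis=-1)

# Hypothetical toy skeleton: a root joint and one child, joined by a unit bone.
bones = [(0, 1)]
theta = np.pi / 4

# Two poses with identical (x, y) coordinates but opposite depth z.
# Under an assumed orthographic camera they project to the same 2D input.
pose_a = np.array([[0.0, 0.0, 0.0], [np.cos(theta), 0.0,  np.sin(theta)]])
pose_b = np.array([[0.0, 0.0, 0.0], [np.cos(theta), 0.0, -np.sin(theta)]])

# The average of the two equally plausible poses.
mean_pose = 0.5 * (pose_a + pose_b)

print("bone length, pose_a:   ", bone_lengths(pose_a, bones)[0])
print("bone length, mean pose:", bone_lengths(mean_pose, bones)[0])
print("MPJPE(mean pose, pose_a):", mpjpe(mean_pose, pose_a))
print("MPJPE(pose_b, pose_a):   ", mpjpe(pose_b, pose_a))
```

In this toy setting the averaged pose leaves the set of poses with unit bone length, which is the kind of inconsistency that a joint-position metric alone does not reveal. Keeping several hypotheses that each respect the bone-length constraint avoids this shrinkage.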
