Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views (2309.11962v1)
Abstract: We present Ego3DPose, a highly accurate binocular egocentric 3D pose reconstruction system. The binocular egocentric setup is practical and useful in various applications, yet it remains largely under-explored. It has suffered from low pose estimation accuracy due to viewing distortion, severe self-occlusion, and a limited field of view of the joints in egocentric 2D images. Here, we notice that two important 3D cues contained in the egocentric binocular input, stereo correspondences and perspective, are neglected. Current methods rely heavily on 2D image features and learn 3D information only implicitly, which introduces biases towards commonly observed motions and leads to low overall accuracy. We observe that they fail not only in challenging occlusion cases but also in estimating the positions of visible joints. To address these challenges, we propose two novel approaches. First, we design a two-path network architecture in which one path estimates the pose of each limb independently from that limb's binocular heatmaps. Because this path is given no full-body information, it alleviates bias toward the full-body pose distribution seen in training. Second, we leverage the egocentric view of body limbs, which exhibits strong perspective variance (e.g., a hand appears significantly larger when it is close to the camera). We propose a new perspective-aware representation using trigonometry, enabling the network to estimate the 3D orientation of limbs. Finally, we develop an end-to-end pose reconstruction network that synergizes both techniques. Our comprehensive evaluations demonstrate that Ego3DPose outperforms state-of-the-art models, reducing the pose estimation error (i.e., MPJPE) by 23.1% on the UnrealEgo dataset. Our qualitative results highlight the superiority of our approach across a range of scenarios and challenges.
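The two 3D cues named in the abstract, perspective and stereo correspondence, can be illustrated with textbook camera geometry. The sketch below is not the paper's network or its trigonometric representation, whose exact form is not given here. It assumes an ideal, rectified pinhole stereo pair with no lens distortion, and the focal length `f` and `baseline` are hypothetical parameters chosen for illustration.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]

# Assumption: ideal rectified pinhole cameras, no lens distortion.
# Focal length f is in pixels; baseline and coordinates are in metres.

def project(point: Point3, f: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point (X, Y, Z) to (f*X/Z, f*Y/Z)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (f * x / z, f * y / z)

def apparent_size(width: float, depth: float, f: float) -> float:
    """Projected extent (pixels) of an object of physical size `width` at distance `depth`."""
    return f * width / depth

def to_right_camera(point: Point3, baseline: float) -> Point3:
    """Express a left-camera point in a rectified right camera shifted by `baseline` along +X."""
    x, y, z = point
    return (x - baseline, y, z)

def depth_from_disparity(x_left: float, x_right: float, f: float, baseline: float) -> float:
    """Rectified stereo triangulation: Z = f * B / d, with disparity d = x_left - x_right."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return f * baseline / disparity
```

With these helpers, halving a hand's distance to the camera doubles its image size (`apparent_size(0.1, 0.25, 500.0)` is twice `apparent_size(0.1, 0.5, 500.0)`), which is the kind of perspective variance the abstract describes. Projecting one point into both views and passing the two x-coordinates to `depth_from_disparity` recovers its depth, showing how a stereo correspondence constrains 3D position.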
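The headline result is stated in MPJPE (mean per-joint position error): the Euclidean distance between each predicted 3D joint and its ground-truth position, averaged over joints. The minimal sketch below computes the plain metric with no root-joint or Procrustes alignment. Benchmark protocols may align poses first, so this is an illustration rather than the UnrealEgo evaluation code. The `relative_reduction` helper is a hypothetical name showing what a percentage error reduction means.

```python
import math
from typing import Sequence, Tuple

Joint = Tuple[float, float, float]

def mpjpe(pred: Sequence[Joint], gt: Sequence[Joint]) -> float:
    """Plain MPJPE: mean Euclidean distance between corresponding joints (no alignment)."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and contain the same number of joints")
    return sum(math.dist(p, g) for p, g in zip(pred, gt)) / len(pred)

def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Fraction by which new_error improves on baseline_error (hypothetical helper)."""
    return (baseline_error - new_error) / baseline_error
```

For example, `mpjpe([(0, 0, 0), (3, 4, 0)], [(0, 0, 0), (0, 0, 0)])` averages per-joint errors of 0 and 5 to give 2.5. A 23.1% reduction means the new error is 76.9% of the baseline's, whatever the baseline's absolute value.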