GHNeRF: Learning Generalizable Human Features with Efficient Neural Radiance Fields (2404.06246v1)
Abstract: Recent advances in Neural Radiance Fields (NeRF) have demonstrated promising results in 3D scene representations, including 3D human representations. However, these representations often lack information on the underlying human pose and structure, which is crucial for AR/VR applications and games. In this paper, we introduce a novel approach, termed GHNeRF, designed to address this limitation by learning the 2D/3D joint locations of human subjects with a NeRF representation. GHNeRF uses a pre-trained 2D encoder, streamlined to extract essential human features from 2D images, and incorporates these features into the NeRF framework to encode human biomechanic features. This allows our network to learn biomechanic features, such as joint locations, simultaneously with human geometry and texture. To assess the effectiveness of our method, we conduct a comprehensive comparison with state-of-the-art human NeRF techniques and joint estimation algorithms. Our results show that GHNeRF can achieve state-of-the-art results in near real-time.
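To make the idea of learning joint information "along with human geometry and texture" concrete, the following is a minimal sketch, not the authors' implementation. It assumes (the abstract does not say this) that per-sample joint scores are composited along a camera ray with the same volume-rendering weights that standard NeRF uses for color. The function names, the toy ray, and the two-joint score vectors are all hypothetical.

```python
import math

def render_weights(densities, deltas):
    """Standard NeRF compositing weights: w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights = []
    transmittance = 1.0  # T_i: fraction of light that reaches sample i
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights

def composite(weights, per_sample_values):
    """Weighted sum of per-sample vectors along one ray."""
    dim = len(per_sample_values[0])
    out = [0.0] * dim
    for w, vec in zip(weights, per_sample_values):
        for k in range(dim):
            out[k] += w * vec[k]
    return out

# Toy ray with three samples; every number here is made up for illustration.
densities = [0.0, 5.0, 50.0]   # sigma at each sample
deltas = [0.1, 0.1, 0.1]       # distance between consecutive samples
colors = [[0.0, 0.0, 0.0], [0.8, 0.6, 0.5], [0.9, 0.7, 0.6]]
joint_scores = [[0.0, 0.0], [0.1, 0.9], [0.2, 0.8]]  # assumed: 2 joint channels

weights = render_weights(densities, deltas)
pixel_rgb = composite(weights, colors)                  # rendered color
pixel_joint_scores = composite(weights, joint_scores)   # rendered joint heatmap values
```

Under this assumption, one set of weights drives both outputs, so a single pass over the ray samples yields a color and a per-joint score for the pixel. How GHNeRF actually combines encoder features with the radiance field is described in the paper body, not in the abstract.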