3D Visibility-aware Generalizable Neural Radiance Fields for Interacting Hands (2401.00979v1)
Abstract: Neural radiance fields (NeRFs) are promising 3D representations for scenes, objects, and humans. However, most existing methods require multi-view inputs and per-scene training, which limits their real-life applications. Moreover, current methods focus on single-subject cases, leaving scenes of interacting hands, which involve severe inter-hand occlusions and challenging view variations, unsolved. To tackle these issues, this paper proposes a generalizable visibility-aware NeRF (VA-NeRF) framework for interacting hands. Specifically, given an image of interacting hands as input, our VA-NeRF first obtains a mesh-based representation of the hands and extracts their corresponding geometric and textural features. Subsequently, a feature fusion module that exploits the visibility of query points and mesh vertices adaptively merges the features of both hands, enabling the recovery of features in unseen areas. Additionally, our VA-NeRF is optimized together with a novel discriminator within an adversarial learning paradigm. In contrast to conventional discriminators that predict a single real/fake label for the synthesized image, the proposed discriminator generates a pixel-wise visibility map, providing fine-grained supervision for unseen areas and encouraging the VA-NeRF to improve the visual quality of synthesized images. Experiments on the InterHand2.6M dataset demonstrate that our proposed VA-NeRF significantly outperforms conventional NeRFs. Project Page: https://github.com/XuanHuang0/VANeRF.
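The visibility-driven fusion described in the abstract can be illustrated with a small sketch. This is not the authors' module: the abstract does not specify how visibility enters the fusion, so the code below simply assumes each hand contributes one feature vector and one scalar visibility score per query point, and blends the two with softmax weights. The function name `fuse_features`, the `temperature` parameter, and the score range are all hypothetical.

```python
import math


def fuse_features(feat_left, feat_right, vis_left, vis_right, temperature=1.0):
    """Blend two per-hand feature vectors for a single query point.

    Assumption (not from the paper): vis_left and vis_right are scalar
    visibility scores, where a larger score means the feature was sampled
    from a better-observed region of the input image.
    """
    if len(feat_left) != len(feat_right):
        raise ValueError("feature vectors must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Numerically stable softmax over the two visibility scores.
    logits = [vis_left / temperature, vis_right / temperature]
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    w_left, w_right = exps[0] / total, exps[1] / total

    # Convex combination: the better-observed hand dominates the result.
    fused = [w_left * a + w_right * b for a, b in zip(feat_left, feat_right)]
    return fused, (w_left, w_right)


if __name__ == "__main__":
    fused, weights = fuse_features([1.0, 0.0], [0.0, 1.0], vis_left=0.9, vis_right=0.1)
    print(fused, weights)
```

A lower `temperature` makes the blend closer to picking the more visible hand outright, while equal visibility scores reduce it to a plain average. A learned fusion module would replace the fixed softmax with trainable weights.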