CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs (2403.16885v1)
Abstract: Neural Radiance Fields (NeRF) have shown impressive capabilities for photorealistic novel view synthesis when trained on dense inputs. However, when trained on sparse inputs, NeRF typically produces incorrect density or color predictions, mainly because insufficient scene coverage yields partial and sparse supervision, leading to significant performance degradation. While existing works mainly enforce ray-level consistency, constructing 2D regularization from rendered color, depth, or semantics on image planes, in this paper we propose a novel approach that models 3D spatial field consistency to improve NeRF's performance with sparse inputs. Specifically, we first adopt a voxel-based ray sampling strategy to ensure that the sampled rays intersect with a certain voxel in 3D space. We then randomly sample additional points within the voxel and apply a Transformer to infer the properties of other points on each ray, which are then incorporated into the volume rendering. By backpropagating through the rendering loss, we enhance the consistency among neighboring points. Additionally, we apply a contrastive loss to the encoder output of the Transformer to further improve consistency within each voxel. Experiments demonstrate that our method yields significant improvements over different radiance field baselines in the sparse-input setting, and achieves performance comparable to current state-of-the-art methods.
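The per-voxel contrastive objective described in the abstract can be sketched as an InfoNCE-style loss in which point embeddings from the same voxel act as positives and all other points act as negatives. This is a minimal illustrative sketch, not the paper's implementation: the function name, the use of cosine similarity, and the temperature value are assumptions.

```python
import numpy as np

def in_voxel_contrastive_loss(features, voxel_ids, temperature=0.1):
    """InfoNCE-style loss: points sharing a voxel are treated as positive pairs.

    features:  (N, D) L2-normalized point embeddings (e.g. Transformer encoder outputs).
    voxel_ids: (N,) integer voxel index for each point.
    Note: illustrative sketch only; details differ from the paper's actual loss.
    """
    # Cosine-similarity matrix scaled by temperature.
    sim = features @ features.T / temperature
    np.fill_diagonal(sim, -np.inf)  # exclude self-similarity from the softmax

    # Row-wise log-softmax over all other points (exp(-inf) contributes 0).
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))

    # Positive mask: same voxel, excluding the point itself.
    same_voxel = voxel_ids[:, None] == voxel_ids[None, :]
    np.fill_diagonal(same_voxel, False)

    # Negative mean log-probability over each point's positive pairs.
    pos_counts = same_voxel.sum(axis=1)
    pos_log_prob = np.where(same_voxel, log_prob, 0.0).sum(axis=1)
    loss_per_point = -pos_log_prob / np.maximum(pos_counts, 1)
    return loss_per_point[pos_counts > 0].mean()
```

As a sanity check, embeddings that are well clustered by voxel should incur a lower loss than the same embeddings with shuffled voxel assignments, which is the behavior the regularizer rewards during training.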
Authors: Yingji Zhong, Lanqing Hong, Zhenguo Li, Dan Xu