PixelGaussian: Generalizable 3D Gaussian Reconstruction from Arbitrary Views (2410.18979v1)
Abstract: We propose PixelGaussian, an efficient feed-forward framework for learning generalizable 3D Gaussian reconstruction from arbitrary views. Most existing methods rely on uniform pixel-wise Gaussian representations, which learn a fixed number of 3D Gaussians for each view and cannot generalize well to more input views. In contrast, PixelGaussian dynamically adapts both the Gaussian distribution and quantity based on geometric complexity, leading to more efficient representations and significant improvements in reconstruction quality. Specifically, we introduce a Cascade Gaussian Adapter (CGA) to adjust the Gaussian distribution according to local geometric complexity identified by a keypoint scorer. CGA leverages deformable attention in context-aware hypernetworks to guide Gaussian pruning and splitting, ensuring accurate representation in complex regions while reducing redundancy. Furthermore, we design a transformer-based Iterative Gaussian Refiner module that refines Gaussian representations through direct image-Gaussian interactions. PixelGaussian effectively reduces Gaussian redundancy as the number of input views increases. We conduct extensive experiments on the large-scale ACID and RealEstate10K datasets, where our method achieves state-of-the-art performance with good generalization to various numbers of views. Code: https://github.com/Barrybarry-Smith/PixelGaussian.
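To make the pruning-and-splitting idea concrete, here is a minimal Python sketch of score-guided adaptation. It is not the paper's Cascade Gaussian Adapter: in the abstract, the scores come from a learned keypoint scorer and the adaptation is guided by hypernetworks with deformable attention, whereas this sketch assumes the scores are already given and uses fixed thresholds. The names `Gaussian`, `adapt_gaussians`, `prune_below`, `split_above`, `n_children`, and `jitter` are all hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gaussian:
    """A simplified isotropic 3D Gaussian (hypothetical, for illustration)."""
    mean: Tuple[float, float, float]
    scale: float
    opacity: float


def adapt_gaussians(
    gaussians: List[Gaussian],
    scores: List[float],
    prune_below: float = 0.2,   # assumed threshold, not from the paper
    split_above: float = 0.8,   # assumed threshold, not from the paper
    n_children: int = 2,
    jitter: float = 0.5,
    seed: int = 0,
) -> List[Gaussian]:
    """Prune low-score Gaussians and split high-score ones.

    `scores` stands in for a per-Gaussian geometric-complexity score;
    here it is simply an input list of floats in [0, 1].
    """
    if len(gaussians) != len(scores):
        raise ValueError("gaussians and scores must have the same length")

    rng = random.Random(seed)
    adapted: List[Gaussian] = []
    for g, s in zip(gaussians, scores):
        if s < prune_below:
            # Low complexity: treat the Gaussian as redundant and drop it.
            continue
        if s > split_above:
            # High complexity: replace the Gaussian with smaller children
            # whose means are perturbed around the parent mean.
            for _ in range(n_children):
                offset = [rng.gauss(0.0, jitter * g.scale) for _ in range(3)]
                child_mean = (
                    g.mean[0] + offset[0],
                    g.mean[1] + offset[1],
                    g.mean[2] + offset[2],
                )
                adapted.append(Gaussian(child_mean, g.scale / n_children, g.opacity))
        else:
            # Moderate complexity: keep the Gaussian unchanged.
            adapted.append(g)
    return adapted


gaussians = [
    Gaussian((0.0, 0.0, 1.0), 0.1, 0.9),
    Gaussian((1.0, 0.0, 2.0), 0.1, 0.9),
    Gaussian((0.0, 1.0, 3.0), 0.1, 0.9),
]
scores = [0.05, 0.5, 0.95]
adapted = adapt_gaussians(gaussians, scores)
print(len(adapted))  # one pruned, one kept, one split into two children
```

With these thresholds the Gaussian count depends on the scores rather than on the number of input pixels, which is the property the abstract contrasts with uniform pixel-wise representations.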