Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis (2410.22817v2)
Abstract: Generalizable 3D Gaussian Splatting (3DGS) can reconstruct new scenes from sparse-view observations via feed-forward inference, eliminating the scene-specific retraining required by conventional 3DGS. However, existing methods rely heavily on epipolar priors, which can be unreliable in complex real-world scenes, particularly in non-overlapping and occluded regions. In this paper, we propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis that operates independently of epipolar line constraints. To enhance multi-view feature extraction with 3D perception, we employ a self-supervised Vision Transformer (ViT) with cross-view completion pre-training on large-scale datasets. Additionally, we introduce an Iterative Cross-view Gaussians Alignment method to ensure consistent depth scales across different views. eFreeSplat thus offers a new approach to generalizable novel view synthesis: unlike existing purely geometry-free methods, it focuses on achieving epipolar-free feature matching and encoding by providing 3D priors through cross-view pre-training. We evaluate eFreeSplat on wide-baseline novel view synthesis tasks using the RealEstate10K and ACID datasets. Extensive experiments demonstrate that eFreeSplat surpasses state-of-the-art baselines that rely on epipolar priors, achieving superior geometry reconstruction and novel view synthesis quality. Project page: https://tatakai1.github.io/efreesplat/.
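For readers unfamiliar with the epipolar prior that eFreeSplat avoids, the following plain-Python sketch shows the textbook constraint. For a calibrated camera pair with relative pose X2 = R X1 + t, a pixel x1 in view 1 can only match pixels on the line l2 = F x1 in view 2, where F = K2^{-T} [t]_x R K1^{-1}. This is a generic illustration, not code from eFreeSplat or any baseline; the helper names, the zero-skew pinhole intrinsics, and the example numbers are assumptions chosen for clarity. It also makes the failure mode named above concrete: if the 3D point is occluded in view 2 or falls outside its frame (a non-overlapping region), no correct match lies on l2, so a matcher restricted to that line has nothing valid to find.

```python
# Generic sketch of the epipolar constraint (hypothetical helpers, not eFreeSplat code).
# Matrices are lists of rows; zero-skew pinhole intrinsics are assumed.


def matmul(a, b):
    """Multiply matrices a (n x m) and b (m x p), both given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def apply(m, v):
    """Matrix-vector product m @ v for a 3x3 matrix m and a 3-vector v."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def skew(t):
    """Cross-product matrix [t]_x, so that apply(skew(t), v) equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def inv_intrinsics(k):
    """Inverse of K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (zero skew assumed)."""
    fx, cx, fy, cy = k[0][0], k[0][2], k[1][1], k[1][2]
    return [[1.0 / fx, 0.0, -cx / fx], [0.0, 1.0 / fy, -cy / fy], [0.0, 0.0, 1.0]]


def fundamental(k1, k2, r, t):
    """F = K2^{-T} [t]_x R K1^{-1} for the relative pose X2 = R X1 + t."""
    essential = matmul(skew(t), r)
    return matmul(matmul(transpose(inv_intrinsics(k2)), essential), inv_intrinsics(k1))


def project(k, x_cam):
    """Project a 3D point in camera coordinates to homogeneous pixel coordinates."""
    u, v, w = apply(k, x_cam)
    return [u / w, v / w, 1.0]


def epipolar_line(f, x1):
    """Line (a, b, c) in view 2, with a*u + b*v + c = 0, on which x1's match must lie."""
    return apply(f, x1)


def point_line_distance(x, line):
    """Pixel distance from homogeneous point x to the line (a, b, c)."""
    a, b, c = line
    return abs(a * x[0] + b * x[1] + c) / (a * a + b * b) ** 0.5


if __name__ == "__main__":
    K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
    R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # no rotation
    t = [-0.2, 0.0, 0.0]                                     # sideways baseline
    X1 = [0.1, -0.05, 2.0]                                   # point in camera-1 frame
    X2 = [a + b for a, b in zip(apply(R, X1), t)]            # same point, camera-2 frame
    x1, x2 = project(K, X1), project(K, X2)
    line = epipolar_line(fundamental(K, K, R, t), x1)
    print("x1 =", x1, "x2 =", x2)
    # The true match x2 lies on the epipolar line, so this prints a value near 0.
    print("distance of true match to epipolar line:", point_line_distance(x2, line))
```

With a purely horizontal baseline the epipolar line is the image row through x1, which is why stereo-style matchers can search along a single line. When the point is hidden or out of frame in view 2, that search still returns some candidate on the line, just not a correct one.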