DirectFisheye-GS: Enabling Native Fisheye Input in Gaussian Splatting with Cross-View Joint Optimization

Published 1 Apr 2026 in cs.CV | (2604.00648v1)

Abstract: 3D Gaussian Splatting (3DGS) has enabled efficient 3D scene reconstruction from everyday images with real-time, high-fidelity rendering, greatly advancing VR/AR applications. Fisheye cameras, with their wider field of view (FOV), promise high-quality reconstructions from fewer inputs and have recently attracted much attention. However, since 3DGS relies on rasterization, most subsequent works involving fisheye camera inputs first undistort images before training, which introduces two problems: 1) Black borders at image edges cause information loss and negate the fisheye's large FOV advantage; 2) Undistortion's stretch-and-interpolate resampling spreads each pixel's value over a larger area, diluting detail density and causing 3DGS to overfit these low-frequency zones, which produces blur and floating artifacts. In this work, we integrate a fisheye camera model into the original 3DGS framework, enabling native fisheye image input for training without preprocessing. Despite correct modeling, we observed that the reconstructed scenes still exhibit floaters at image edges: distortion increases toward the periphery, and 3DGS's original per-iteration random view selection ignores the cross-view correlations of a Gaussian, leading to extreme shapes (e.g., oversized or elongated) that degrade reconstruction quality. To address this, we introduce a feature-overlap-driven cross-view joint optimization strategy that establishes consistent geometric and photometric constraints across views, a technique equally applicable to existing pinhole-camera-based pipelines. Our DirectFisheye-GS matches or surpasses state-of-the-art performance on public datasets.


Authors (8)

Summary

The paper introduces a novel integration of a native fisheye camera model into 3D Gaussian Splatting, eliminating lossy preprocessing and preserving peripheral details.
It presents a cross-view joint optimization strategy that enhances geometric consistency and photometric gradient alignment across multiple views.
Experimental results on datasets like FisheyeNeRF and ScanNet++ show improved SSIM, PSNR, and detail preservation in both edge and interior regions.

DirectFisheye-GS: Native Fisheye Camera Integration and Cross-View Optimization for Gaussian Splatting

Introduction and Motivation

DirectFisheye-GS systematically addresses two central deficiencies in existing 3D Gaussian Splatting (3DGS) pipelines for novel view synthesis (NVS) using fisheye cameras: (1) the inability of 3DGS to natively accommodate nonlinear wide-FOV projections without a lossy preprocessing step, and (2) suboptimal optimization stemming from single-view per-iteration updates which neglect spatial and photometric correlations across views. The work presents both a precise analytic formulation for accurate and differentiable fisheye projection and a principled cross-view joint optimization (CVO) strategy for training, resulting in a fully explicit and high-fidelity pipeline compatible with mainstream rasterization-based 3DGS renderers.

Native Fisheye Camera Model Embedding

The core technical contribution embeds the Kannala-Brandt polynomial fisheye projection model directly into the 3DGS rendering and optimization loop. This replaces the conventional undistortion step, which produces black borders, discards boundary content, and lowers effective spatial detail. These effects are particularly severe for wide-angle imagery, where scene information is compressed toward the periphery.
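To make the cost of undistortion concrete, the short derivation below estimates how much a fisheye pixel is stretched when resampled to a pinhole image. It assumes the ideal equidistant model (the Kannala-Brandt model with all polynomial coefficients set to zero) and the same focal length f for both images; these are simplifying assumptions for illustration, not the paper's calibration.

```latex
% Radial image distance of a ray at incidence angle \theta
r_{\mathrm{fish}} = f\,\theta, \qquad r_{\mathrm{pin}} = f\,\tan\theta

% Local stretch of undistortion along the radial and tangential directions
\frac{\mathrm{d} r_{\mathrm{pin}}}{\mathrm{d} r_{\mathrm{fish}}} = \sec^{2}\theta,
\qquad
\frac{r_{\mathrm{pin}}}{r_{\mathrm{fish}}} = \frac{\tan\theta}{\theta}

% Area magnification of one fisheye pixel
A(\theta) = \sec^{2}\theta \cdot \frac{\tan\theta}{\theta},
\qquad
A(60^{\circ}) \approx 4 \times 1.654 \approx 6.6
```

Under these assumptions, a pixel at 60 degrees off-axis is spread over roughly 6.6 times its original area, and the factor diverges as the angle approaches 90 degrees, which is why undistorted images must be cropped.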

Figure 1: Common fisheye camera projection models, including their analytic parameterization relevant for rasterization and differential optimization.

The analytic model supports differentiable forward and inverse mappings essential for end-to-end gradient-based learning. Crucially, the derived Jacobian matrix for the projection allows for precise backpropagation of gradients through the nonlinear transformation, accurately propagating updates to 3D Gaussian means and covariances, especially in regions of nonlinear distortion at large incident angles.
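The projection and its Jacobian can be sketched as follows. This is a minimal NumPy illustration that assumes the common four-coefficient Kannala-Brandt form (the same parameterization OpenCV uses for its fisheye model); the function names, the coefficient tuple `k`, and the covariance helper are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np

def _distort(theta, k):
    """Kannala-Brandt polynomial g(theta) and its derivative g'(theta)."""
    t2 = theta * theta
    g = theta * (1 + k[0] * t2 + k[1] * t2**2 + k[2] * t2**3 + k[3] * t2**4)
    dg = 1 + 3 * k[0] * t2 + 5 * k[1] * t2**2 + 7 * k[2] * t2**3 + 9 * k[3] * t2**4
    return g, dg

def kb_project(p, k, fx, fy, cx, cy):
    """Project a camera-frame point p = (x, y, z) to pixel coordinates."""
    x, y, z = p
    rho = np.hypot(x, y)
    g, _ = _distort(np.arctan2(rho, z), k)
    # On the optical axis the ratio g/rho tends to 1/z (assumes z > 0 there).
    s = g / rho if rho > 1e-12 else 1.0 / z
    return np.array([fx * s * x + cx, fy * s * y + cy])

def kb_jacobian(p, k, fx, fy):
    """Analytic 2x3 Jacobian d(u, v) / d(x, y, z) of kb_project."""
    x, y, z = p
    rho2 = x * x + y * y
    rho = np.sqrt(rho2)
    if rho < 1e-12:
        # Limit on the optical axis: identical to the pinhole Jacobian.
        return np.array([[fx / z, 0.0, 0.0], [0.0, fy / z, 0.0]])
    g, dg = _distort(np.arctan2(rho, z), k)
    n2 = rho2 + z * z
    dtheta = np.array([x * z, y * z, -rho2]) / (rho * n2)  # d(theta)/d(x, y, z)
    drho = np.array([x, y, 0.0]) / rho                     # d(rho)/d(x, y, z)
    s = g / rho
    ds = dg * dtheta / rho - g * drho / rho2               # d(s)/d(x, y, z)
    return np.array([
        fx * (s * np.array([1.0, 0.0, 0.0]) + x * ds),
        fy * (s * np.array([0.0, 1.0, 0.0]) + y * ds),
    ])

def splat_covariance(J, cov_cam):
    """First-order 2D footprint of a Gaussian with camera-frame covariance."""
    return J @ cov_cam @ J.T
```

The same Jacobian that linearizes the projection for splatting is what gradients flow through during backpropagation, so comparing it against finite differences of `kb_project` is a cheap sanity check before moving such a formula into a rasterizer kernel.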

Cross-View Joint Optimization (CVO) Strategy

Standard 3DGS and previously proposed fisheye extensions (e.g., Fisheye-GS, 3DGUT) rely on per-iteration random view selection for stochastic scene coverage, or on the unscented transform's sampling-based covariance propagation, which is insufficient in highly distorted scenarios. DirectFisheye-GS instead introduces a camera association graph constructed from explicit multi-view 2D-2D feature correspondences (e.g., SIFT) paired with pose angular divergence heuristics. For each batch, correlated views with maximal angular variance and feature overlap are selected for joint optimization.
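A view-association step of this kind might look like the sketch below. The scoring rule (normalized match count multiplied by the angle between viewing directions), the `min_matches` threshold, and both function names are assumptions made for illustration; the summary does not state the paper's exact heuristic. The match-count matrix is assumed to come from an external feature matcher.

```python
import numpy as np

def association_scores(match_counts, view_dirs, min_matches=30):
    """Pairwise score that rewards both feature overlap and angular divergence.

    match_counts: (N, N) symmetric matrix of 2D-2D feature matches per view pair.
    view_dirs:    (N, 3) camera viewing directions in world coordinates.
    """
    m = np.asarray(match_counts, dtype=float)
    d = np.asarray(view_dirs, dtype=float)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    angle = np.arccos(np.clip(d @ d.T, -1.0, 1.0))  # radians between view directions
    overlap = m / max(m.max(), 1.0)                 # normalize counts to [0, 1]
    score = overlap * angle
    score[m < min_matches] = 0.0                    # too few matches: not associated
    np.fill_diagonal(score, 0.0)
    return score

def select_batch(score, anchor, batch_size):
    """Return the anchor view followed by its highest-scoring associated views."""
    order = np.argsort(-score[anchor], kind="stable")
    partners = [int(j) for j in order if score[anchor, j] > 0][: batch_size - 1]
    return [anchor] + partners
```

With this rule, a neighbor that shares many features but looks in almost the same direction scores lower than one with slightly fewer matches and a much wider baseline, which is the trade-off the association graph is meant to capture.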

This design maximizes the likelihood that projected Gaussians correspond to co-visible 3D points, enhancing both geometric consistency and photometric gradient alignment across varying perspectives. The CVO update enforces joint constraints on scale, orientation, SH coefficients, and alpha blending, strongly regularizing model ambiguity particularly at fisheye image borders where single-view gradients are highly anisotropic.
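One plausible reading of a joint update is to average the gradients from all views in the selected batch before taking a single parameter step. The toy sketch below illustrates only that idea: each "view" constrains the parameters along one direction, standing in for the anisotropic single-view gradients described above. The linear loss, the dictionary layout, and the function names are hypothetical, and the summary does not specify the paper's actual update rule.

```python
import numpy as np

def view_loss_and_grad(params, view):
    """Toy per-view loss: squared error of a linear 'render' against a target."""
    residual = view["direction"] @ params - view["target"]
    return residual**2, 2.0 * residual * view["direction"]

def joint_step(params, views, lr):
    """Average the gradients of all views in the batch, then update once."""
    grads = [view_loss_and_grad(params, v)[1] for v in views]
    return params - lr * np.mean(grads, axis=0)

def optimize(params, views, lr=0.5, steps=200):
    """Repeat the joint step a fixed number of times."""
    for _ in range(steps):
        params = joint_step(params, views, lr)
    return params
```

In this toy setting a single view leaves the parameter component orthogonal to its direction untouched, whereas a batch of views with different directions constrains every component in the same step.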

Figure 2: The proposed cross-view joint optimization paradigm contrasts with the single-view updates, promoting geometric and photometric consistency across feature-overlapping, high-diversity views.

Figure 3: Camera association method based on feature overlap and angular divergence, guiding batch sampling for CVO.

Experimental Analysis

Extensive benchmarks are presented on FisheyeNeRF, ScanNet++, and Den-SOFT (spanning object-centric to large-scale, dense VR/AR scenes). DirectFisheye-GS consistently reports either SOTA or competitive metrics against both native and derivative baselines:

On FisheyeNeRF, DirectFisheye-GS attains average SSIM/PSNR/LPIPS scores of 0.8284/26.25/0.2295—matching or exceeding Fisheye-GS and 3DGUT, especially in high-distortion edge regions.
On ScanNet++ test views, a similar trend is observed, with DirectFisheye-GS outperforming dense neural and explicit baselines in both perceptual and structural metrics.
On large-scale Den-SOFT sequences, the method reports clear gains at the challenging boundaries, with sharpness, edge structure, and texture integrity preserved—areas where prior work exhibits excessive blurring, mosaic artifacts, or floating Gaussians.
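For readers who want to check edge-region quality separately from the image interior, PSNR can be restricted to a peripheral mask. The sketch below is a generic illustration: the radial mask and its `inner_frac` parameter are assumptions, not the evaluation protocol used in the paper.

```python
import numpy as np

def psnr(pred, gt, mask=None, max_val=1.0):
    """Peak signal-to-noise ratio in dB, optionally restricted to mask pixels."""
    err = (np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)) ** 2
    if mask is not None:
        err = err[mask]  # an (H, W) boolean mask also works for (H, W, C) images
    mse = err.mean()
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val**2 / mse))

def peripheral_mask(h, w, inner_frac=0.7):
    """Boolean mask of pixels whose radius exceeds inner_frac of the maximum radius."""
    ys, xs = np.mgrid[0:h, 0:w]
    r = np.hypot(ys - (h - 1) / 2.0, xs - (w - 1) / 2.0)
    return r >= inner_frac * r.max()
```

Calling `psnr(pred, gt, mask=peripheral_mask(h, w))` next to the unmasked `psnr(pred, gt)` shows whether a method's gains concentrate at the high-distortion borders or are spread across the frame.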

Figure 4: Qualitative comparison on FisheyeNeRF, demonstrating fewer floaters, improved detail, and sharper boundaries for DirectFisheye-GS over prior methods.

Figure 5: Distribution of Gaussian scales in FisheyeNeRF-Chairs—CVO eliminates extreme shapes and anomalous scaling at the image periphery.

Ablations show that adding CVO to both fisheye and pinhole scenes improves convergence, yields more uniform and realistic Gaussian parameter distributions, and raises PSNR/SSIM. Toy experiments demonstrate that DirectFisheye-GS delivers more stable fits at wide-angle, high-distortion image boundaries, avoiding the typical "mosaic" artifacts seen in 3DGUT.

Figure 6: Toy distortion experiment—strong fisheye warping leads to gradient misalignment and unstable optimization; native modeling mitigates these issues.

Figure 7: DirectFisheye-GS yields clean, artifact-free reconstructions near fisheye image boundaries; 3DGUT displays geometry degradation and visible discontinuities.

On Den-SOFT, DirectFisheye-GS provides numerically and visually superior results in both boundary and interior regions, a trend that holds across all large-scale evaluations.

Figure 8: Qualitative outcomes on Den-SOFT, showing high-frequency detail and structural consistency in both indoor and outdoor environments.

Implications for Large-Scale and Real-Time Computer Vision

The ability to natively support arbitrary camera models without sacrificing rasterization-based rendering efficiency obviates the need for destructive preprocessing and allows DirectFisheye-GS to act as a drop-in module for most industrial and research 3DGS pipelines. CVO is model-agnostic and provides a general recipe for robust optimization in settings with strong nonlinear projection, extreme FOV, or dense stereo coverage. The results indicate improved fidelity not just at image centers but also at boundaries, directly impacting applications in immersive VR/AR, SLAM, robotic navigation, and wide-FOV video postproduction.

Notably, the integration preserves compatibility with standard 3DGS viewers and is not restricted by intermediate representations (e.g., no ray-tracing overheads), ensuring scalability and interoperability.

Theoretical and Practical Limitations

While the method shows consistent quantitative and qualitative improvements, the gain from CVO under extremely challenging lighting (view-dependent reflectance, refraction) is limited, motivating future work on richer SH modeling, explicit reflectance parameterization, or hybrid rasterization/ray-tracing approaches. Additionally, the feature association depends on structure-from-motion; improved matching or semantic information could complement it for further robustness.

Figure 9: Ablation of cross-view joint optimization on different camera models, emphasizing the universality of the proposed strategy.

Conclusion

DirectFisheye-GS presents an explicit, differentiable solution for high-fidelity NVS with native fisheye images, underpinned by analytic camera modeling and cross-view optimization. The proposed approach achieves high rendering quality and efficient training dynamics while maintaining architectural compatibility with established explicit representations. CVO offers a general augmentation for explicit multiview pipelines, not limited to Gaussians or splatting, and the analytic gradient propagation through nonlinear projections sets a methodological baseline for future research in wide-FOV and non-pinhole imaging. This work establishes an extensible framework, opening directions for integrating advanced view-dependent effects, hybrid camera types, and further advances in real-time neural rendering.
