Fisheye-Based 3D Gaussian Splatting

Updated 12 August 2025
  • The paper introduces a novel tangent-plane projection strategy that minimizes error in mapping 3D Gaussians under fisheye distortion.
  • It employs robust calibration methods with explicit Jacobians and hybrid distortion models to handle the nonlinearities of fisheye imaging.
  • Performance evaluations demonstrate that the approach surpasses NeRF models, enabling high-quality 3D reconstruction in applications like autonomous driving and VR.

Fisheye-Based 3D Gaussian Splatting is an explicit, rasterization-based scene representation and rendering approach adapted to the challenges posed by fisheye cameras, which provide wide-to-ultra-wide fields of view at the cost of strong, nonlinear image distortions. Central to this methodology is the projection and blending of anisotropic 3D Gaussian primitives; substantial recent work focuses on reformulating the forward and backward projection mappings and camera calibration models so that the full image coverage and scene detail of fisheye optics can be exploited for 3D reconstruction and novel view synthesis.

1. Mathematical Foundations and Error Analysis

Standard 3D Gaussian Splatting (3DGS) projects each Gaussian (parameterized by mean $\mu \in \mathbb{R}^3$ and covariance $\Sigma$) from world coordinates into camera space, and then onto the image plane, typically using a local affine approximation of the projective mapping (via first-order Taylor expansion). The residual error of this affine approximation is given by

$$R_1(x') = \varphi(x') - \varphi(\mu') - J(x' - \mu')$$

where $\varphi$ is the (potentially non-linear) projection function, $\mu'$ is the Gaussian mean in camera space, and $J$ is the Jacobian of $\varphi$ at $\mu'$. The mean-squared error over the Gaussian support, $\epsilon(\mu') = \int_X \|R_1(x')\|^2 \, dx'$, grows rapidly for Gaussians projected away from the central viewing direction, especially for wide-angle and fisheye lenses (Huang et al., 1 Feb 2024). In such cases, uniform projection onto the $z = 1$ plane leads to significant artifacts (blurred, elongated, or “cloud-like” Gaussians) and degraded photorealism.
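
To make the off-axis growth of this residual concrete, the following minimal numpy sketch evaluates $R_1$ for the standard pinhole projection onto the $z = 1$ plane; the offset and viewing angles are illustrative, and the effect is only amplified under the more strongly non-linear fisheye models discussed below:

```python
import numpy as np

def project(p):
    """Pinhole projection onto the z = 1 plane."""
    return p[:2] / p[2]

def jacobian(p):
    """Analytic Jacobian of the pinhole projection at p."""
    x, y, z = p
    return np.array([[1 / z, 0, -x / z**2],
                     [0, 1 / z, -y / z**2]])

def taylor_residual(mu, delta):
    """R_1 = phi(mu + delta) - phi(mu) - J(mu) @ delta."""
    return project(mu + delta) - project(mu) - jacobian(mu) @ delta

delta = np.array([0.1, 0.0, 0.1])  # fixed offset within the Gaussian's support
for deg in (0, 30, 60, 80):        # mean tilts away from the optical axis
    t = np.radians(deg)
    mu = np.array([np.sin(t), 0.0, np.cos(t)])  # unit-distance mean in camera space
    print(deg, np.linalg.norm(taylor_residual(mu, delta)))  # residual grows with angle
```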

To minimize these errors, the optimal projection strategy ("Optimal Gaussian Splatting") projects each Gaussian not onto a fixed planar image, but onto the tangent plane to the unit sphere at the direction of the Gaussian’s normalized camera-space mean. The tangent-plane projection is defined by

$$\varphi_p(x') = x' \, (x_p^\top x')^{-1}$$

with $x_p = \mu' / \|\mu'\|$, and an explicit Jacobian $J_p$ is used for gradient computations. This tangent-plane formulation is inherently independent of pinhole perspective geometry, enabling natural extension to fisheye and other non-pinhole camera models.
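
A minimal sketch of this mapping (the explicit Jacobian $J_p$ is omitted here, and the input values are illustrative):

```python
import numpy as np

def tangent_plane_project(x, mu):
    """phi_p(x) = x / (x_p^T x), with x_p = mu / ||mu||: projects x onto the
    plane tangent to the unit sphere at the Gaussian's viewing direction."""
    x_p = mu / np.linalg.norm(mu)
    return x / (x_p @ x)

mu = np.array([1.0, 0.5, 2.0])           # Gaussian mean in camera space
x = mu + np.array([0.05, -0.02, 0.03])   # a point in the Gaussian's support
y = tangent_plane_project(x, mu)
print(y, (mu / np.linalg.norm(mu)) @ y)  # second value is 1: y lies on the tangent plane
```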

2. Fisheye Camera Projection Modeling

Fisheye lenses require specialized projection models due to their non-linear mapping from 3D rays to the image plane. The equidistant model is widely used:

$$r_d = f \theta$$

where $r_d$ is the radial distance from the optical center, $f$ is the focal length, and $\theta$ is the angle between the incoming ray and the optical axis. For a 3D point $(x_c, y_c, z_c)^\top$ in camera coordinates, the mapping is computed via

$$\theta = \arctan\left( \sqrt{x_c^2 + y_c^2} \, / \, z_c \right)$$

$$x_p = c_x + \frac{f_x \, \theta \, x_c}{\sqrt{x_c^2 + y_c^2}}, \qquad y_p = c_y + \frac{f_y \, \theta \, y_c}{\sqrt{x_c^2 + y_c^2}}$$

where $(c_x, c_y)$ is the principal point and $(f_x, f_y)$ are the per-axis focal lengths (Liao et al., 7 Sep 2024). Fisheye-GS modifies only the projection component, recalculating both means and covariances under this non-linear mapping.

During optimization, gradients must be propagated through this projection. The Jacobian $J$ of the equidistant projection is explicitly derived and incorporates all non-linearities of the model, allowing for stable gradient descent and fast training.
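
A minimal PyTorch sketch of this projection and its Jacobian; the intrinsics are illustrative, `atan2` is used so the mapping also covers rays with $z_c \le 0$, and autograd stands in for the analytically derived Jacobian of the original CUDA implementation:

```python
import torch

fx, fy, cx, cy = 300.0, 300.0, 640.0, 480.0  # illustrative intrinsics

def equidistant_project(p):
    """Equidistant fisheye projection of a camera-space point p = (xc, yc, zc)."""
    xc, yc, zc = p
    r = torch.clamp(torch.sqrt(xc**2 + yc**2), min=1e-8)  # guard the on-axis case
    theta = torch.atan2(r, zc)  # angle to the optical axis; valid beyond 90 degrees
    u = cx + fx * theta * xc / r
    v = cy + fy * theta * yc / r
    return torch.stack([u, v])

p = torch.tensor([0.4, -0.3, 1.0])
J = torch.autograd.functional.jacobian(equidistant_project, p)  # 2x3, used in backprop
print(equidistant_project(p), J)
```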

Alternatively, some work avoids local Taylor expansion altogether in the presence of strong distortion: 3DGUT applies the Unscented Transform, projecting a set of sigma points from each 3D Gaussian through the non-linear fisheye model, capturing true distributional behavior under heavy distortion (Gunes et al., 9 Aug 2025).
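
A generic sigma-point sketch of this idea; the weight scheme follows the classical Unscented Transform and is not necessarily 3DGUT's exact parameterization:

```python
import numpy as np

def sigma_points(mu, Sigma, kappa=1.0):
    """Standard 2n+1 sigma points and weights for an n-dimensional Gaussian."""
    n = mu.size
    L = np.linalg.cholesky((n + kappa) * Sigma)  # columns are scaled sqrt directions
    pts = [mu] + [mu + L[:, i] for i in range(n)] + [mu - L[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return np.array(pts), w

def unscented_project(mu, Sigma, project):
    """Push a 3D Gaussian through a non-linear projection via its sigma points."""
    pts, w = sigma_points(mu, Sigma)
    y = np.array([project(p) for p in pts])
    mean = w @ y
    diff = y - mean
    return mean, (w[:, None] * diff).T @ diff    # projected mean and 2D covariance

def equidistant(p, f=300.0):
    """Equidistant fisheye projection (principal point dropped for brevity)."""
    r = max(np.hypot(p[0], p[1]), 1e-8)
    return f * np.arctan2(r, p[2]) * p[:2] / r

mean2d, cov2d = unscented_project(np.array([0.6, -0.4, 1.0]),
                                  np.diag([0.01, 0.01, 0.02]), equidistant)
```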

3. Robust Camera and Distortion Calibration

Accurate calibration is critical, as standard Structure-from-Motion (SfM) approaches often fail under the strong distortion found in ultra-wide fisheye images. Self-calibrating 3DGS frameworks now jointly refine camera intrinsics (including distortion), extrinsics, and Gaussian parameters by minimizing a photometric loss across all views. Distortion is modeled via a hybrid combination of explicit grids (for local flexibility) and invertible residual networks (for global regularization). A cubemap-based resampling strategy reprojects fisheye images into six uniformly-sampled 90° cube faces, mitigating the varying pixel density found in standard planar undistortion and minimizing seam artifacts through global distance-based depth sorting (Deng et al., 13 Feb 2025).
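
The resampling geometry can be sketched as follows, assuming an equidistant source model; the paper's full pipeline additionally handles the hybrid distortion field and seam-aware depth sorting:

```python
import numpy as np

def cube_face_rays(res, face_rot):
    """Unit ray directions for one 90-degree cube face (3x3 rotation picks the face)."""
    s = np.linspace(-1.0, 1.0, res)
    u, v = np.meshgrid(s, s)
    d = np.stack([u, v, np.ones_like(u)], axis=-1)  # 90-degree frustum on z = 1
    d /= np.linalg.norm(d, axis=-1, keepdims=True)
    return d @ face_rot.T

def rays_to_fisheye_px(d, fx, fy, cx, cy):
    """Equidistant fisheye pixel coordinates at which to sample each face ray."""
    r = np.maximum(np.hypot(d[..., 0], d[..., 1]), 1e-8)
    scale = np.arctan2(r, d[..., 2]) / r
    return cx + fx * scale * d[..., 0], cy + fy * scale * d[..., 1]

# Front face: identity rotation. Bilinearly sampling the fisheye image at (u, v)
# fills a uniformly sampled 90-degree view.
u, v = rays_to_fisheye_px(cube_face_rays(512, np.eye(3)), 300.0, 300.0, 640.0, 480.0)
```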

For challenging cases where SfM fails, monocular depth map priors such as UniK3D can be used: depth predictions for each ray are backprojected into dense 3D point clouds, circumventing unreliable feature matching in distortion-heavy imagery and enabling robust scene initialization with as few as 2–3 fisheye images per scene (Gunes et al., 9 Aug 2025).
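
A minimal sketch of such an initialization, assuming an equidistant camera and a depth map predicted as Euclidean distance along each ray (UniK3D's actual interface may differ):

```python
import numpy as np

def fisheye_rays(h, w, fx, fy, cx, cy):
    """Per-pixel unit rays from inverting the equidistant model (theta = r_d / f)."""
    u, v = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)
    mx, my = (u - cx) / fx, (v - cy) / fy
    theta = np.hypot(mx, my)                # normalized radius equals the ray angle
    s = np.sin(theta) / np.maximum(theta, 1e-8)
    return np.stack([s * mx, s * my, np.cos(theta)], axis=-1)

def backproject(depth, rays):
    """Scale each unit ray by its predicted distance to get a dense point cloud."""
    return (depth[..., None] * rays).reshape(-1, 3)

rays = fisheye_rays(960, 1280, 300.0, 300.0, 640.0, 480.0)
points = backproject(np.ones((960, 1280)), rays)  # placeholder unit-depth map
```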

4. Splatting, Rendering, and Forward Model Adaptation

Gaussian Splatting for fisheye images adapts the rasterization or ray-tracing splatting pipeline in accordance with the camera model. In tile-based rasterization, the modified projection ensures each Gaussian is accurately mapped to the 2D plane, including the covariance transformation. Real-time rendering is maintained by efficient CUDA implementations that compute the forward projection and all required gradients with negligible additional computational cost (Liao et al., 7 Sep 2024).
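
Under the first-order model, that covariance transformation follows from the projection Jacobian as $\Sigma_{2D} = J \Sigma J^\top$; a brief numpy sketch with an illustrative Jacobian (e.g. obtained as in the Section 2 sketch):

```python
import numpy as np

J = np.array([[310.0, 12.0, -128.0],
              [12.0, 306.0, 95.0]])     # illustrative 2x3 fisheye projection Jacobian
Sigma_3d = np.diag([1e-2, 1e-2, 2e-2])  # one Gaussian's 3D covariance
Sigma_2d = J @ Sigma_3d @ J.T           # 2x2 image-space covariance, J Sigma J^T
```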

Ray tracing approaches (RaySplats) extend naturally to fisheye modalities: rays are synthesized from each pixel by inverting the fisheye projection, and splatting is replaced by per-ray intersection computations with 3D ellipsoidal "confidence" regions. This method is camera-model agnostic and allows unified, global handling of occlusions, light, shadows, and inter-reflections even in the presence of severe peripheral distortion (Byrski et al., 31 Jan 2025).
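
The core per-ray test can be sketched as an intersection with a Gaussian's $k\sigma$ confidence ellipsoid; the $k = 3$ cutoff is illustrative, and RaySplats' full pipeline adds sorted blending and shading on top:

```python
import numpy as np

def ray_ellipsoid_entry(o, d, mu, Sigma_inv, k=3.0):
    """Smallest t > 0 with (o + t d - mu)^T Sigma^{-1} (o + t d - mu) = k^2,
    i.e. where the ray enters the Gaussian's k-sigma confidence ellipsoid."""
    p = o - mu
    a = d @ Sigma_inv @ d
    b = 2.0 * (p @ Sigma_inv @ d)
    c = p @ Sigma_inv @ p - k**2
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None                      # the ray misses the ellipsoid
    t = (-b - np.sqrt(disc)) / (2.0 * a)
    return t if t > 0 else None

t = ray_ellipsoid_entry(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                        np.array([0.0, 0.0, 2.0]),
                        np.linalg.inv(np.diag([0.04, 0.04, 0.09])))  # t = 1.1
```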

5. Performance Evaluation and Comparative Analyses

Empirical studies on real images with fields of view beyond 180° indicate several tradeoffs and practical considerations:

  • Fisheye-GS (equidistant projection) performs best at or near 160°, with performance tapering at 200° due to projection ambiguities outside its valid range.
  • 3DGUT (Unscented Transform) is robust even at 200°, consistently delivering higher perceptual quality (SSIM, LPIPS) at the periphery.
  • At small fields of view (120°), Fisheye-GS loses context, and performance degrades, whereas 3DGUT remains stable.
  • Metric improvements (PSNR, SSIM) are most pronounced in the periphery, with the choice of calibration and initialization playing a critical role (Gunes et al., 9 Aug 2025).

Experiments on datasets with known LIDAR ground truth (FIORD) confirm that explicit 3DGS, when combined with appropriate fisheye handling, outperforms NeRF-style implicit models both quantitatively and in geometric accuracy for large-scale reconstructions (Gunes et al., 2 Apr 2025).

6. Applications, Extensions, and Limitations

Fisheye-based 3DGS enables efficient wide-angle and omnidirectional 3D reconstruction. Applications include autonomous driving, robotics, immersive VR/AR, and large-scale mapping tasks where rapid, full-scene coverage is required. Modularity in the projection layer allows easy extension to other camera models (e.g., panoramic, cubemap), facilitating integration with hybrid sensory pipelines including sonar fusion and LIDAR (Qu et al., 6 Apr 2024, Deng et al., 13 Feb 2025).

Current limitations include:

  • Sensitivity to extreme distortion near and beyond 180° in equidistant or simplified models.
  • Residual “floater” artifacts and ambiguities in cases of sparse or low-texture supervision, especially outdoors or in the presence of glare, fog, or sky.
  • Dependence on robust camera initialization, with monocular depth priors providing a viable route when feature-based SfM is unreliable.

Future directions encompass tighter integration of self-calibrating distortion fields, improved monocular or learned depth priors adapted to fisheye imagery, further development of fisheye-robust regularization and densification strategies, and hybrid approaches that fuse ray tracing with rasterization for global lighting and high realism in wide-angle settings.

7. Outlook and Research Opportunities

Ongoing developments in the field underline several promising research avenues:

  • Generalized projection operators that seamlessly interpolate between fisheye, perspective, and panoramic cameras.
  • Adaptive uncertainty estimation for asset extraction and scene completion, with explicit view-dependent uncertainty tracking over broad angular coverage (Han et al., 10 Apr 2025).
  • Modular, distortion-adaptive splatting that supports real-time immersive environments and robust 3D mapping from minimal image sets.
  • Deeper integration with generative and hybrid 3DGS–SDF models to enhance geometry consistency and fine detail, especially under severe lens non-linearities (Gao et al., 21 Jul 2025).

Fisheye-Based 3D Gaussian Splatting is thus distinguished by its robust handling of extreme image distortion, explicit scene geometry, and efficient, real-time rendering, positioning it as a compelling approach for next-generation wide-angle and 360° scene reconstruction and synthesis.