EndoGaussians: 3D Gaussian Splatting for Endoscopy

Updated 7 March 2026

EndoGaussians is a framework that uses explicit 3D Gaussian splats to reconstruct deformable tissues from endoscopic RGBD video streams.
It achieves real-time performance and improved accuracy through optimized, learned deformation models and physical depth priors.
The method integrates video inpainting, point-cloud initialization, and hallucination masking to enhance interpretability for intraoperative visualization and simulation.

EndoGaussians is an explicit 3D Gaussian splatting framework for dynamic, single-view reconstruction of deformable tissues from endoscopic RGBD video streams. Addressing technical and practical limitations of preceding neural radiance field (NeRF)-based techniques, EndoGaussians advances accuracy, interpretability, and real-time performance in the 3D modeling of soft tissues under non-rigid motion, laying groundwork for improved intraoperative visualization, analysis, and medical simulation (Chen et al., 2024).

1. Rationale and Architectural Principles

EndoGaussians replaces the implicit NeRF-based volumetric representation with a sparse, explicit set of 3D Gaussian splats. Each Gaussian encodes a local “particle” of tissue structure, transparently distinguishing observed, data-driven anatomy from hallucinated or uncertain regions. In contrast to NeRF-based methods such as EndoNeRF, EndoSurf, and ForPlane, the framework directly optimizes these splats from a stream of single-view RGBD frames, efficiently enforcing physical priors on depth and deformation. Key architectural attributes include:

Real-time optimization and rendering via Gaussian splatting, reducing convergence times to minutes per video.
Robustness to soft-tissue motion using learned, per-Gaussian dynamic deformation models.
State-of-the-art reconstruction metrics (PSNR, SSIM, LPIPS) on challenging endoscopic datasets.
Explicit hallucination masking, demarcating uncertain or occluded anatomical regions.

2. Mathematical Model of 3D Gaussian Splatting

The fundamental unit of EndoGaussians is a 3D Gaussian splat parameterized by mean $\mu_i \in \mathbb{R}^3$, covariance $\Sigma_i\in\mathbb{R}^{3\times 3}$ (positive semidefinite via $\Sigma_i=R_iS_iS_i^\top R_i^\top$), opacity logit $o_i$ (rendered through $\alpha_i = \sigma(o_i)$), and appearance features encoded using spherical harmonic coefficients $\{c_{i,k}\}_{k=1}^n$.
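As a minimal sketch of this parameterization, the snippet below builds $\Sigma_i = R_iS_iS_i^\top R_i^\top$ and $\alpha_i = \sigma(o_i)$ in NumPy. It assumes, as is common in Gaussian splatting implementations but not stated in the source, that $R_i$ is stored as a quaternion in (w, x, y, z) order and that $S_i$ is a diagonal matrix stored as log-scales; the function names are illustrative.

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)  # normalize so the result is a rotation
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def build_covariance(q, log_scale):
    """Sigma = R S S^T R^T, positive semidefinite by construction.

    Assumption: S is diagonal and stored as log-scales (illustrative choice).
    """
    R = quat_to_rotmat(np.asarray(q, dtype=float))
    S = np.diag(np.exp(np.asarray(log_scale, dtype=float)))
    return R @ S @ S.T @ R.T

def opacity(o):
    """alpha = sigmoid(o), mapping an unconstrained logit into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-o))

if __name__ == "__main__":
    Sigma = build_covariance([0.9, 0.1, -0.3, 0.2], np.log([0.5, 1.0, 2.0]))
    print(Sigma)
    print(opacity(0.0))
```

Optimizing a rotation and a scale instead of the raw matrix entries keeps $\Sigma_i$ valid at every gradient step, which is the point of the factorization.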

The global, continuous density field is

$\rho(\mathbf{x}) = \sum_{i} w_i\exp\left(-\frac{1}{2}(\mathbf{x}-\mu_i)^\top\Sigma_i^{-1}(\mathbf{x}-\mu_i)\right),$

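where the weights $w_i$ are tied to opacity. Color along a ray $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$ is rendered via volumetric integration, where $\sigma(\cdot)$ denotes volume density in this integral (not the sigmoid used for the opacity $\alpha_i$) and $c(\cdot)$ is the color: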

$C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma\left(\mathbf{r}(t)\right)c\left(\mathbf{r}(t)\right)dt,$

with transmittance $T(t)=\exp\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))ds\right)$. In practice, this is implemented with ordered, discrete “alpha-splat” compositing:

$C = \sum_{i=1}^N c_i\ \alpha_i \prod_{j=1}^{i-1}(1-\alpha_j),\hspace{2em} d = \sum_{i=1}^N d_i\ \alpha_i \prod_{j=1}^{i-1}(1-\alpha_j),$

where $d_i$ is the per-Gaussian depth contribution.
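The compositing sums above can be sketched for a single pixel as follows. This is an illustrative NumPy version, not the rasterizer used by the method; it assumes the Gaussians are already sorted front to back and that `alphas` holds the effective per-pixel opacity of each Gaussian.

```python
import numpy as np

def composite(colors, depths, alphas):
    """Front-to-back alpha compositing of N sorted Gaussians for one pixel.

    colors: (N, 3), depths: (N,), alphas: (N,) with values in [0, 1].
    Returns the composited color, depth, and per-Gaussian blend weights.
    """
    colors = np.asarray(colors, dtype=float)
    depths = np.asarray(depths, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance before Gaussian i: prod_{j<i} (1 - alpha_j), equal to 1 for the first.
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alphas[:-1])))
    weights = alphas * trans
    return weights @ colors, weights @ depths, weights

if __name__ == "__main__":
    C, d, w = composite([[1, 0, 0], [0, 1, 0]], [1.0, 2.0], [0.5, 0.5])
    print(C, d, w)
```

Color and depth share the same blend weights, so a depth L1 loss supervises exactly the Gaussians that contribute to the rendered color.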

3. Spatiotemporal Deformation and Regularization

To model tissue motion, each Gaussian is equipped with a time-dependent warping function:

$W_i(\mathbf{x},t) = R_{i,t}(\mathbf{x} - \mu_{i,0}) + \mu_{i,t},$

where $\mu_{i,0}$ is its canonical position and $(R_{i,t},\,\mu_{i,t})$ are the learned rotation and translation at time $t$. All deformation, shape, appearance, and opacity variables are optimized jointly.
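A small sketch of the warp $W_i$ for one Gaussian, assuming $R_{i,t}$ is given as a 3x3 rotation matrix (how the rotation is stored is not specified in the source; the function name is illustrative):

```python
import numpy as np

def warp(x, R_t, mu_0, mu_t):
    """W_i(x, t) = R_{i,t} (x - mu_{i,0}) + mu_{i,t}.

    Rotates the offset of x from the canonical center, then moves it to the
    center at time t.
    """
    x, mu_0, mu_t = (np.asarray(v, dtype=float) for v in (x, mu_0, mu_t))
    return np.asarray(R_t, dtype=float) @ (x - mu_0) + mu_t

if __name__ == "__main__":
    # 90-degree rotation about the z axis
    R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    print(warp([1.0, 0.0, 0.0], R, [0.0, 0.0, 0.0], [0.0, 0.0, 5.0]))
```

By construction the canonical center $\mu_{i,0}$ always maps to $\mu_{i,t}$, so the translation parameter directly tracks where the Gaussian sits at time $t$.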

Physical plausibility is enforced with several losses:

Rigid-pair loss: Maintains relative positions of neighbors across frames

$\mathcal{L}_{i,j}^{\rm rigid} = w_{i,j}\left\|( \mu_{j,t-1}-\mu_{i,t-1} ) - R_{i,t-1}R_{i,t}^{-1}( \mu_{j,t}-\mu_{i,t} )\right\|_2$

Rotational smoothness: Encourages consistent rotations among neighbors

$\mathcal{L}^{\rm rot} = \frac{1}{k|\mathcal{S}|}\sum_{i\in\mathcal{S}}\sum_{j\in\mathrm{knn}_k(i)}w_{i,j}\|\hat q_{j,t}\hat q_{j,t-1}^{-1} - \hat q_{i,t}\hat q_{i,t-1}^{-1}\|_2$

Isometric regularization: Preserves inter-Gaussian distances

$\mathcal{L}^{\rm iso} = \frac{1}{k|\mathcal{S}|}\sum_{i\in\mathcal{S}}\sum_{j\in\mathrm{knn}_k(i)} w_{i,j} \Big|\|\mu_{j,0}-\mu_{i,0}\|_2 - \|\mu_{j,t}-\mu_{i,t}\|_2\Big|$
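The rigid-pair and isometric terms can be sketched in NumPy as below. The source does not define the weights $w_{i,j}$ or the set $\mathcal{S}$; this sketch assumes $w_{i,j} = \exp(-\lambda_w\|\mu_{j,0}-\mu_{i,0}\|_2^2)$, takes $\mathcal{S}$ to be all Gaussians, and finds neighbors by brute force. Those choices, the parameter `lam_w`, and the function names are assumptions for illustration.

```python
import numpy as np

def knn_indices(mu0, k):
    """Indices of the k nearest neighbors of each canonical center (brute force)."""
    d2 = ((mu0[:, None, :] - mu0[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]

def isometric_loss(mu0, mut, k=3, lam_w=1.0):
    """Mean weighted change in neighbor distances between canonical and time-t centers."""
    nbr = knn_indices(mu0, k)                                   # (N, k)
    d0 = np.linalg.norm(mu0[nbr] - mu0[:, None, :], axis=-1)    # canonical distances
    dt = np.linalg.norm(mut[nbr] - mut[:, None, :], axis=-1)    # distances at time t
    w = np.exp(-lam_w * d0 ** 2)                                # assumed weighting
    return (w * np.abs(d0 - dt)).sum() / (k * len(mu0))

def rigid_pair_loss(mu_prev_i, mu_prev_j, mu_cur_i, mu_cur_j,
                    R_prev_i, R_cur_i, w_ij=1.0):
    """Rigid-pair term for one (i, j) pair across frames t-1 and t.

    The current offset is rotated back into the previous frame of Gaussian i
    using R_{i,t-1} R_{i,t}^{-1}; for rotation matrices the inverse is the transpose.
    """
    offset_prev = mu_prev_j - mu_prev_i
    offset_cur_in_prev = R_prev_i @ R_cur_i.T @ (mu_cur_j - mu_cur_i)
    return w_ij * np.linalg.norm(offset_prev - offset_cur_in_prev)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu0 = rng.normal(size=(10, 3))
    print(isometric_loss(mu0, mu0 + 1.0))   # pure translation
    print(isometric_loss(mu0, 2.0 * mu0))   # uniform stretch
```

Both terms vanish under a global rigid motion of the whole cloud, so they penalize only local stretching or shearing and leave overall tissue motion free.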

Supervision incorporates photometric and depth L1 terms and an optional Huber-style depth smoothness prior. The aggregate objective at frame $t>1$ is

$\mathcal{L} = \lambda_1 L_{\rm phot} + \lambda_2 L_{\rm depth} + \lambda_3\sum_{i,j}\mathcal{L}_{i,j}^{\rm rigid} + \lambda_4 \mathcal{L}^{\rm rot} + \lambda_5 \mathcal{L}^{\rm iso}$

4. Computational Pipeline

The EndoGaussians pipeline comprises four principal phases:

Video Inpainting: Tool and occlusion removal with a Flow-Guided Transformer (FGT), yielding clean RGBD images and soft-tissue masks.
Point-Cloud Initialization: Dense 3D points are projected from each (x, y, D(x,y)) tuple as

$X = \frac{(x-c_x)\,D(x,y)\,M(x,y)}{f_x},\ Y = \frac{(y-c_y)\,D(x,y)\,M(x,y)}{f_y},\ Z = D(x,y)\,M(x,y)$

A Gaussian is seeded per point with small initial covariance.

Camera Calibration: Intrinsics $(f_x, f_y, c_x, c_y)$ are known; extrinsics are estimated from stereo or SLAM.
Joint Training: All Gaussian and deformation variables are optimized using Adam, supervised by photometric, depth, deformation, and hallucination-mask losses. Training typically converges in 20–30 minutes on a 100–200 frame sequence.
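The point-cloud initialization step in the pipeline above can be sketched as a pinhole back-projection of the masked depth map. This is an illustrative NumPy version; dropping pixels where $D\cdot M = 0$ (which the formula would otherwise map to the origin) is an assumption of this sketch, as is the function name.

```python
import numpy as np

def backproject(depth, mask, fx, fy, cx, cy):
    """Lift each pixel (x, y, D(x, y)) to a 3D point using the pinhole model.

    depth, mask: (H, W) arrays; mask is 1 for tissue pixels and 0 elsewhere.
    Returns an (M, 3) array of points for pixels with positive masked depth.
    """
    h, w = depth.shape
    x, y = np.meshgrid(np.arange(w), np.arange(h))  # x: column index, y: row index
    dm = depth * mask
    X = (x - cx) * dm / fx
    Y = (y - cy) * dm / fy
    Z = dm
    pts = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
    keep = (dm > 0).reshape(-1)  # assumption: discard masked or invalid pixels
    return pts[keep]

if __name__ == "__main__":
    depth = np.full((2, 2), 2.0)
    mask = np.array([[1.0, 0.0], [1.0, 1.0]])
    print(backproject(depth, mask, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```

Each returned point would then seed one Gaussian mean, with a small initial covariance as described in the initialization step.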

5. Quantitative and Qualitative Evaluation

Empirical comparisons on EndoNeRF and SCARED datasets demonstrate substantial performance improvements over prior approaches. A summary of principal measurements for a single scene is provided below:

Metric	ForPlane [MICCAI ’23]	EndoGaussians
PSNR	36.457	37.654
SSIM	0.946	0.965
LPIPS	0.058	0.036
Render time/frame (s)	~1.7	~0.04

Compared to EndoNeRF and EndoSurf, EndoGaussians improves PSNR by 1–2 dB and SSIM by 0.01–0.03 (1–3 points on a 0–100 scale), while rendering at roughly 25 fps (about 0.04 s per frame). Reconstructed RGB and depth frames exhibit sharper anatomical boundaries and reduced hallucination, particularly in vessel and sulcal regions. Smoothness and stability during rapid deformation are attributed to the rigid and rotational losses. Ablations indicate that omitting the depth loss introduces drift, removing deformation regularization leads to Gaussian collapse, and removing the hallucination loss causes tool-occluded regions to be spuriously reconstructed.
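For readers reproducing these numbers, the sketch below shows the standard PSNR definition, $10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$, and the conversion from per-frame render time to frame rate. It assumes images scaled to $[0, 1]$; the exact evaluation protocol of the paper (masking, color space) is not given in the source, and the function names are illustrative.

```python
import numpy as np

def psnr(img, ref, max_val=1.0):
    """Peak signal-to-noise ratio in dB between an image and its reference."""
    mse = np.mean((np.asarray(img, dtype=float) - np.asarray(ref, dtype=float)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def frames_per_second(seconds_per_frame):
    """Convert a per-frame render time into a frame rate."""
    return 1.0 / seconds_per_frame

if __name__ == "__main__":
    ref = np.zeros((4, 4, 3))
    print(psnr(ref + 0.1, ref))
    print(frames_per_second(0.04), frames_per_second(1.7))
```

With the render times in the table, about 0.04 s per frame corresponds to 25 fps, while about 1.7 s per frame is below 1 fps.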

6. Interpretability, Limitations, and Future Directions

EndoGaussians enables explicit, interpretable segmentation of observed versus hallucinated content by mapping splat assignments directly onto the 3D geometry. This makes uncertainty visible and supports more reliable intraoperative analytics. However, the model currently requires precomputed masks and depth maps as well as a moderate amount of GPU memory. Extreme topological changes in tissue may exceed the representational power of the per-Gaussian deformation model.

Potential clinical applications include real-time 3D display for VR surgical navigation, quantitative tissue motion tracking in minimally invasive surgery, and synthetic data generation for robotic surgery training (Chen et al., 2024). Future research may extend the deformation model to accommodate more extensive structural changes and integrate end-to-end learning with on-the-fly inpainting and depth inference.

References

Chen et al. (2024). EndoGaussians: Single View Dynamic Gaussian Splatting for Deformable Endoscopic Tissues Reconstruction.
