
Deformable Tissue Reconstruction

Updated 23 June 2026
  • Deformable tissue reconstruction is a technique for recovering temporally varying 3D soft tissue geometry and appearance from image data under non-rigid motion.
  • Modern methods leverage dynamic neural radiance fields, 3D Gaussian splatting, and mesh-based encodings to balance high-fidelity rendering with real-time performance.
  • These approaches enhance intraoperative guidance and biomechanical analysis while addressing challenges like occlusion, artifact mitigation, and topological changes.

Deformable tissue reconstruction refers to the recovery of temporally varying three-dimensional geometry and appearance of soft biological tissues from image or video data, typically under non-rigid motion, partial occlusion, and challenging intraoperative conditions. This capability is central to robotic surgery, image-guided intervention, and a range of advanced surgical navigation and decision-support workflows. Modern methods, driven by innovations in neural scene representations and real-time computer graphics, achieve high-fidelity, physically plausible reconstructions from monocular and stereo endoscopic video. The field is defined by the need to balance geometric fidelity, temporal coherence, resistance to visual and physical artifacts (e.g., aliasing, occlusion, topology changes), and computational speed suitable for intraoperative use.

1. Foundations and Motivation

Deformable tissue reconstruction addresses the challenge of modeling soft tissue undergoing non-rigid deformations, including elastic stretching, instrument interaction, and surgical manipulation. Traditional multi-view stereo (MVS) and SLAM-based approaches (Song et al., 2020) are effective only for rigid or near-rigid environments; they fail to account for large, localized deformations and topological changes such as tissue cutting or shearing. Recent advances leverage dynamic neural radiance fields (NeRF), explicit 3D Gaussian splatting, and hybrid mesh-based strategies to explicitly encode and recover spatio-temporal field properties of tissue.

These reconstructions are critical for tasks such as:

  • Intraoperative guidance and augmented reality overlays.
  • Force estimation and closed-loop robotic control.
  • Virtual training environments requiring temporally accurate physiologic models.
  • Quantitative biomechanical analysis of tissue strain and interaction.

2. Scene Representations: Implicit, Explicit, and Hybrid Models

Neural Fields and Plane Factorizations

Dynamic NeRF-based frameworks (Wang et al., 2022, Yang et al., 2023, Yang et al., 2023) represent the scene as a canonical volume augmented with a learned, time-dependent deformation field: $f : \mathbb{R}^3 \times \mathbb{R} \to (\sigma, \mathbf{c})$, where $\sigma$ is volume density and $\mathbf{c}$ is radiance. Factorizations such as static and dynamic orthogonal neural planes (Yang et al., 2023, Yang et al., 2023) discretize the 4D space $(x, y, z, t)$ into a compact set of 2D feature planes (e.g., $F_{XY}, F_{XZ}, F_{YZ}$ for static fields and $F_{XT}, F_{YT}, F_{ZT}$ for dynamic fields). Features are sampled from these planes via bilinear/trilinear interpolation and then fused, substantially reducing memory and compute requirements at negligible cost to fidelity.
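The plane lookup described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the cited methods' implementation: it assumes coordinates normalized to [0, 1], grids of at least 2x2 cells, and fusion by elementwise product followed by concatenation of the static and dynamic parts. The function names (`bilinear`, `hadamard`, `query_planes`) and the fusion rule are assumptions made for this example.

```python
def bilinear(plane, u, v):
    """Bilinearly sample plane[row][col] (a grid of feature vectors) at (u, v) in [0, 1]."""
    rows, cols = len(plane), len(plane[0])
    x, y = u * (cols - 1), v * (rows - 1)          # continuous grid coordinates
    x0, y0 = min(int(x), cols - 2), min(int(y), rows - 2)
    fx, fy = x - x0, y - y0                        # fractional position inside the cell
    out = []
    for ch in range(len(plane[0][0])):
        top = (1 - fx) * plane[y0][x0][ch] + fx * plane[y0][x0 + 1][ch]
        bottom = (1 - fx) * plane[y0 + 1][x0][ch] + fx * plane[y0 + 1][x0 + 1][ch]
        out.append((1 - fy) * top + fy * bottom)
    return out


def hadamard(*features):
    """Elementwise product of equally sized feature vectors."""
    result = list(features[0])
    for feat in features[1:]:
        result = [a * b for a, b in zip(result, feat)]
    return result


def query_planes(planes, x, y, z, t):
    """Return the static features followed by the dynamic features at (x, y, z, t)."""
    static = hadamard(bilinear(planes["XY"], x, y),
                      bilinear(planes["XZ"], x, z),
                      bilinear(planes["YZ"], y, z))
    dynamic = hadamard(bilinear(planes["XT"], x, t),
                       bilinear(planes["YT"], y, t),
                       bilinear(planes["ZT"], z, t))
    return static + dynamic                        # list concatenation


# Toy example: six identical 3x3 planes with 2-channel features (hypothetical data).
ramp = [[[float(c), float(r)] for c in range(3)] for r in range(3)]
planes = {name: ramp for name in ("XY", "XZ", "YZ", "XT", "YT", "ZT")}
feature = query_planes(planes, 0.25, 0.75, 0.5, 0.1)
```

Six 2D grids store far fewer values than one dense 4D grid at the same resolution, which is the source of the memory saving; in practice the fused feature would be decoded by a small network into density and radiance.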

3D Gaussian Splatting

Explicit methods model tissue as a set of anisotropic 3D Gaussians $\{G_i\}$: $G_i(\mathbf{x}) = \exp\left(-\frac{1}{2}(\mathbf{x} - \mu_i)^\top \Sigma_i^{-1} (\mathbf{x} - \mu_i)\right)$, parameterized by mean $\mu_i$, covariance $\Sigma_i$, and associated color/opacity. Rendering proceeds by projecting each Gaussian to an image-plane ellipse and compositing visibilities and colors via analytic $\alpha$-blending (Xie et al., 2024, Chen et al., 2024, Shan et al., 2 Jan 2025, Yang et al., 2024, Zhu et al., 2024). Temporal deformation fields modulate the Gaussians' position, scale, and orientation, either globally (via MLP), per-Gaussian (via basis expansions), or hierarchically (by region). This enables fast, parallelizable scene updates and real-time rendering rates, often >300 FPS on RTX-class GPUs (Yang et al., 2024).
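To make the rendering step concrete, the sketch below evaluates already-projected 2D Gaussians at a single pixel and blends them front to back. It is a simplified illustration under stated assumptions: the 3D-to-2D projection is skipped, each splat is a plain dictionary with hypothetical keys (`mean`, `cov`, `opacity`, `color`, `depth`), and no tiling or early termination is used.

```python
import math


def gaussian_2d(pixel, mean, cov):
    """Evaluate exp(-0.5 * d^T cov^{-1} d) for a 2x2 covariance ((a, b), (c, d))."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = pixel[0] - mean[0], pixel[1] - mean[1]
    # d^T cov^{-1} d, using the closed-form inverse of a 2x2 matrix.
    mahalanobis_sq = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * mahalanobis_sq)


def composite(pixel, splats):
    """Front-to-back alpha blending of projected Gaussians at one pixel."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0                                   # light not yet absorbed
    for s in sorted(splats, key=lambda s: s["depth"]):    # nearest splat first
        alpha = s["opacity"] * gaussian_2d(pixel, s["mean"], s["cov"])
        for k in range(3):
            color[k] += transmittance * alpha * s["color"][k]
        transmittance *= 1.0 - alpha
    return color, transmittance


# Two overlapping splats centered on the queried pixel (hypothetical values).
splats = [
    {"mean": (5.0, 5.0), "cov": ((4.0, 0.0), (0.0, 1.0)),
     "opacity": 1.0, "color": (0.0, 0.0, 1.0), "depth": 2.0},
    {"mean": (5.0, 5.0), "cov": ((4.0, 0.0), (0.0, 1.0)),
     "opacity": 0.5, "color": (1.0, 0.0, 0.0), "depth": 1.0},
]
pixel_color, remaining = composite((5.0, 5.0), splats)
```

Each pixel's result depends only on the sorted splats that cover it, so pixels can be processed independently, which is what makes this formulation easy to parallelize on a GPU.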

Mesh- and Graph-based Encodings

Mesh-based approaches parameterize the deformable surface directly as a graph, with each vertex subject to learned or physics-guided displacements (Nakao et al., 2021, Chen et al., 24 Jun 2025, Liu et al., 2020). Position-based dynamics (PBD) (Liu et al., 2020) and canonical map formulations (Chen et al., 24 Jun 2025) constrain the reconstructed surface to remain locally consistent with biomechanics and observed vision cues. Deformation-aware graph attention (DeGAT) (Fan et al., 25 Mar 2026) propagates non-local geometric context for improved geometric and topological coherence, particularly across occlusions.
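As a rough illustration of how position-based dynamics keeps a surface locally consistent, the sketch below projects a single distance constraint between two mesh vertices. This is the textbook PBD distance constraint, not necessarily the constraint set used in the cited work; the function name and the inverse-mass parameters `w1` and `w2` are chosen for this example.

```python
import math


def project_distance_constraint(p1, p2, rest_length, w1=1.0, w2=1.0):
    """Move two vertices along their connecting line until they are rest_length apart.

    w1 and w2 are inverse masses; an inverse mass of 0 pins that vertex.
    """
    delta = [a - b for a, b in zip(p1, p2)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist == 0.0 or w1 + w2 == 0.0:
        return list(p1), list(p2)            # degenerate case: leave untouched
    scale = (dist - rest_length) / (dist * (w1 + w2))
    new_p1 = [a - w1 * scale * d for a, d in zip(p1, delta)]
    new_p2 = [b + w2 * scale * d for b, d in zip(p2, delta)]
    return new_p1, new_p2


# An edge stretched to length 2 is pulled back to its rest length of 1.
a, b = project_distance_constraint((0.0, 0.0, 0.0), (2.0, 0.0, 0.0), rest_length=1.0)
```

A full solver would iterate such projections over all mesh edges several times per frame, interleaved with corrections driven by the observed image cues.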

3. Deformation Modeling and Spatio-Temporal Dynamics

Non-rigid tissue motion is captured via several mechanisms:

  • Global MLP-based fields: Map $(\mathbf{x}, t)$ to 3D offsets via learned multilayer perceptrons (Wang et al., 2022, Xie et al., 2024).
  • Per-Gaussian and basis function expansions: Each primitive evolves by projecting time through a learned sum of Gaussian kernels, decoupling local and global motions and supporting irreversible changes such as splitting or shearing (Yang et al., 2024, Shan et al., 2 Jan 2025).
  • Life cycle models: Explicit time-varying opacity fields enable Gaussians to “appear” and “disappear,” capturing topological changes (Shan et al., 2 Jan 2025).
  • Attention-driven dynamic decoders: Self-attention modules coupled with local MLPs adaptively weight deformation predictions globally and locally per attribute at each time (Huang et al., 31 Oct 2025).
  • Vision-tracked deformation guidance: Integration of explicit tracking (e.g., CoTracker-based 2D keypoint tracking) with implicit deformation networks enables precise, temporally coherent lifting of observed motion into the 3D scene (Wang et al., 4 Mar 2025).
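The per-Gaussian basis expansion in the list above can be sketched as follows. This is a minimal illustration that assumes each coordinate of a Gaussian's mean is offset by its own weighted sum of Gaussian kernels in time; the parameter layout (`weights`, `centers`, `widths` per axis) is hypothetical and would be learned during training in a real system.

```python
import math


def basis_offset(t, weights, centers, widths):
    """Sum of Gaussian kernels in time: sum_b w_b * exp(-(t - c_b)^2 / (2 * s_b^2))."""
    return sum(w * math.exp(-((t - c) ** 2) / (2.0 * s ** 2))
               for w, c, s in zip(weights, centers, widths))


def deform_mean(mean, t, params):
    """Offset each coordinate of a canonical mean by its own basis expansion.

    params holds one (weights, centers, widths) triple per axis.
    """
    return [m + basis_offset(t, *axis_params) for m, axis_params in zip(mean, params)]


# One Gaussian: a brief bump in x around t = 0.5, no motion in y, a slow drift in z.
params = [
    ([2.0], [0.5], [0.1]),   # x axis
    ([], [], []),            # y axis: no kernels, so no motion
    ([1.0], [0.0], [1.0]),   # z axis
]
moved = deform_mean([1.0, 2.0, 3.0], 0.5, params)
```

Because every primitive carries its own small set of kernel parameters, evaluating the deformation at a given time needs no network forward pass, which fits the fast per-Gaussian updates described above.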

Regularization is fundamental. Neighbor-based deformation penalties (e.g., ensuring local pairwise distances and covariances are preserved) (Xie et al., 2024), ARAP (as-rigid-as-possible) terms (Song et al., 2020, Chen et al., 24 Feb 2026), and multi-level rotation and isometry constraints (Chen et al., 24 Feb 2026) are widely deployed. Temporal losses (e.g., smoothness in parameters or learned fields) enforce dynamic coherence.
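A neighbor-based distance-preservation penalty of the kind mentioned above might look like the following sketch. It assumes a precomputed list of neighbor index pairs (`edges`) and penalizes the squared change in pairwise distance between the canonical and deformed configurations; the exact form used in the cited papers may differ.

```python
import math


def distance(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def isometry_loss(canonical, deformed, edges):
    """Mean squared change in length over neighbor pairs (i, j)."""
    if not edges:
        return 0.0
    total = 0.0
    for i, j in edges:
        change = distance(deformed[i], deformed[j]) - distance(canonical[i], canonical[j])
        total += change ** 2
    return total / len(edges)


canonical = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
translated = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)]   # rigid motion: no penalty
stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]    # edge doubled in length
rigid_penalty = isometry_loss(canonical, translated, [(0, 1)])
stretch_penalty = isometry_loss(canonical, stretched, [(0, 1)])
```

The penalty is zero under any rigid motion and grows with local stretching or compression, so it discourages implausible deformation without forbidding global movement.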

4. Training Objectives, Supervision, and Artifact Mitigation

Losses and Supervision

Primary objectives include:

  • Photometric loss: Penalizes the difference between rendered and observed image colors.
  • Depth supervision: Penalizes the difference between rendered depth and stereo- or monocular-derived depths.
  • Surface-aligned and SDF losses: Enforce that the reconstructed Gaussian- or SDF-derived surface matches stereo or mesh estimates (Zhu et al., 2024, Chen et al., 24 Feb 2026).
  • Spatial and temporal regularization: TV, neighbor deformation, ARAP, and smoothness losses.

Data curation includes surgical tool masks to exclude instrument-occluded pixels from loss computation or to guide ray sampling (Yang et al., 2023, Wang et al., 2022, Zhu et al., 2024, Xie et al., 2024). Spatio-temporal importance sampling focuses computation on regions of large deformation or frequent occlusion (Yang et al., 2023, Yang et al., 2023).
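Tool masking in the loss can be illustrated with a small sketch. It assumes an L1 penalty, images flattened to lists of per-pixel tuples, and a boolean mask that is True on instrument pixels; the weight `lambda_depth` is a hypothetical hyperparameter, not a value from the cited papers.

```python
def masked_l1(rendered, observed, tool_mask):
    """Mean absolute error over pixels where tool_mask is False (tissue pixels)."""
    total, count = 0.0, 0
    for r, o, is_tool in zip(rendered, observed, tool_mask):
        if is_tool:
            continue                          # instrument pixel: no supervision
        total += sum(abs(a - b) for a, b in zip(r, o)) / len(r)
        count += 1
    return total / count if count else 0.0


def total_loss(rgb, rgb_gt, depth, depth_gt, tool_mask, lambda_depth=0.1):
    """Photometric term plus a weighted depth term, both restricted to tissue pixels."""
    photometric = masked_l1(rgb, rgb_gt, tool_mask)
    depth_term = masked_l1(depth, depth_gt, tool_mask)
    return photometric + lambda_depth * depth_term


# Two pixels; the second is covered by an instrument and is ignored.
rgb = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
rgb_gt = [(0.5, 0.5, 0.5), (0.0, 0.0, 0.0)]
depth = [(1.0,), (9.0,)]
depth_gt = [(2.0,), (0.0,)]
mask = [False, True]
loss = total_loss(rgb, rgb_gt, depth, depth_gt, mask)
```

Excluding masked pixels keeps the instrument's appearance from being baked into the tissue model, so the tissue behind the tool is supervised only in frames where it is actually visible.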

Anti-Aliasing and Rendering Artifacts

Alias-free, temporally stable rendering is achieved via joint volumetric and screen-space anti-aliasing: 3D Gaussian smoothing (convolution with low-pass kernels on each Gaussian) and 2D mipmap-style filtering after projection (Huang et al., 31 Oct 2025). These strategies reduce ringing, stair-stepping, Moiré patterns, and specular flicker, which otherwise limit clinical value.
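The 3D smoothing step can be sketched using a standard property of Gaussians: convolving a Gaussian of covariance Σ with an isotropic Gaussian kernel of variance s² yields a Gaussian of covariance Σ + s²I. The example below is an illustration under that assumption, with the opacity rescaled so the integrated mass is unchanged; the function names and the choice of `kernel_var` are hypothetical, and the cited method's exact filter may differ.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def smooth_gaussian_3d(cov, opacity, kernel_var):
    """Low-pass filter a 3D Gaussian with an isotropic kernel of variance kernel_var."""
    smoothed = [[cov[i][j] + (kernel_var if i == j else 0.0) for j in range(3)]
                for i in range(3)]
    # Keep the integrated mass fixed: the peak drops as the Gaussian widens.
    scale = math.sqrt(det3(cov) / det3(smoothed))
    return smoothed, opacity * scale


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
new_cov, new_opacity = smooth_gaussian_3d(identity, opacity=0.8, kernel_var=3.0)
```

In a renderer, `kernel_var` would typically be tied to the sampling rate at which the Gaussian is observed, so that primitives smaller than a pixel are widened rather than aliased.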

5. Algorithmic Advances for Real-time and High-Fidelity Reconstruction

A hallmark of recent research is overcoming the historical trade-off between fidelity and intraoperative speed.

6. Benchmarks, Evaluation Metrics, and Results

Quantitative evaluation commonly reports PSNR, SSIM, and LPIPS, computed on held-out frames or tool-masked tissue regions. Geometric accuracy is reported as point-cloud Chamfer distance.

| Method | Dataset | PSNR (dB) | SSIM | LPIPS | FPS | Training cost |
|---|---|---|---|---|---|---|
| EndoNeRF | EndoNeRF | 34.2 | 0.94 | 0.16 | <1 | 14 h |
| LerPlane | EndoNeRF | 26.8 | 0.94 | 0.11 | ~1.5 | 10 min |
| SurgicalGaussian | EndoNeRF | 38.8 | 0.97 | 0.049 | 80 | 40K iters |
| Deform3DGS | EndoNeRF | 37.9 | 0.958 | 0.06 | 339 | 64 s |
| EH-SurGS | EndoNeRF | 39.9 | 0.972 | 0.034 | 380 | 105 s |
| EndoGS | EndoNeRF | 37.9 | 0.966 | 0.034 | 70 | 60K iters |

SAGS (Huang et al., 31 Oct 2025) further improves fidelity (EndoNeRF/Binocular: PSNR=39.16, SSIM=0.970, LPIPS=0.025). Ablation studies confirm the efficacy of dynamic anti-aliasing, attention-driven decoders, and life-cycle models in maximizing both metric fidelity and qualitative appearance (e.g., texture sharpness, deformation continuity).

Physics-based and canonical map methods report surface-distance errors of 0.37±0.27 mm in non-occluded and 0.39±0.21 mm in occluded regions (Chen et al., 24 Jun 2025). These results rival or surpass prior offline reconstructions and are robust to camera motion and occlusion.
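For reference, the PSNR metric reported in this section can be computed over tool-masked tissue regions as in the sketch below. It assumes intensities in [0, 1] flattened to plain lists and a boolean mask that is True on instrument pixels; the function name `masked_psnr` is chosen for this example, and benchmark implementations may differ in detail.

```python
import math


def masked_psnr(rendered, reference, tool_mask, max_value=1.0):
    """PSNR in dB over pixels where tool_mask is False."""
    sq_err, count = 0.0, 0
    for r, g, is_tool in zip(rendered, reference, tool_mask):
        if is_tool:
            continue                          # skip instrument pixels
        sq_err += (r - g) ** 2
        count += 1
    if count == 0:
        raise ValueError("mask excludes every pixel")
    mse = sq_err / count
    if mse == 0.0:
        return math.inf                       # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Three pixels; the last is an instrument pixel and does not affect the score.
score = masked_psnr([0.0, 0.0, 1.0], [0.1, 0.1, 0.0], [False, False, True])
```

Because PSNR is logarithmic in the mean squared error, a gain of about 3 dB corresponds to roughly halving the error over the evaluated pixels.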

7. Current Limitations and Future Directions

Key technical challenges, including occlusion, rendering artifacts, and topological changes, remain active research directions.

There is a sustained trend toward unified frameworks that balance explicit geometry, temporally flexible motion, efficient neural encoding, and robust supervision from multi-modal data. The field is converging on solutions that enable high-resolution, temporally coherent, and clinically actionable reconstructions within intraoperative time budgets.
