Deformable Tissue Reconstruction
- Deformable tissue reconstruction is a technique for recovering temporally varying 3D soft tissue geometry and appearance from image data under non-rigid motion.
- Modern methods leverage dynamic neural radiance fields, 3D Gaussian splatting, and mesh-based encodings to balance high-fidelity rendering with real-time performance.
- These approaches enhance intraoperative guidance and biomechanical analysis while addressing challenges like occlusion, artifact mitigation, and topological changes.
Deformable tissue reconstruction refers to the recovery of temporally varying three-dimensional geometry and appearance of soft biological tissues from image or video data, typically under non-rigid motion, partial occlusion, and challenging intraoperative conditions. This capability is central to robotic surgery, image-guided intervention, and a range of advanced surgical navigation and decision-support workflows. Modern methods, driven by innovations in neural scene representations and real-time computer graphics, achieve high-fidelity, physically plausible reconstructions from monocular and stereo endoscopic video. The field is defined by the need to balance geometric fidelity, temporal coherence, robustness to rendering artifacts and scene disruptions (e.g., aliasing, occlusion, topology changes), and computational speed suitable for intraoperative use.
1. Foundations and Motivation
Deformable tissue reconstruction addresses the challenge of modeling soft tissue undergoing non-rigid deformations, including elastic stretching, instrument interaction, and surgical manipulation. Traditional multi-view stereo (MVS) and SLAM-based approaches (Song et al., 2020) are effective only for rigid or near-rigid environments; they fail to account for large, localized deformations and topological changes such as tissue cutting or shearing. Recent advances leverage dynamic neural radiance fields (NeRF), explicit 3D Gaussian splatting, and hybrid mesh-based strategies to explicitly encode and recover spatio-temporal field properties of tissue.
These reconstructions are critical for tasks such as:
- Intraoperative guidance and augmented reality overlays.
- Force estimation and closed-loop robotic control.
- Virtual training environments requiring temporally accurate physiologic models.
- Quantitative biomechanical analysis of tissue strain and interaction.
2. Scene Representations: Implicit, Explicit, and Hybrid Models
Neural Fields and Plane Factorizations
Dynamic NeRF-based frameworks (Wang et al., 2022, Yang et al., 2023, Yang et al., 2023) represent the scene as a canonical volume augmented with a learned, time-dependent deformation field: $(\mathbf{c}, \sigma) = F_\Theta(\mathbf{x} + \Delta\mathbf{x}(\mathbf{x}, t), \mathbf{d})$, where $\sigma$ is volume density, $\mathbf{c}$ is radiance, $\Delta\mathbf{x}$ is the deformation field, and $\mathbf{d}$ is the viewing direction. Factorizations such as static and dynamic orthogonal neural planes (Yang et al., 2023, Yang et al., 2023) discretize the space into a compact set of 2D feature planes (e.g., XY, YZ, and XZ for static fields; XT, YT, and ZT for dynamic fields). Features from these planes are fused via bilinear/trilinear interpolation, substantially reducing memory and computation requirements at little cost to fidelity.
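To make the plane factorization concrete, the following is a minimal NumPy sketch of querying a 4D point from six 2D feature planes. The plane names (`xy`, `yz`, `xz`, `xt`, `yt`, `zt`), the element-wise product used to fuse features within each group, and the final concatenation are illustrative assumptions, not the exact design of Forplane or LerPlane.

```python
import numpy as np

def bilinear_sample(plane, u, v):
    """Bilinearly interpolate an (H, W, C) feature plane at (u, v) in [0, 1].

    u indexes the width axis, v indexes the height axis.
    """
    H, W, _ = plane.shape
    x, y = u * (W - 1), v * (H - 1)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, W - 1), min(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * plane[y0, x0] + wx * plane[y0, x1]
    bottom = (1 - wx) * plane[y1, x0] + wx * plane[y1, x1]
    return (1 - wy) * top + wy * bottom

def query_planes(planes, x, y, z, t):
    """Fuse features for a normalized 4D point (x, y, z, t), all in [0, 1].

    Assumption: features are multiplied within the static and dynamic
    groups and the two groups are concatenated.
    """
    static = (bilinear_sample(planes["xy"], x, y)
              * bilinear_sample(planes["yz"], y, z)
              * bilinear_sample(planes["xz"], x, z))
    dynamic = (bilinear_sample(planes["xt"], x, t)
               * bilinear_sample(planes["yt"], y, t)
               * bilinear_sample(planes["zt"], z, t))
    return np.concatenate([static, dynamic])
```

In a full pipeline the fused feature vector would be decoded by a small MLP into density and color; storing six 2D grids instead of a dense 4D grid is what keeps memory manageable.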
3D Gaussian Splatting
Explicit methods model tissue as a set of anisotropic 3D Gaussians $\{G_i\}$, each parameterized by a mean $\mu_i$, a covariance $\Sigma_i$, and associated color and opacity. Rendering proceeds by projecting each Gaussian to an image-plane ellipse and compositing visibilities and colors via analytic α-blending (Xie et al., 2024, Chen et al., 2024, Shan et al., 2 Jan 2025, Yang et al., 2024, Zhu et al., 2024). Temporal deformation fields modulate the Gaussians' position, scale, and orientation, either globally (via MLP), per-Gaussian (via basis expansions), or hierarchically (by region). This enables fast, parallelizable scene updates and real-time rendering rates (often >300 FPS on RTX-class GPUs (Yang et al., 2024)).
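The compositing step can be illustrated with a short NumPy sketch for a single pixel. It assumes the Gaussians have already been projected to 2D and sorted near to far; the function names are hypothetical, and production rasterizers perform this per tile on the GPU rather than in a Python loop.

```python
import numpy as np

def gaussian_alpha(opacity, mean2d, cov2d, pixel_xy):
    """Alpha contributed at a pixel by one projected (2D) Gaussian."""
    d = np.asarray(pixel_xy, dtype=float) - np.asarray(mean2d, dtype=float)
    power = -0.5 * d @ np.linalg.solve(cov2d, d)  # -0.5 * d^T Sigma^{-1} d
    return opacity * np.exp(power)

def composite_front_to_back(colors, alphas):
    """Blend depth-sorted colors (N, 3) with alphas (N,), nearest first.

    Returns the pixel color and the transmittance left after all splats.
    """
    pixel = np.zeros(3)
    transmittance = 1.0
    for color, alpha in zip(colors, alphas):
        pixel += transmittance * alpha * np.asarray(color, dtype=float)
        transmittance *= 1.0 - alpha
    return pixel, transmittance
```

For example, a half-transparent red splat in front of an opaque blue one yields an even mix of the two colors and leaves no transmittance for anything behind them.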
Mesh- and Graph-based Encodings
Mesh-based approaches parameterize the deformable surface directly as a graph, with each vertex subject to learned or physics-guided displacements (Nakao et al., 2021, Chen et al., 24 Jun 2025, Liu et al., 2020). Position-based dynamics (PBD) (Liu et al., 2020) and canonical map formulations (Chen et al., 24 Jun 2025) constrain the reconstructed surface to remain locally consistent with biomechanics and observed vision cues. Deformation-aware graph attention (DeGAT) (Fan et al., 25 Mar 2026) propagates non-local geometric context for improved geometric and topological coherence, particularly across occlusions.
3. Deformation Modeling and Spatio-Temporal Dynamics
Non-rigid tissue motion is captured via several mechanisms:
- Global MLP-based fields: Map a 3D position and time, $(\mathbf{x}, t)$, to 3D offsets via learned multilayer perceptrons (Wang et al., 2022, Xie et al., 2024).
- Per-Gaussian and basis function expansions: Each primitive evolves by projecting time through a learned sum of Gaussian kernels, decoupling local and global motions and supporting irreversible changes such as splitting or shearing (Yang et al., 2024, Shan et al., 2 Jan 2025).
- Life cycle models: Explicit time-varying opacity fields enable Gaussians to “appear” and “disappear,” capturing topological changes (Shan et al., 2 Jan 2025).
- Attention-driven dynamic decoders: Self-attention modules coupled with local MLPs adaptively weight deformation predictions globally and locally per attribute at each time (Huang et al., 31 Oct 2025).
- Vision-tracked deformation guidance: Integration of explicit tracking (e.g., CoTracker-based 2D keypoint tracking) with implicit deformation networks enables precise, temporally coherent lifting of observed motion into the 3D scene (Wang et al., 4 Mar 2025).
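Two of the mechanisms above, per-Gaussian basis expansions and life-cycle opacity, can be sketched in a few lines of NumPy. The parameter names and, in particular, the sigmoid-window form of the life-cycle opacity are illustrative assumptions; the cited papers may parameterize both differently.

```python
import numpy as np

def basis_offset(t, weights, centers, widths):
    """Positional offset of one Gaussian at time t.

    offset(t) = sum_b weights[b] * exp(-0.5 * ((t - centers[b]) / widths[b])**2)
    weights: (B, 3) learned amplitudes; centers, widths: (B,) kernel parameters.
    """
    kernels = np.exp(-0.5 * ((t - centers) / widths) ** 2)  # (B,)
    return kernels @ weights                                 # (3,)

def lifecycle_opacity(t, base_opacity, t_birth, t_death, sharpness=50.0):
    """Opacity that is near base_opacity inside [t_birth, t_death], near 0 outside.

    Assumption: a product of two sigmoids is used as a smooth window.
    """
    rise = 1.0 / (1.0 + np.exp(-sharpness * (t - t_birth)))
    fall = 1.0 / (1.0 + np.exp(-sharpness * (t_death - t)))
    return base_opacity * rise * fall
```

Because each Gaussian owns its kernel weights, offsets for all primitives can be evaluated independently and in parallel, and a primitive whose opacity window has closed simply stops contributing to the render.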
Regularization is fundamental. Neighbor-based deformation penalties (e.g., ensuring local pairwise distances and covariances are preserved) (Xie et al., 2024), ARAP (as-rigid-as-possible) terms (Song et al., 2020, Chen et al., 24 Feb 2026), and multi-level rotation and isometry constraints (Chen et al., 24 Feb 2026) are widely deployed. Temporal losses (e.g., smoothness in parameters or learned fields) enforce dynamic coherence.
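As a hedged illustration of a neighbor-based penalty, the sketch below measures how much the distances from each point to its k nearest canonical neighbors change under deformation. It is a simplified isometry-style term under assumed names, not the exact loss of any cited method, and it uses a dense O(N²) distance matrix that would be replaced by a spatial index at scale.

```python
import numpy as np

def neighbor_distance_loss(canonical, deformed, k=3):
    """Mean squared change in distance to the k nearest canonical neighbors.

    canonical, deformed: (N, 3) positions of the same N primitives, N > k.
    """
    d0 = np.linalg.norm(canonical[:, None, :] - canonical[None, :, :], axis=-1)
    np.fill_diagonal(d0, np.inf)                 # exclude self-matches
    nbr = np.argsort(d0, axis=1)[:, :k]          # (N, k) neighbor indices
    rows = np.arange(len(canonical))[:, None]
    d_canonical = d0[rows, nbr]                  # (N, k)
    d_deformed = np.linalg.norm(deformed[:, None, :] - deformed[nbr], axis=-1)
    return float(np.mean((d_deformed - d_canonical) ** 2))
```

The term is zero for any rigid motion and grows with local stretching or compression, which is why it discourages implausible deformations without forbidding global movement.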
4. Training Objectives, Supervision, and Artifact Mitigation
Losses and Supervision
Primary objectives include:
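- Photometric loss: A per-pixel color error between rendered and observed images (the norm used varies by method).
- Depth supervision: A per-pixel error between rendered depth and stereo- or monocular-derived depth estimates.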
- Surface-aligned and SDF losses: Enforce that the reconstructed Gaussian- or SDF-derived surface matches stereo or mesh estimates (Zhu et al., 2024, Chen et al., 24 Feb 2026).
- Spatial and temporal regularization: TV, neighbor deformation, ARAP, and smoothness losses.
Data curation includes surgical tool masks to exclude instrument-occluded pixels from loss computation or to guide ray sampling (Yang et al., 2023, Wang et al., 2022, Zhu et al., 2024, Xie et al., 2024). Spatio-temporal importance sampling focuses computation on regions of large deformation or frequent occlusion (Yang et al., 2023, Yang et al., 2023).
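The interaction between tool masks and the loss terms can be sketched as follows. The L1 norm, the weight `lambda_depth`, and the function names are assumptions chosen for illustration; the cited methods differ in norms, weights, and additional regularizers.

```python
import numpy as np

def masked_l1(pred, target, keep_mask):
    """Mean absolute error over pixels where keep_mask is True."""
    keep = keep_mask.astype(bool)
    if not keep.any():
        return 0.0
    return float(np.abs(pred[keep] - target[keep]).mean())

def reconstruction_loss(rgb_pred, rgb_gt, depth_pred, depth_gt, tool_mask,
                        lambda_depth=0.1):
    """Photometric plus depth loss, ignoring instrument-occluded pixels.

    rgb_*: (H, W, 3); depth_*: (H, W); tool_mask: (H, W), True on instruments.
    """
    tissue = ~tool_mask.astype(bool)
    photometric = masked_l1(rgb_pred, rgb_gt, tissue)
    depth = masked_l1(depth_pred, depth_gt, tissue)
    return photometric + lambda_depth * depth
```

Excluding tool pixels keeps the optimizer from explaining the instrument as part of the tissue surface; the regions behind the tool are then constrained only by other frames and by the regularizers.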
Anti-Aliasing and Rendering Artifacts
Alias-free, temporally stable rendering is achieved via joint volumetric and screen-space anti-aliasing: 3D Gaussian smoothing (convolution with low-pass kernels on each Gaussian) and 2D mipmap-style filtering after projection (Huang et al., 31 Oct 2025). These strategies reduce ringing, stair-stepping, Moiré patterns, and specular flicker, which otherwise limit clinical value.
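One common way to low-pass filter a Gaussian primitive is to inflate its covariance by an isotropic filter variance and rescale its opacity by the ratio of determinants, so that very small Gaussians fade instead of producing high-frequency speckle. The sketch below follows that general recipe and works for both 3D and projected 2D covariances; it is an assumption about the form of the filter, not a description of the exact operators used by SAGS.

```python
import numpy as np

def smooth_gaussian(cov, opacity, filter_var):
    """Convolve a Gaussian with an isotropic low-pass Gaussian kernel.

    cov: (D, D) covariance (D = 3 in world space, D = 2 in screen space).
    filter_var: variance of the low-pass kernel (hypothetical tuning value).
    Returns the widened covariance and an opacity rescaled by
    sqrt(det(cov) / det(cov_smoothed)).
    """
    dim = cov.shape[0]
    cov_smoothed = cov + filter_var * np.eye(dim)
    scale = np.sqrt(np.linalg.det(cov) / np.linalg.det(cov_smoothed))
    return cov_smoothed, opacity * scale
```

In practice the filter variance would be tied to the sampling rate, for instance the pixel footprint at the Gaussian's depth, so that distant or heavily minified primitives are smoothed more.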
5. Algorithmic Advances for Real-time and High-Fidelity Reconstruction
A hallmark of recent research is overcoming the historical dichotomy between fidelity and intraoperative speed:
- Orthogonal neural planes (Forplane) allow discretization of 4D space, greatly reducing memory and computation (Yang et al., 2023, Yang et al., 2023). Forplane and LerPlane report optimization speedups on the order of 100× over dynamic NeRF baselines at comparable quality.
- Gaussian splatting supports closed-form projection, compositing, and real-time rasterization pipelines that are orders of magnitude faster than MLP-bound NeRF approaches (Xie et al., 2024, Chen et al., 2024, Zhu et al., 2024, Shan et al., 2 Jan 2025, Yang et al., 2024). Deform3DGS achieves about 339 FPS with roughly 1 minute of per-scene training (Yang et al., 2024); EH-SurGS combines life-cycle modeling and an adaptive motion hierarchy to reach 380 FPS at a PSNR of roughly 40 dB (Shan et al., 2 Jan 2025).
- Flexible, local deformation models (FDMs) assign decoupled temporal dynamics per Gaussian, further enhancing parallelism and robustness to long-range non-rigid motion (Yang et al., 2024).
- Hierarchical and region-based update strategies minimize computation in static areas, culling unnecessary deformation updates and maximizing inference throughput (Shan et al., 2 Jan 2025).
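The idea of skipping deformation updates in static regions can be sketched as a simple masked evaluation. The `is_dynamic` flags and the `deform_fn` callable are hypothetical placeholders; how regions are classified as static or dynamic is method-specific and not shown here.

```python
import numpy as np

def deform_dynamic_only(means, is_dynamic, deform_fn, t):
    """Apply deform_fn only to Gaussians flagged as dynamic.

    means: (N, 3) canonical centers; is_dynamic: (N,) boolean flags.
    deform_fn(points, t) -> (M, 3) offsets for the M points it receives.
    """
    out = means.copy()
    if is_dynamic.any():
        moving = means[is_dynamic]
        out[is_dynamic] = moving + deform_fn(moving, t)
    return out
```

When most of the scene is static, the deformation model is evaluated on only a small fraction of the primitives per frame, which is where the throughput gain comes from.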
6. Benchmarks, Evaluation Metrics, and Results
Quantitative evaluation commonly reports PSNR, SSIM, and LPIPS on held-out frames, often restricted to tool-masked tissue regions; geometric accuracy is additionally reported as point-cloud Chamfer distance.
| Method | Dataset | PSNR (dB) | SSIM | LPIPS | FPS | Training cost (time or iterations) |
|---|---|---|---|---|---|---|
| EndoNeRF | EndoNeRF | 34.2 | 0.94 | 0.16 | <1 | 14 h |
| LerPlane | EndoNeRF | 26.8 | 0.94 | 0.11 | ~1.5 | 10 min |
| SurgicalGaussian | EndoNeRF | 38.8 | 0.97 | 0.049 | 80 | 40K iters |
| Deform3DGS | EndoNeRF | 37.9 | 0.958 | 0.06 | 339 | 64 s |
| EH-SurGS | EndoNeRF | 39.9 | 0.972 | 0.034 | 380 | 105 s |
| EndoGS | EndoNeRF | 37.9 | 0.966 | 0.034 | 70 | 60K iters |
SAGS (Huang et al., 31 Oct 2025) further improves fidelity (EndoNeRF/Binocular: PSNR=39.16, SSIM=0.970, LPIPS=0.025). Ablation studies confirm the efficacy of dynamic anti-aliasing, attention-driven decoders, and life-cycle models in maximizing both metric fidelity and qualitative appearance (e.g., texture sharpness, deformation continuity).
Physics-based and canonical map methods report surface-distance errors of 0.37±0.27 mm in non-occluded and 0.39±0.21 mm in occluded regions (Chen et al., 24 Jun 2025). These results rival or surpass prior offline reconstructions and are robust to camera motion and occlusion.
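For reference, two of the metrics used in this section can be computed as in the sketch below: PSNR restricted to a tissue mask and a symmetric Chamfer distance between point clouds. Conventions vary across papers (for example, squared versus unsquared distances in Chamfer), so the exact definitions here are assumptions and may not match any particular benchmark protocol.

```python
import numpy as np

def masked_psnr(pred, gt, mask, max_val=1.0):
    """PSNR in dB over pixels where mask is True; images scaled to [0, max_val]."""
    keep = mask.astype(bool)
    mse = np.mean((pred[keep] - gt[keep]) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Uses unsquared Euclidean distances: the mean nearest-neighbor distance
    from a to b plus the mean nearest-neighbor distance from b to a.
    """
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M)
    return float(d.min(axis=1).mean() + d.min(axis=0).mean())
```

A uniform error of 0.1 on images in [0, 1] corresponds to a PSNR of 20 dB, which gives a sense of scale for the high-30s values reported in the table.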
7. Current Limitations and Future Directions
Key technical challenges and ongoing research directions include:
- Occlusion and topology change: Permanently unobserved regions rely on inpainting and TV regularization, which may over-smooth or hallucinate. Life-cycle-aware models mitigate, but do not fully solve, topology breaks due to cutting or severe occlusion (Shan et al., 2 Jan 2025, Zhu et al., 2024).
- Real-time learning-based deformation: Efficient, physically plausible modeling of highly non-linear deformations in real time remains challenging. Hybrid methods combining physics-informed priors, learning from video, and explicit region-based updates are a focus (Chen et al., 24 Jun 2025, Liu et al., 2020, Chen et al., 24 Feb 2026).
- Integration with downstream clinical tasks: The use of reconstruction for force sensing, preoperative image registration, and AR overlays requires mesh extraction and hard real-time guarantees (Yang et al., 2023, Wang et al., 2022, Fan et al., 25 Mar 2026).
- Generalization and transfer: Cross-dataset zero-shot performance and adaptation to varied organs and modalities (e.g., ultrasound) are open problems (Fan et al., 25 Mar 2026, Xie et al., 2024).
- Handling long sequences: Memory and computational requirements scale with sequence length; hierarchical and keyframe-based approaches are under active investigation (Yang et al., 2024).
There is a sustained trend toward unified frameworks that balance explicit geometry, temporally flexible motion, efficient neural encoding, and robust supervision from multi-modal data. The field is converging on solutions that enable high-resolution, temporally coherent, and clinically actionable reconstructions within intraoperative time budgets.