Degradation-Aware Endoscopic Video Enhancement
- The paper introduces a degradation-aware framework that models and quantifies image artifacts to enable robust endoscopic video enhancement.
- It details two paradigms—explicit artifact detection with conditional GAN restoration and implicit feature fusion via DGGAN for real-time performance.
- Quantitative evaluations show significant improvements in PSNR, SSIM, and frame preservation, demonstrating both practical and clinical implications.
Degradation-aware frameworks for endoscopic video enhancement are dedicated methodologies that leverage explicit modeling or estimation of image degradation to enable robust, real-time restoration of endoscopic video frames afflicted by a wide spectrum of artifacts. These frameworks target both computational efficiency and enhancement fidelity under multimodal degradations such as uneven illumination, tissue scattering, occlusions, motion blur, and other artifacts prevalent in intraoperative imaging environments.
1. Framework Architectures
Degradation-aware approaches for endoscopic video enhancement can be categorized into two main paradigms: explicit artifact detection and sequential restoration (Ali et al., 2019), and implicit degradation modeling with feature fusion for learned enhancement (Xu et al., 8 Dec 2025).
Explicit Artifact Detection Pipelines
Such frameworks are structured as cascaded systems composed of three stages:
- Multi-class artifact detection: A fast multi-scale, single-stage convolutional detector (YOLOv3-spp with a Darknet-53 backbone) localizes and classifies artifacts in each input frame; labels include blur, bubbles, specularity, saturation, contrast, and a miscellaneous class.
- Degradation-aware frame scoring: Each artifact instance is weighted according to its class, area, and location within the frame to compute a scalar quality score of the form
$$QS = 1 - \sum_{i} w_{c}(i)\, w_{a}(i)\, w_{l}(i),$$
where $w_{c}$, $w_{a}$, and $w_{l}$ are the class, area, and location weights of the $i$-th detected artifact, normalized so that $QS \in [0, 1]$. Frames scoring below a lower threshold are dropped, frames in an intermediate band are sent to restoration, and high-scoring frames are passed unaltered (a scoring and routing sketch follows this list).
- Artifact-specific restoration: Surviving frames are processed through a fixed sequence of conditional GAN-based restorers: blind deblurring, saturation/contrast correction, and inpainting.
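As a concrete illustration of the scoring-and-routing stage, the following Python sketch computes a frame-quality score from detected artifact boxes and routes the frame accordingly. The class weights, location weighting, and thresholds are illustrative placeholders, not the values used by Ali et al. (2019).

```python
import numpy as np

# Illustrative class weights (hypothetical values; the published framework
# defines its own per-class, area, and location weights).
CLASS_WEIGHTS = {
    "blur": 0.6, "bubbles": 0.3, "specularity": 0.3,
    "saturation": 0.5, "contrast": 0.4, "misc": 0.2,
}

def quality_score(detections, frame_shape):
    """Aggregate per-artifact penalties into a scalar frame-quality score in [0, 1].

    detections: list of dicts with keys "label" and "box" = (x0, y0, x1, y1).
    frame_shape: (height, width) of the frame.
    """
    h, w = frame_shape
    cx, cy = w / 2.0, h / 2.0
    penalty = 0.0
    for det in detections:
        x0, y0, x1, y1 = det["box"]
        area_frac = max(0.0, (x1 - x0) * (y1 - y0)) / (h * w)   # image occupancy
        bx, by = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        # Central artifacts are weighted more heavily than peripheral ones.
        dist = np.hypot(bx - cx, by - cy) / np.hypot(cx, cy)
        loc_weight = 1.0 - 0.5 * dist
        penalty += CLASS_WEIGHTS.get(det["label"], 0.2) * area_frac * loc_weight
    return float(np.clip(1.0 - penalty, 0.0, 1.0))

def route_frame(score, drop_below=0.5, restore_below=0.85):
    """Three-way routing: drop, restore, or pass unaltered (thresholds illustrative)."""
    if score < drop_below:
        return "drop"
    if score < restore_below:
        return "restore"
    return "pass"
```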
Implicit Degradation Modeling (DGGAN)
DGGAN (Degradation Guided GAN) introduces a unified, cycle-consistent GAN architecture featuring:
- Degradation-Aware Module (DAM): An encoder that extracts low-dimensional degradation codes via contrastive pre-training. This DAM is pretrained and frozen during subsequent adversarial training.
- Degradation-Guided Enhancement Module (DGEM): Receives the input low-quality frame and the code from DAM, performing restoration using channel and spatial attention, shallow convolutions, and Swin Transformer blocks modulated by degradation.
- Degradation Representation Propagation Module (DRPM): A lightweight transformer propagates the key-frame degradation codes to subsequent non-key frames, greatly reducing inference cost and enabling real-time operation.
Key frames (every $N$-th frame, where $N$ is the propagation interval) are passed through DAM, while DRPM propagates their degradation codes to the intervening non-key frames; all frames are then enhanced by DGEM.
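A minimal sketch of this key-frame scheduling, assuming `dam`, `drpm`, and `dgem` are stand-in modules with the call signatures shown and `interval` is an assumed hyperparameter, is given below; it only illustrates how degradation codes are computed on key frames and propagated cheaply to the rest.

```python
import torch
import torch.nn as nn

class KeyFramePropagationPipeline(nn.Module):
    """Sketch of key-frame degradation-code propagation (module internals are
    stand-ins, not the published DGGAN architectures)."""

    def __init__(self, dam: nn.Module, drpm: nn.Module, dgem: nn.Module, interval: int = 5):
        super().__init__()
        self.dam, self.drpm, self.dgem = dam, drpm, dgem
        self.interval = interval  # key-frame spacing N (assumed hyperparameter)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, C, H, W) low-quality video clip
        enhanced, code = [], None
        for t, frame in enumerate(frames):
            frame = frame.unsqueeze(0)
            if t % self.interval == 0 or code is None:
                code = self.dam(frame)               # full degradation encoding on key frames
            else:
                code = self.drpm(code, frame)        # cheap propagation on non-key frames
            enhanced.append(self.dgem(frame, code))  # degradation-guided enhancement
        return torch.cat(enhanced, dim=0)
```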
2. Degradation Representation and Fusion
Contrastive Degradation Coding
In DGGAN, DAM learns a compact representation of frame-specific degradation by minimizing an InfoNCE-style contrastive loss
$$\mathcal{L}_{\mathrm{con}} = -\log \frac{\exp\!\big(q \cdot k^{+} / \tau\big)}{\exp\!\big(q \cdot k^{+} / \tau\big) + \sum_{j} \exp\!\big(q \cdot k_{j}^{-} / \tau\big)},$$
where $q$ and $k^{+}$ are codes of two augmentations of the same frame, $\{k_{j}^{-}\}$ are codes of other frames, and $\tau$ is a temperature. This ensures that features from augmented versions of the same image map to similar codes, while features from different images diverge.
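The loss above can be written in a few lines. The sketch below follows the standard MoCo-style formulation with L2-normalized codes and a temperature of 0.07; these settings are assumptions rather than DGGAN's published choices.

```python
import torch
import torch.nn.functional as F

def infonce_degradation_loss(q, k_pos, k_neg, temperature=0.07):
    """InfoNCE-style contrastive loss over degradation codes.

    q:     (B, D) codes of one augmented view of each frame
    k_pos: (B, D) codes of a second augmentation of the same frames (positives)
    k_neg: (N, D) codes of other frames (negatives)
    """
    q = F.normalize(q, dim=1)
    k_pos = F.normalize(k_pos, dim=1)
    k_neg = F.normalize(k_neg, dim=1)
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)   # (B, 1) positive similarities
    l_neg = q @ k_neg.t()                          # (B, N) negative similarities
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # The positive sits at index 0 of each row of logits.
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```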
Feature Modulation by Degradation
DGEM compresses the degradation code $z$ using channel and spatial attention and applies it to the feature maps by elementwise multiplication ($\odot$) of the attention maps with the features.
In each Swin Transformer block, the compact code modulates the value vectors in the attention calculation,
$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right)\big(V \odot \phi(z)\big),$$
where $\phi(z)$ is a projection of the compressed degradation code. This selective feature modulation maintains content-dependent attention while adapting the feature representation to the estimated degradation.
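The following toy module illustrates value-vector modulation by a projected degradation code, using PyTorch's built-in multi-head attention; the sigmoid gating and dimensions are illustrative choices, not the exact Swin block used in DGEM.

```python
import torch
import torch.nn as nn

class DegradationModulatedAttention(nn.Module):
    """Toy multi-head self-attention whose value pathway is gated by a projected
    degradation code (dimensions and gating are illustrative)."""

    def __init__(self, dim: int, code_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.code_proj = nn.Sequential(nn.Linear(code_dim, dim), nn.Sigmoid())

    def forward(self, tokens: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
        # tokens: (B, L, dim) window tokens; code: (B, code_dim) degradation code
        gate = self.code_proj(code).unsqueeze(1)    # (B, 1, dim) per-channel gate
        values = tokens * gate                      # modulate the value pathway only
        out, _ = self.attn(tokens, tokens, values)  # Q, K from content; V degradation-gated
        return out
```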
3. Restoration Strategies
Artifact-Specific GANs
In explicit artifact pipelines (Ali et al., 2019), restoration employs:
- U-Net based blind-deblurring conditional GANs (CGANs), optimized with a WGAN-GP adversarial loss, a pixelwise reconstruction term, and a high-frequency consistency loss.
- Saturation/contrast restorers (CGANs with pixelwise losses and patch discriminators), followed by post-processing color re-transfer (CRT) to mitigate GAN-induced color shifts via a linear re-coloring of the form
$$I_{\mathrm{out}} = \Sigma_{\mathrm{in}}^{1/2}\,\Sigma_{\mathrm{gan}}^{-1/2}\,\big(I_{\mathrm{gan}} - \mu_{\mathrm{gan}}\big) + \mu_{\mathrm{in}},$$
with means and covariances computed on non-saturated pixels (a sketch of this re-transfer follows below).
- Inpainting CGANs with both global and local discriminators, specialized for specularity, bubble, and miscellaneous artifact regions. The loss combines adversarial and context masking terms.
Restoration is routed sequentially to prevent compounding errors (e.g., inpainting should follow deblurring).
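To make the color re-transfer concrete, the sketch below implements a generic whitening/recoloring transfer that matches the GAN output's mean and covariance to the input frame's statistics over non-saturated pixels. This is a plausible reading of CRT, not the authors' exact implementation.

```python
import numpy as np

def color_retransfer(gan_out, source, mask=None, eps=1e-6):
    """Whitening/recoloring color re-transfer: match the mean and covariance of the
    GAN output to those of the source frame using only non-saturated pixels
    (a generic mean/covariance transfer, not the paper's exact formulation).

    gan_out, source: float arrays of shape (H, W, 3) in [0, 1]
    mask: boolean (H, W) mask of non-saturated pixels; defaults to all pixels
    """
    if mask is None:
        mask = np.ones(gan_out.shape[:2], dtype=bool)
    g = gan_out[mask].reshape(-1, 3)
    s = source[mask].reshape(-1, 3)

    mu_g, mu_s = g.mean(0), s.mean(0)
    cov_g = np.cov(g, rowvar=False) + eps * np.eye(3)
    cov_s = np.cov(s, rowvar=False) + eps * np.eye(3)

    def sqrtm(c):
        # Matrix square root via eigendecomposition (covariances are symmetric PSD).
        w, v = np.linalg.eigh(c)
        return v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.T

    # Whiten with respect to the GAN statistics, then recolor to the source statistics.
    T = sqrtm(cov_s) @ np.linalg.inv(sqrtm(cov_g))
    out = (gan_out.reshape(-1, 3) - mu_g) @ T.T + mu_s
    return np.clip(out.reshape(gan_out.shape), 0.0, 1.0)
```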
Cycle-Consistent Degradation Modeling
DGGAN formalizes enhancement as a bi-directional mapping
$$G_{E}: x_{\mathrm{LQ}} \rightarrow \hat{x}_{\mathrm{HQ}}, \qquad G_{D}: \hat{x}_{\mathrm{HQ}} \rightarrow \hat{x}_{\mathrm{LQ}},$$
where $G_{D}$ is a physics-informed generator that predicts degradation parameters (e.g., blur kernel, smoke map) and simulates degradation via explicit physical models. A cycle-consistency loss
$$\mathcal{L}_{\mathrm{cyc}} = \big\| G_{D}(G_{E}(x_{\mathrm{LQ}})) - x_{\mathrm{LQ}} \big\|_{1} + \big\| E\!\big(G_{D}(G_{E}(x_{\mathrm{LQ}}))\big) - E(x_{\mathrm{LQ}}) \big\|_{1},$$
with $E$ denoting the DAM encoder, ties both image content and code representations across restoration and re-degradation.
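A minimal sketch of the combined image- and code-space cycle term is shown below, assuming `enhancer`, `degrader`, and `dam_encoder` are placeholder callables for $G_{E}$, $G_{D}$, and the DAM encoder, and that both terms use an L1 distance with an assumed weighting.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(x_lq, enhancer, degrader, dam_encoder, lambda_code=1.0):
    """Image- and code-space cycle consistency: enhance, re-degrade, and require
    both the reconstructed frame and its degradation code to match the input
    (module names and the weighting are placeholders for illustration)."""
    x_hq_hat = enhancer(x_lq)                # LQ -> estimated HQ
    x_lq_hat = degrader(x_hq_hat)            # HQ -> re-degraded LQ via physics-informed generator
    image_term = F.l1_loss(x_lq_hat, x_lq)   # image-space cycle term
    code_term = F.l1_loss(dam_encoder(x_lq_hat), dam_encoder(x_lq))  # code-space cycle term
    return image_term + lambda_code * code_term
```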
4. Training and Computational Properties
Loss Composition
DGGAN’s generator objective comprises adversarial, cycle-consistency, and contrastive terms,
$$\mathcal{L}_{G} = \mathcal{L}_{\mathrm{adv}} + \lambda_{\mathrm{cyc}}\,\mathcal{L}_{\mathrm{cyc}} + \lambda_{\mathrm{con}}\,\mathcal{L}_{\mathrm{con}},$$
with tailored discriminators (a combined-objective sketch follows this list):
- $D_{\mathrm{HQ}}$: PatchGAN operating on high-quality images
- $D_{\mathrm{LQ}}$: PatchGAN operating on synthetic low-quality images
- $D_{\mathrm{HF}}$: PatchGAN operating on high-pass filtered images to sharpen restoration of anatomical detail
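A compact sketch of how the three terms and the three discriminators might be combined is given below; the least-squares adversarial loss, the Laplacian high-pass filter, and the lambda weights are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def high_pass(img: torch.Tensor) -> torch.Tensor:
    """Simple Laplacian high-pass filter applied before the detail discriminator."""
    kernel = torch.tensor([[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]],
                          device=img.device).view(1, 1, 3, 3).repeat(img.size(1), 1, 1, 1)
    return F.conv2d(img, kernel, padding=1, groups=img.size(1))

def generator_objective(x_hq_hat, x_lq_hat, d_hq, d_lq, d_hf,
                        loss_cyc, loss_con, lambda_cyc=10.0, lambda_con=1.0):
    """Combine adversarial, cycle-consistency, and contrastive terms
    (least-squares adversarial losses and the lambda weights are assumptions)."""
    preds = [d_hq(x_hq_hat),             # realism of enhanced frames
             d_lq(x_lq_hat),             # realism of re-degraded frames
             d_hf(high_pass(x_hq_hat))]  # high-frequency anatomical detail
    adv = sum(F.mse_loss(p, torch.ones_like(p)) for p in preds)
    return adv + lambda_cyc * loss_cyc + lambda_con * loss_con
```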
Optimization and Efficiency
- Adam optimizer
- Learning rates set separately for the generator modules (DAM, DGEM, DRPM) and for the discriminators [values not specified]
- Model sizes: DGGAN-DAM ≈4.72M params/54.4 GFLOPs; DGGAN-DRPM ≈0.63M/32.4 GFLOPs.
- At the evaluated patch resolution, DGGAN-DRPM achieves 0.03 s/frame (about 33 FPS), satisfying real-time constraints for intraoperative deployment (Xu et al., 8 Dec 2025).
Ablation and Trade-offs
Omitting DAM contrastive pre-training decreases PSNR from 33.21 dB to 27.21 dB and worsens NIQE from 3.62 to 6.19, demonstrating the necessity of a dedicated degradation encoder. Varying the key-frame propagation interval trades off speed against fidelity (one setting yields 31.03 dB PSNR at 0.037 s/frame). Removing the image-space cycle terms collapses performance (SSIM around 0.5).
5. Quantitative and Qualitative Evaluation
Artifact Detection and Frame Scoring
The artifact detector achieves a mAP of 49.0 at an IoU threshold of 0.05, with a mean computation time of 88 ms/frame on a GTX Titan Black (Ali et al., 2019).
The quality scoring approach enables selective restoration, improving data yield: naive binary keep/discard retains only 43.7% of frames, while the full framework preserves 68.7%.
Restoration Accuracy
PSNR (dB) / SSIM on the SCARED dataset (DGGAN vs. state-of-the-art):
| Degradation | DGGAN-DAM | DGGAN-DRPM | Restormer | SwinIR (comparable FLOPs) |
|---|---|---|---|---|
| Random noise | 33.21 / 0.9057 | 31.03 / 0.8678 | 31.52 / 0.8710 | [metrics not specified] |
| Motion blur | 26.86 / 0.9210 | – | 23.01 / 0.8127 | – |
| Low light | 28.01 / 0.8619 | – | 28.83 / 0.8657 | – |
| Smoke | 28.18 / 0.8553 | – | 27.37 / 0.8346 | – |
Among the GAN-based explicit artifact restorers, the blind-deblurring CGAN outperforms SRN-DeblurNet (24.5/0.995) and TV-deconvolution (23.5/0.966) in PSNR/SSIM on simulated motion blur.
End-to-end, the explicit artifact pipeline recovers 25% more frames for downstream analysis and receives clinical ratings of 7.9 (blur), 7.7 (specularity/misc.), and 1.5 (saturation correction) out of 10 (Ali et al., 2019).
Real-World Video Performance
NIQE/PIQE metrics on SES real endoscopic videos improve with DGGAN-DAM (NIQE 3.62, PIQE 13.79) compared to Restormer (4.32/15.79). Qualitatively, outputs exhibit sharper edges and fewer artifacts under adverse conditions (uneven illumination, tissue scattering, smoke, motion blur) (Xu et al., 8 Dec 2025).
6. Significance and Future Directions
Degradation-aware frameworks that integrate explicit degradation modeling—via either contrastive feature propagation (Xu et al., 8 Dec 2025) or explicit artifact detection and GAN-based correction (Ali et al., 2019)—substantially improve the retention and enhancement of diagnostically and operatively valuable frames in endoscopic video. Such methods achieve real-time or near-real-time throughput, state-of-the-art gains in PSNR/SSIM, and meaningful restoration under severe multimodal artifact patterns.
Remaining challenges include the development of even lighter-weight transformers for deployment on surgical hardware, improved robustness to rare or previously unseen degradations, and further integration of physically motivated models into the generative restoration process. As implicit degradation learning and explicit pipeline-based strategies both demonstrate strong empirical advantages, ongoing comparisons and hybridizations are likely to shape the next generation of endoscopic video enhancement systems.