Degradation-Aware Endoscopic Video Enhancement
- The paper introduces a degradation-aware framework that models and quantifies image artifacts to enable robust endoscopic video enhancement.
- It details two paradigms—explicit artifact detection with conditional GAN restoration and implicit feature fusion via DGGAN for real-time performance.
- Quantitative evaluations show significant improvements in PSNR, SSIM, and frame preservation, demonstrating both practical and clinical implications.
Degradation-aware frameworks for endoscopic video enhancement are dedicated methodologies that leverage explicit modeling or estimation of image degradation to enable robust, real-time restoration of endoscopic video frames afflicted by a wide spectrum of artifacts. These frameworks target both computational efficiency and enhancement fidelity under multimodal degradations such as uneven illumination, tissue scattering, occlusions, motion blur, and other artifacts prevalent in intraoperative imaging environments.
1. Framework Architectures
Degradation-aware approaches for endoscopic video enhancement can be categorized into two main paradigms: explicit artifact detection and sequential restoration (Ali et al., 2019), and implicit degradation modeling with feature fusion for learned enhancement (Xu et al., 8 Dec 2025).
Explicit Artifact Detection Pipelines
Such frameworks are structured as cascaded systems composed of three stages:
- Multi-class artifact detection: A fast multi-scale, single-stage convolutional detector (YOLOv3-spp with a Darknet-53 backbone) localizes and classifies artifacts in each input frame; labels include blur, bubbles, specularity, saturation, contrast, and a miscellaneous class.
- Degradation-aware frame scoring: Each artifact instance is weighted according to its class, area, and location within the frame to compute a scalar quality score of the form
$$QS = 1 - \sum_{i} w_{c}(i)\, w_{a}(i)\, w_{l}(i),$$
where $w_{c}$, $w_{a}$, and $w_{l}$ are the class, area, and location weights of the $i$-th detected artifact, normalized so that $QS \in [0, 1]$. Frames scoring below a lower threshold are dropped, frames in an intermediate band are sent to restoration, and high-scoring frames are passed unaltered (a scoring and routing sketch follows this list).
- Artifact-specific restoration: Surviving frames are processed through a fixed sequence of conditional GAN-based restorers: blind deblurring, saturation/contrast correction, and inpainting.
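As a concrete illustration of the scoring-and-routing stage, the following Python sketch computes a frame-quality score from detected artifact boxes and routes the frame accordingly. The class weights, location weighting, and thresholds are illustrative placeholders, not the values used by Ali et al. (2019).

```python
import numpy as np

# Illustrative class weights (hypothetical values; the published framework
# defines its own per-class, area, and location weights).
CLASS_WEIGHTS = {
    "blur": 0.6, "bubbles": 0.3, "specularity": 0.3,
    "saturation": 0.5, "contrast": 0.4, "misc": 0.2,
}

def quality_score(detections, frame_shape):
    """Aggregate per-artifact penalties into a scalar frame-quality score in [0, 1].

    detections: list of dicts with keys "label" and "box" = (x0, y0, x1, y1).
    frame_shape: (height, width) of the frame.
    """
    h, w = frame_shape
    cx, cy = w / 2.0, h / 2.0
    penalty = 0.0
    for det in detections:
        x0, y0, x1, y1 = det["box"]
        area_frac = max(0.0, (x1 - x0) * (y1 - y0)) / (h * w)   # image occupancy
        bx, by = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        # Central artifacts are weighted more heavily than peripheral ones.
        dist = np.hypot(bx - cx, by - cy) / np.hypot(cx, cy)
        loc_weight = 1.0 - 0.5 * dist
        penalty += CLASS_WEIGHTS.get(det["label"], 0.2) * area_frac * loc_weight
    return float(np.clip(1.0 - penalty, 0.0, 1.0))

def route_frame(score, drop_below=0.5, restore_below=0.85):
    """Three-way routing: drop, restore, or pass unaltered (thresholds illustrative)."""
    if score < drop_below:
        return "drop"
    if score < restore_below:
        return "restore"
    return "pass"
```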
Implicit Degradation Modeling (DGGAN)
DGGAN (Degradation Guided GAN) introduces a unified, cycle-consistent GAN architecture featuring:
- Degradation-Aware Module (DAM): An encoder that extracts low-dimensional degradation codes via contrastive pre-training. This DAM is pretrained and frozen during subsequent adversarial training.
- Degradation-Guided Enhancement Module (DGEM): Receives the input low-quality frame and the code from DAM, performing restoration using channel and spatial attention, shallow convolutions, and Swin Transformer blocks modulated by degradation.
- Degradation Representation Propagation Module (DRPM): A lightweight transformer propagates the key-frame degradation codes to subsequent non-key frames, greatly reducing inference cost and enabling real-time operation.
Key frames (every $N$-th frame, where $N$ is the propagation interval) are passed through DAM, while DRPM propagates their degradation codes to the intervening non-key frames; all frames are then enhanced by DGEM.
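A minimal sketch of this key-frame scheduling, assuming `dam`, `drpm`, and `dgem` are stand-in modules with the call signatures shown and `interval` is an assumed hyperparameter, is given below; it only illustrates how degradation codes are computed on key frames and propagated cheaply to the rest.

```python
import torch
import torch.nn as nn

class KeyFramePropagationPipeline(nn.Module):
    """Sketch of key-frame degradation-code propagation (module internals are
    stand-ins, not the published DGGAN architectures)."""

    def __init__(self, dam: nn.Module, drpm: nn.Module, dgem: nn.Module, interval: int = 5):
        super().__init__()
        self.dam, self.drpm, self.dgem = dam, drpm, dgem
        self.interval = interval  # key-frame spacing N (assumed hyperparameter)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, C, H, W) low-quality video clip
        enhanced, code = [], None
        for t, frame in enumerate(frames):
            frame = frame.unsqueeze(0)
            if t % self.interval == 0 or code is None:
                code = self.dam(frame)               # full degradation encoding on key frames
            else:
                code = self.drpm(code, frame)        # cheap propagation on non-key frames
            enhanced.append(self.dgem(frame, code))  # degradation-guided enhancement
        return torch.cat(enhanced, dim=0)
```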
2. Degradation Representation and Fusion
Contrastive Degradation Coding
In DGGAN, DAM learns a compact representation of frame-specific degradation by minimizing an InfoNCE-style contrastive loss
$$\mathcal{L}_{\mathrm{con}} = -\log \frac{\exp\!\big(q \cdot k^{+} / \tau\big)}{\exp\!\big(q \cdot k^{+} / \tau\big) + \sum_{j} \exp\!\big(q \cdot k_{j}^{-} / \tau\big)},$$
where $q$ and $k^{+}$ are codes of two augmentations of the same frame, $\{k_{j}^{-}\}$ are codes of other frames, and $\tau$ is a temperature. This ensures that features from augmented versions of the same image map to similar codes, while features from different images diverge.
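The loss above can be written in a few lines. The sketch below follows the standard MoCo-style formulation with L2-normalized codes and a temperature of 0.07; these settings are assumptions rather than DGGAN's published choices.

```python
import torch
import torch.nn.functional as F

def infonce_degradation_loss(q, k_pos, k_neg, temperature=0.07):
    """InfoNCE-style contrastive loss over degradation codes.

    q:     (B, D) codes of one augmented view of each frame
    k_pos: (B, D) codes of a second augmentation of the same frames (positives)
    k_neg: (N, D) codes of other frames (negatives)
    """
    q = F.normalize(q, dim=1)
    k_pos = F.normalize(k_pos, dim=1)
    k_neg = F.normalize(k_neg, dim=1)
    l_pos = (q * k_pos).sum(dim=1, keepdim=True)   # (B, 1) positive similarities
    l_neg = q @ k_neg.t()                          # (B, N) negative similarities
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    # The positive sits at index 0 of each row of logits.
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```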
Feature Modulation by Degradation
DGEM compresses the degradation code $z$ using channel and spatial attention and applies it to the feature maps by elementwise multiplication ($\odot$) of the attention maps with the features.
In each Swin Transformer block, the compact code modulates the value vectors in the attention calculation,
$$\mathrm{Attn}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right)\big(V \odot \phi(z)\big),$$
where $\phi(z)$ is a projection of the compressed degradation code. This selective feature modulation maintains content-dependent attention while adapting the feature representation to the estimated degradation.
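The following toy module illustrates value-vector modulation by a projected degradation code, using PyTorch's built-in multi-head attention; the sigmoid gating and dimensions are illustrative choices, not the exact Swin block used in DGEM.

```python
import torch
import torch.nn as nn

class DegradationModulatedAttention(nn.Module):
    """Toy multi-head self-attention whose value pathway is gated by a projected
    degradation code (dimensions and gating are illustrative)."""

    def __init__(self, dim: int, code_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.code_proj = nn.Sequential(nn.Linear(code_dim, dim), nn.Sigmoid())

    def forward(self, tokens: torch.Tensor, code: torch.Tensor) -> torch.Tensor:
        # tokens: (B, L, dim) window tokens; code: (B, code_dim) degradation code
        gate = self.code_proj(code).unsqueeze(1)    # (B, 1, dim) per-channel gate
        values = tokens * gate                      # modulate the value pathway only
        out, _ = self.attn(tokens, tokens, values)  # Q, K from content; V degradation-gated
        return out
```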
3. Restoration Strategies
Artifact-Specific GANs
In explicit artifact pipelines (Ali et al., 2019), restoration employs:
- U-Net based blind-deblurring conditional GANs (CGANs), optimized with a WGAN-GP adversarial loss, a pixelwise reconstruction term, and a high-frequency consistency loss.
- Saturation/contrast restorers (CGANs with pixelwise losses and patch discriminators), followed by post-processing color re-transfer (CRT) to mitigate GAN-induced color shifts via a linear re-coloring of the form
$$I_{\mathrm{out}} = \Sigma_{\mathrm{in}}^{1/2}\,\Sigma_{\mathrm{gan}}^{-1/2}\,\big(I_{\mathrm{gan}} - \mu_{\mathrm{gan}}\big) + \mu_{\mathrm{in}},$$
with means and covariances computed on non-saturated pixels (a sketch of this re-transfer follows below).
- Inpainting CGANs with both global and local discriminators, specialized for specularity, bubble, and miscellaneous artifact regions. The loss combines adversarial and context masking terms.
Restoration is routed sequentially to prevent compounding errors (e.g., inpainting should follow deblurring).
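To make the color re-transfer concrete, the sketch below implements a generic whitening/recoloring transfer that matches the GAN output's mean and covariance to the input frame's statistics over non-saturated pixels. This is a plausible reading of CRT, not the authors' exact implementation.

```python
import numpy as np

def color_retransfer(gan_out, source, mask=None, eps=1e-6):
    """Whitening/recoloring color re-transfer: match the mean and covariance of the
    GAN output to those of the source frame using only non-saturated pixels
    (a generic mean/covariance transfer, not the paper's exact formulation).

    gan_out, source: float arrays of shape (H, W, 3) in [0, 1]
    mask: boolean (H, W) mask of non-saturated pixels; defaults to all pixels
    """
    if mask is None:
        mask = np.ones(gan_out.shape[:2], dtype=bool)
    g = gan_out[mask].reshape(-1, 3)
    s = source[mask].reshape(-1, 3)

    mu_g, mu_s = g.mean(0), s.mean(0)
    cov_g = np.cov(g, rowvar=False) + eps * np.eye(3)
    cov_s = np.cov(s, rowvar=False) + eps * np.eye(3)

    def sqrtm(c):
        # Matrix square root via eigendecomposition (covariances are symmetric PSD).
        w, v = np.linalg.eigh(c)
        return v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.T

    # Whiten with respect to the GAN statistics, then recolor to the source statistics.
    T = sqrtm(cov_s) @ np.linalg.inv(sqrtm(cov_g))
    out = (gan_out.reshape(-1, 3) - mu_g) @ T.T + mu_s
    return np.clip(out.reshape(gan_out.shape), 0.0, 1.0)
```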
Cycle-Consistent Degradation Modeling
DGGAN formalizes enhancement as a bi-directional mapping
$$G_{E}: x_{\mathrm{LQ}} \rightarrow \hat{x}_{\mathrm{HQ}}, \qquad G_{D}: \hat{x}_{\mathrm{HQ}} \rightarrow \hat{x}_{\mathrm{LQ}},$$
where $G_{D}$ is a physics-informed generator that predicts degradation parameters (e.g., blur kernel, smoke map) and simulates degradation via explicit physical models. A cycle-consistency loss
$$\mathcal{L}_{\mathrm{cyc}} = \big\| G_{D}(G_{E}(x_{\mathrm{LQ}})) - x_{\mathrm{LQ}} \big\|_{1} + \big\| E\!\big(G_{D}(G_{E}(x_{\mathrm{LQ}}))\big) - E(x_{\mathrm{LQ}}) \big\|_{1},$$
with $E$ denoting the DAM encoder, ties both image content and code representations across restoration and re-degradation.
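A minimal sketch of the combined image- and code-space cycle term is shown below, assuming `enhancer`, `degrader`, and `dam_encoder` are placeholder callables for $G_{E}$, $G_{D}$, and the DAM encoder, and that both terms use an L1 distance with an assumed weighting.

```python
import torch
import torch.nn.functional as F

def cycle_consistency_loss(x_lq, enhancer, degrader, dam_encoder, lambda_code=1.0):
    """Image- and code-space cycle consistency: enhance, re-degrade, and require
    both the reconstructed frame and its degradation code to match the input
    (module names and the weighting are placeholders for illustration)."""
    x_hq_hat = enhancer(x_lq)                # LQ -> estimated HQ
    x_lq_hat = degrader(x_hq_hat)            # HQ -> re-degraded LQ via physics-informed generator
    image_term = F.l1_loss(x_lq_hat, x_lq)   # image-space cycle term
    code_term = F.l1_loss(dam_encoder(x_lq_hat), dam_encoder(x_lq))  # code-space cycle term
    return image_term + lambda_code * code_term
```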
4. Training and Computational Properties
Loss Composition
DGGAN’s generator objective comprises adversarial, cycle-consistency, and contrastive terms,
$$\mathcal{L}_{G} = \mathcal{L}_{\mathrm{adv}} + \lambda_{\mathrm{cyc}}\,\mathcal{L}_{\mathrm{cyc}} + \lambda_{\mathrm{con}}\,\mathcal{L}_{\mathrm{con}},$$
with tailored discriminators (a combined-objective sketch follows this list):
- $D_{\mathrm{HQ}}$: PatchGAN operating on high-quality images
- $D_{\mathrm{LQ}}$: PatchGAN operating on synthetic low-quality images
- $D_{\mathrm{HF}}$: PatchGAN operating on high-pass filtered images to sharpen restoration of anatomical detail
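A compact sketch of how the three terms and the three discriminators might be combined is given below; the least-squares adversarial loss, the Laplacian high-pass filter, and the lambda weights are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def high_pass(img: torch.Tensor) -> torch.Tensor:
    """Simple Laplacian high-pass filter applied before the detail discriminator."""
    kernel = torch.tensor([[0., -1., 0.], [-1., 4., -1.], [0., -1., 0.]],
                          device=img.device).view(1, 1, 3, 3).repeat(img.size(1), 1, 1, 1)
    return F.conv2d(img, kernel, padding=1, groups=img.size(1))

def generator_objective(x_hq_hat, x_lq_hat, d_hq, d_lq, d_hf,
                        loss_cyc, loss_con, lambda_cyc=10.0, lambda_con=1.0):
    """Combine adversarial, cycle-consistency, and contrastive terms
    (least-squares adversarial losses and the lambda weights are assumptions)."""
    preds = [d_hq(x_hq_hat),             # realism of enhanced frames
             d_lq(x_lq_hat),             # realism of re-degraded frames
             d_hf(high_pass(x_hq_hat))]  # high-frequency anatomical detail
    adv = sum(F.mse_loss(p, torch.ones_like(p)) for p in preds)
    return adv + lambda_cyc * loss_cyc + lambda_con * loss_con
```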
Optimization and Efficiency
- Adam optimizer
- Learning rates set separately for the generator modules (DAM, DGEM, DRPM) and for the discriminators [values not specified]
- Model sizes: DGGAN-DAM ≈4.72M params/54.4 GFLOPs; DGGAN-DRPM ≈0.63M/32.4 GFLOPs.
- At the evaluated patch resolution, DGGAN-DRPM achieves 0.03 s/frame (about 33 FPS), satisfying real-time constraints for intraoperative deployment (Xu et al., 8 Dec 2025).
Ablation and Trade-offs
Omitting DAM contrastive pre-training decreases PSNR from 33.21 dB to 27.21 dB and worsens NIQE from 3.62 to 6.19, demonstrating the necessity of a dedicated degradation encoder. Varying the key-frame propagation interval trades off speed against fidelity (one setting yields 31.03 dB PSNR at 0.037 s/frame). Removing the image-space cycle terms collapses performance (SSIM around 0.5).
5. Quantitative and Qualitative Evaluation
Artifact Detection and Frame Scoring
The artifact detector achieves a mAP of 49.0 at an IoU threshold of 0.05, with a mean computation time of 88 ms/frame on a GTX Titan Black (Ali et al., 2019).
The quality scoring approach enables selective restoration, improving data yield: naive binary keep/discard retains only 43.7% of frames, while the full framework preserves 68.7%.
Restoration Accuracy
PSNR (dB) / SSIM on the SCARED dataset (DGGAN vs. state-of-the-art):
| Degradation | DGGAN-DAM | DGGAN-DRPM | Restormer | SwinIR (comparable FLOPs) |
|---|---|---|---|---|
| Random noise | 33.21 / 0.9057 | 31.03 / 0.8678 | 31.52 / 0.8710 | [metrics not specified] |
| Motion blur | 26.86 / 0.9210 | – | 23.01 / 0.8127 | – |
| Low light | 28.01 / 0.8619 | – | 28.83 / 0.8657 | – |
| Smoke | 28.18 / 0.8553 | – | 27.37 / 0.8346 | – |
Among the GAN-based explicit artifact restorers, the blind-deblurring CGAN outperforms SRN-DeblurNet (24.5/0.995) and TV-deconvolution (23.5/0.966) in PSNR/SSIM on simulated motion blur.
End-to-end, the explicit artifact pipeline recovers 25% more frames for downstream analysis and receives clinical ratings of 7.9 (blur), 7.7 (specularity/misc.), and 1.5 (saturation correction) out of 10 (Ali et al., 2019).
Real-World Video Performance
NIQE/PIQE metrics on SES real endoscopic videos improve with DGGAN-DAM (NIQE 3.62, PIQE 13.79) compared to Restormer (4.32/15.79). Qualitatively, outputs exhibit sharper edges and fewer artifacts under adverse conditions (uneven illumination, tissue scattering, smoke, motion blur) (Xu et al., 8 Dec 2025).
6. Significance and Future Directions
Degradation-aware frameworks that integrate explicit degradation modeling—via either contrastive feature propagation (Xu et al., 8 Dec 2025) or explicit artifact detection and GAN-based correction (Ali et al., 2019)—substantially improve the retention and enhancement of diagnostically and operatively valuable frames in endoscopic video. Such methods achieve real-time or near-real-time throughput, state-of-the-art gains in PSNR/SSIM, and meaningful restoration under severe multimodal artifact patterns.
Remaining challenges include the development of even lighter-weight transformers for deployment on surgical hardware, improved robustness to rare or previously unseen degradations, and further integration of physically motivated models into the generative restoration process. As implicit degradation learning and explicit pipeline-based strategies both demonstrate strong empirical advantages, ongoing comparisons and hybridizations are likely to shape the next generation of endoscopic video enhancement systems.