
All-in-One Medical Image Restoration & Segmentation

Updated 6 February 2026
  • All-in-One Medical Image Restoration and Segmentation (AiOMIRS) is a unified model that jointly enhances image quality and delineates anatomical structures via an integrated restoration-segmentation pipeline.
  • The methodology employs a multi-task UNet-style backbone with deep unfolding and frequency decoupling to address noise, blur, and domain-specific artifacts.
  • Empirical results demonstrate significant gains in Dice scores and PSNR, confirming robust performance across diverse modalities through joint optimization and semantic priors.

All-in-One Medical Image Restoration and Segmentation (AiOMIRS) defines a paradigm in which a unified model simultaneously performs image restoration (addressing noise, blur, and acquisition artifacts) and semantic segmentation (labeling anatomical structures) using a single, end-to-end pipeline. Restoration and segmentation, traditionally treated as separate stages, are approached in an integrated fashion to leverage their inherent synergy: restoration clarifies input for better semantic parsing, and segmentation regularizes ill-posed inverse problems through anatomical priors. Recent work establishes both practical frameworks for generalizable segmentation via frequency-space augmentation and domain-specific restoration (Zhou et al., 2022), and mathematically principled solutions via joint optimization and deep unfolding informed by vision-language priors (Chen et al., 30 Jan 2026).

1. Problem Formulation and Synergy

The AiOMIRS task considers an input degraded medical image $y \in \mathbb{R}^{H \times W \times C}$ (potentially characterized by undersampling, noise, or domain-specific artifacts) and seeks to output: (a) a high-quality, artifact-free reconstruction $x \in \mathbb{R}^{H \times W \times C}$, and (b) a segmentation mask $S \in \{0,1\}^{H \times W \times K}$ for $K$ anatomical structures. Restoration and segmentation are interdependent processes: enhancement of anatomical details improves downstream semantic discrimination, while semantic delineation imposes global constraints on plausible restoration. Formally, AiOMIRS is cast as a coupled optimization problem:

$$\min_{x,S}\ \tfrac12\|\mathcal{A}(x)-y\|_2^2 + \alpha R(x) + \beta \mathcal{L}_{\mathrm{seg}}(x, S)$$

where $\mathcal{A}$ encapsulates modality-specific degradation, $R(x)$ is a restoration prior (e.g., learned via deep proximal mappings), and $\mathcal{L}_{\mathrm{seg}}$ combines cross-entropy and Dice losses on the segmentation output (Chen et al., 30 Jan 2026).
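The coupled objective above can be made concrete with a minimal numpy sketch. This is an illustrative evaluation only, assuming $\mathcal{A}$ is the identity (pure denoising), a simple total-variation stand-in for the learned prior $R(x)$, and per-pixel cross-entropy for $\mathcal{L}_{\mathrm{seg}}$; the function name and toy shapes are not from either cited paper.

```python
import numpy as np

def joint_objective(x, y, seg_probs, seg_target, alpha=0.1, beta=0.5):
    """Evaluate the coupled AiOMIRS-style objective for a toy denoising case.

    Assumes A = identity, R(x) = anisotropic total variation (a simple
    stand-in for a learned proximal prior), and pixel-wise cross-entropy
    as the segmentation loss. All names here are illustrative.
    """
    # Data fidelity: 0.5 * ||A(x) - y||^2 with A = identity
    fidelity = 0.5 * np.sum((x - y) ** 2)
    # Restoration prior R(x): sum of absolute finite differences
    tv = np.sum(np.abs(np.diff(x, axis=0))) + np.sum(np.abs(np.diff(x, axis=1)))
    # Segmentation loss: cross-entropy against a one-hot target
    ce = -np.sum(seg_target * np.log(seg_probs + 1e-12))
    return fidelity + alpha * tv + beta * ce

# Toy 4x4 image with K = 2 classes
rng = np.random.default_rng(0)
y = rng.normal(size=(4, 4))
probs = np.full((4, 4, 2), 0.5)            # uniform class probabilities
target = np.zeros((4, 4, 2))
target[..., 0] = 1.0
loss = joint_objective(y, y, probs, target)  # x = y: fidelity term vanishes
print(loss > 0)                              # True: prior + seg terms remain
```

With $x = y$ the fidelity term is zero, so the remaining loss comes entirely from the prior and segmentation terms, mirroring how the two regularizers steer the solution away from the raw observation.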

2. Architecture Design: Joint Restoration-Segmentation Networks

AiOMIRS frameworks typically adopt multi-task UNet-style backbones or joint deep unfolding strategies, incorporating the following architectural components:

  • Shared Encoder ($E$): Multi-scale convolutional layers, as in UNet, produce latent representations from both original and style-augmented images.
  • Segmentation Decoder ($D_{\text{seg}}$): An upsampling pathway mirroring encoder depth, producing class probability maps. The final layer outputs $K$ class logits via a $1 \times 1$ convolution (Zhou et al., 2022).
  • Domain-Specific Image Restoration Decoder ($D_{\text{rec}}$): For each source domain, a decoder with domain-specific batch normalization (DSBN) reconstructs the original input, regularizing the encoder to retain low-level style cues.
  • Frequency-aware Global Context (VL-DUN, (Chen et al., 30 Jan 2026)): Deep unfolding modules alternate between data fidelity updates and proximal regularization stages, the latter decoupling frequency content with gated depthwise convolution (GDFN) for low-frequency structure and bidirectional Mamba for high-frequency edge preservation.
  • Semantic Prior Injection: Vision-language priors (via fine-tuned CLIP encoders) are incorporated through cross-attention in the restoration pathway, regularizing both segmentation and restoration against semantic drift.
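The shared-encoder / dual-decoder wiring can be sketched at the shape level. This is a deliberately stripped-down illustration, not either paper's implementation: learned convolutions are replaced by pooling and nearest-neighbor upsampling so the multi-task data flow (one encoder, two decoders, skip connections) is visible; all function names are hypothetical.

```python
import numpy as np

def encode(img, depth=2):
    """Shared encoder E: downsample by 2 at each stage, keeping every scale
    so the decoders can consume skip connections."""
    feats = [img]
    for _ in range(depth):
        h, w = feats[-1].shape
        feats.append(feats[-1].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return feats

def decode_seg(feats, num_classes=3):
    """Segmentation decoder D_seg: upsample back through the scales with
    skip connections, then emit K per-class logit maps."""
    x = feats[-1]
    for skip in reversed(feats[:-1]):
        x = np.kron(x, np.ones((2, 2))) + skip       # upsample + skip
    return np.stack([x + k for k in range(num_classes)], axis=-1)  # toy logits

def decode_rec(feats):
    """Restoration decoder D_rec: upsample back to a full-resolution image."""
    x = feats[-1]
    for _ in feats[:-1]:
        x = np.kron(x, np.ones((2, 2)))
    return x

img = np.arange(16.0).reshape(4, 4)
feats = encode(img)
seg_logits = decode_seg(feats)
recon = decode_rec(feats)
print(seg_logits.shape, recon.shape)   # (4, 4, 3) (4, 4)
```

The point of the sketch is the topology: both decoders read the same encoder features, so gradients from $L_{\text{seg}}$ and $L_{\text{recon}}$ both shape the shared representation.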

3. Frequency-Domain Augmentation and Domain Invariance

Random Amplitude Mixup (RAM) is introduced to synthesize style-diverse yet semantically consistent views that improve domain generalization. Given two images $x_i, x_j$, their Fourier decompositions are $\mathcal{F}(x_i) = A_i \cdot e^{j\phi_i}$ and $\mathcal{F}(x_j) = A_j \cdot e^{j\phi_j}$. Mixup is performed on amplitudes, $A_{\text{mix}} = \alpha A_i + (1-\alpha) A_j$ with $\alpha \sim U(0,1)$, while the phase $\phi_i$ remains fixed, ensuring semantic content preservation. The stylized image

$$\tilde{x} = \mathcal{F}^{-1}(A_{\text{mix}} \cdot e^{j\phi_i})$$

enables robust training against domain shift without compromising annotation consistency (Zhou et al., 2022).
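The RAM transform above maps directly onto a few lines of numpy. This is a minimal sketch of the amplitude-mixup idea, not the authors' code; the function name and image sizes are illustrative.

```python
import numpy as np

def random_amplitude_mixup(x_i, x_j, alpha=None, rng=None):
    """Mix Fourier amplitudes of two images while keeping the phase of x_i,
    so spatial layout (semantics) is preserved and only the amplitude
    spectrum ("style") changes."""
    rng = rng if rng is not None else np.random.default_rng()
    if alpha is None:
        alpha = rng.uniform(0.0, 1.0)                       # alpha ~ U(0, 1)
    F_i, F_j = np.fft.fft2(x_i), np.fft.fft2(x_j)
    A_mix = alpha * np.abs(F_i) + (1 - alpha) * np.abs(F_j)  # amplitude mixup
    phase_i = np.angle(F_i)                                  # phase of x_i kept
    return np.fft.ifft2(A_mix * np.exp(1j * phase_i)).real

x_i = np.random.default_rng(1).normal(size=(8, 8))
x_j = np.random.default_rng(2).normal(size=(8, 8))
same = random_amplitude_mixup(x_i, x_j, alpha=1.0)  # alpha = 1 recovers x_i
print(np.allclose(same, x_i))                        # True
```

The $\alpha = 1$ sanity check confirms that keeping the original amplitude and phase reconstructs the input exactly, so any change in the stylized image comes purely from the amplitude interpolation.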

In VL-DUN, traditional self-attention’s low-pass filtering is mitigated by explicit frequency splitting: average pooling yields the low-frequency branch (modeled by GDFN); residuals are processed by bidirectional Mamba layers to capture high-frequency anatomical boundaries, providing global context at linear computational cost (Chen et al., 30 Jan 2026).
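The pooling-based frequency split can be illustrated in one dimension. This is a hedged sketch of the decomposition principle only (moving average as low-pass, residual as high-pass); VL-DUN's actual branches use learned GDFN and Mamba modules, which are not reproduced here.

```python
import numpy as np

def split_frequencies(signal, window=4):
    """Low branch = moving average (a crude low-pass, standing in for the
    average-pooling split); high branch = residual carrying edge content.
    The two branches recombine to the input exactly."""
    kernel = np.ones(window) / window
    low = np.convolve(signal, kernel, mode="same")   # smooth, low-frequency part
    high = signal - low                              # high-frequency residual
    return low, high

t = np.linspace(0, 1, 64)
signal = np.sin(2 * np.pi * t) + 0.3 * np.sin(40 * np.pi * t)
low, high = split_frequencies(signal)
print(np.allclose(low + high, signal))               # True: exact decomposition
```

Because the split is exact by construction, each branch can be processed by a specialized module (structure vs. edges) without losing information when the two are recombined.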

4. Training Objectives and Optimization

Both frameworks design their learning targets to enforce consistency and mutual reinforcement between restoration and segmentation:

  • Segmentation Loss ($L_{\text{seg}}$): A sum of cross-entropy and Dice losses applied to both original and RAM-augmented predictions.
  • Restoration Loss ($L_{\text{recon}}$): $\ell_2$ regression between the reconstructed and original image, enforcing preservation of domain-specific appearance (Zhou et al., 2022). In VL-DUN, MSE is minimized between the unfolded reconstruction and the ground truth (Chen et al., 30 Jan 2026).
  • Semantic Consistency Loss ($L_{\text{consist}}$): Bidirectional KL divergence aligns class probability vectors on real and stylized images, penalizing predictive shifts induced by style changes.
  • Total Objective: The weighted sum $L_{\text{total}} = L_{\text{seg}} + \lambda_1 L_{\text{recon}} + \lambda_2 L_{\text{consist}}$ (Zhou et al., 2022), or equivalent joint optimizations (Chen et al., 30 Jan 2026).
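The loss terms above can be sketched on toy probability maps. This is an illustrative composition of the stated terms (cross-entropy + Dice, $\ell_2$ reconstruction, symmetric KL consistency), not the papers' training code; shapes and helper names are assumptions.

```python
import numpy as np

def dice_loss(p, t, eps=1e-6):
    """Soft Dice loss on probability map p against one-hot target t."""
    return 1.0 - (2.0 * np.sum(p * t) + eps) / (np.sum(p) + np.sum(t) + eps)

def cross_entropy(p, t, eps=1e-12):
    """Mean per-pixel cross-entropy; rows are pixels, columns are classes."""
    return -np.mean(np.sum(t * np.log(p + eps), axis=-1))

def sym_kl(p, q, eps=1e-12):
    """Bidirectional (symmetric) KL divergence between class distributions."""
    kl = lambda a, b: np.mean(np.sum(a * np.log((a + eps) / (b + eps)), axis=-1))
    return kl(p, q) + kl(q, p)

def total_loss(p_orig, p_aug, target, recon, image, lam1=0.1, lam2=0.5):
    """L_total = L_seg + lam1 * L_recon + lam2 * L_consist, with L_seg
    applied to both the original and RAM-augmented predictions."""
    l_seg = (cross_entropy(p_orig, target) + dice_loss(p_orig, target)
             + cross_entropy(p_aug, target) + dice_loss(p_aug, target))
    l_recon = np.mean((recon - image) ** 2)          # l2 restoration loss
    l_consist = sym_kl(p_orig, p_aug)                # bidirectional KL
    return l_seg + lam1 * l_recon + lam2 * l_consist

# Tiny 2-pixel, 2-class toy example
target = np.array([[1.0, 0.0], [0.0, 1.0]])
p = np.array([[0.9, 0.1], [0.2, 0.8]])
img = np.zeros((2, 2))
print(total_loss(p, p, target, img, img) > 0)        # True
```

Note that when the original and stylized predictions agree ($p_{\text{orig}} = p_{\text{aug}}$), the consistency term vanishes, which is exactly the behavior the KL alignment is meant to encourage.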

Optimization strategies include Adam or AdamW with polynomial or one-cycle learning rate schedules, typical batch sizes of 16, and training durations of 100 epochs. Task weights (e.g., $\lambda_1 = 0.1$, $\lambda_2 = 0.5$) are set empirically for balance (Zhou et al., 2022).
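A polynomial learning-rate schedule of the kind mentioned above is a one-liner. The base rate, power, and iteration budget below are illustrative values, not taken from either cited paper.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Polynomial decay: lr = base_lr * (1 - iter / max_iter) ** power.
    Commonly paired with Adam in segmentation training."""
    return base_lr * (1.0 - iteration / max_iter) ** power

print(poly_lr(1e-3, 0, 100))               # 0.001 at the start
print(round(poly_lr(1e-3, 50, 100), 6))    # 0.000536 mid-training
```

With power below 1 the rate decays slightly slower than linearly at first, then drops to zero at the final iteration.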

5. Quantitative Performance and Empirical Insights

Performance benchmarks demonstrate substantial improvements over isolated and sequential pipelines on multi-domain public datasets:

Task | Baseline Dice (%) | AiOMIRS Dice (%) | Baseline PSNR (dB) | AiOMIRS PSNR (dB) | Baseline ASD | AiOMIRS ASD
Fundus Seg. | 85.63 | 88.94 | – | – | 13.98 | 10.32
Prostate Seg. | 84.04 | 88.08 | – | – | 2.92 | 1.37
Cine-MRI/CT | 49.43 | 59.19 | 28.15 | 29.07 | – | –

Compared to single-task pipelines (DeepAll, VLU-Net, JiGen, BigAug, SAML, FedDG, DoFE), AiOMIRS frameworks achieve an average 2–4 point gain in Dice and up to 0.92 dB in PSNR, with ASD reductions of 25–50% in cross-domain settings (Zhou et al., 2022, Chen et al., 30 Jan 2026). Ablation studies confirm the necessity of frequency decoupling, vision-language priors, and joint optimization for state-of-the-art performance.

6. Broader Implications, Limitations, and Prospects

The integration of domain-specific restoration alongside segmentation via self-supervision regularizes feature extractors against overfitting to domain artifacts while enhancing anatomical fidelity. Frequency-augmented data and loss-driven semantic consistency further mitigate the impact of domain shift, enabling robust generalization to unseen medical imaging protocols. AiOMIRS approaches demonstrate real-time inference potential (e.g., 58 ms for $256^2$ images with VL-DUN) and interpretability through iterative unfolding and frequency decomposition (Chen et al., 30 Jan 2026).

Current limitations include linear memory scaling with the number of source domains in the domain-specific image restoration decoder owing to DSBN, and potential struggles of lightweight segmentation heads under severe class imbalance. Vision-language priors, though powerful, lack explicit physics modeling, and simulated degradation does not fully capture the complexity of real-world clinical artifacts. Potential future extensions involve memory-efficient domain normalization, incorporation of physics-aware models, and tackling additional restoration tasks (e.g., denoising, super-resolution) under the AiOMIRS paradigm (Zhou et al., 2022, Chen et al., 30 Jan 2026).

7. Summary and Outlook

All-in-One Medical Image Restoration and Segmentation represents an advance toward unified, domain-agnostic clinical imaging pipelines. Empirical results confirm that joint, mutually informed restoration and segmentation with frequency-domain augmentation yields state-of-the-art performance and robust generalization. The framework’s principled coupling of low- and high-frequency contextual modeling, semantic priors, and end-to-end optimization establishes a foundation for future extensions in multi-modal, real-world medical imaging.
