Fusion-Restoration Image Processing
- Fusion-restoration methods are techniques that integrate degraded, multi-modal images to simultaneously enhance quality and support downstream tasks.
- These approaches use self-supervised learning, degradation modeling, and transformer-based architectures to achieve adaptive, high-fidelity fusion and restoration.
- They deliver improved metrics in PSNR, SSIM, and segmentation accuracy, with applications in surveillance, autonomous vehicles, and remote sensing.
Fusion-restoration image processing methods integrate information from multiple input images—typically exhibiting different degradations or modalities—to generate high-quality outputs optimized for both perceptual quality and downstream vision applications. This class of algorithms encompasses approaches that leverage self-supervised feature learning, physically grounded degradation modeling, dynamic convolutional adaptation, spectral-spatial analysis, and transformer-based architectures. Fusion and restoration are often treated jointly, with shared representations and integrated loss functions ensuring that both the visual fidelity of the fused output and robustness to complex, non-stationary degradations are maximized.
1. Unified Frameworks for Fusion and Restoration
Recent frameworks address fusion and restoration as mutually reinforcing tasks, aiming for adaptability across multi-exposure, multi-focus, multi-modal (e.g., visible/infrared), and dynamic degradation scenarios. Self-supervised learning, as exemplified by DeFusion++ (Liang et al., 2024), eliminates the need for hand-crafted fusion rules and bespoke loss functions. DeFusion++ introduces two SSL pretext tasks—Common and Unique Decomposition (CUD) and Masked Feature Modeling (MFM)—to extract representations that capture fused, task-agnostic features. These representations enable restoration (e.g., deblurring, dehazing) as a direct byproduct of the fusion pipeline, and can be further adapted with minimal tuning for high-level tasks such as segmentation and object detection. This paradigm supports a "single backbone, multiple applications" strategy, enhancing generalization, especially in domains lacking exhaustively annotated paired data.
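The sketch below illustrates the general style of a masked-feature-modeling pretext objective for self-supervised fusion pretraining: two source images pass through a shared encoder, a fraction of the fused features is masked, and a light decoder must reconstruct a pseudo-fused target from context. All module names, the averaging fusion rule, and the hyperparameters are illustrative assumptions, not the DeFusion++ architecture.

```python
# Minimal sketch of a masked-feature-modeling-style pretext task for
# self-supervised fusion pretraining. Module names and hyperparameters
# (FusionEncoder, mask_ratio, the light decoder) are illustrative
# assumptions, not the DeFusion++ implementation.
import torch
import torch.nn as nn

class FusionEncoder(nn.Module):
    """Shared encoder applied to each degraded/partial source image."""
    def __init__(self, in_ch=3, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)

class MaskedFeatureModeling(nn.Module):
    """Mask part of the fused feature map and reconstruct a pseudo-fused target."""
    def __init__(self, dim=64, mask_ratio=0.5, patch=8):
        super().__init__()
        self.encoder = FusionEncoder(dim=dim)
        self.decoder = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, 3, 3, padding=1),
        )
        self.mask_ratio, self.patch = mask_ratio, patch

    def forward(self, src_a, src_b, pseudo_target):
        # Fuse by averaging the shared features of the two sources (simplest rule).
        feat = 0.5 * (self.encoder(src_a) + self.encoder(src_b))
        # Random patch-wise mask forces the decoder to infill from context.
        # Assumes H and W are divisible by `patch`.
        b, _, h, w = feat.shape
        gh, gw = h // self.patch, w // self.patch
        keep = (torch.rand(b, 1, gh, gw, device=feat.device) > self.mask_ratio).float()
        mask = keep.repeat_interleave(self.patch, 2).repeat_interleave(self.patch, 3)
        recon = self.decoder(feat * mask)
        return nn.functional.l1_loss(recon, pseudo_target)
```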
2. Degradation- and Prompt-Aware Restoration-Fusion
Handling real-world degradations—compound, dynamic, and nonstationary—requires explicitly modeling the underlying physics and sensing contexts. ControlFusion (Tang et al., 30 Mar 2025) formulates a composite degradation process incorporating Retinex illumination-reflectance decomposition, atmospheric scattering, and sensor-induced distortions, automatically generating labeled data with varied, controllable severity. This is paired with a prompt-modulated fusion network: user- or model-provided language-vision prompts, processed via a CLIP encoder or a spatial-frequency visual adapter, modulate (via FiLM layers) the restoration and fusion process. The losses are adaptively weighted according to the prompt, allowing targeted enhancement (e.g., gradient loss under strong blur, color consistency under overexposure). DSPFusion (Tang et al., 30 Mar 2025) introduces a dual-prior mechanism: modality-specific degradation priors cluster by type via a contrastive objective, and a diffusion model operates in a compact latent space to restore high-quality scene semantics, resulting in robust real-time fusion for severely degraded input modalities.
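As a point of reference for the composite degradation synthesis described above, the following minimal sketch combines an atmospheric-scattering haze model, Retinex-style global dimming for low light, and additive sensor noise, with severity controlled by scalar parameters. Function and parameter names (and their ranges) are assumptions for illustration, not the ControlFusion data pipeline.

```python
# Minimal sketch of controllable, physics-grounded degradation synthesis:
# haze via the atmospheric scattering model, low light via illumination
# scaling, and additive sensor noise. Parameter names and ranges are
# illustrative assumptions, not the ControlFusion pipeline.
import numpy as np

def synthesize_degradation(img, depth, beta=1.0, airlight=0.9,
                           illum_gain=0.4, noise_sigma=0.02, rng=None):
    """img: clean image in [0, 1], HxWx3; depth: scene depth map, HxW."""
    rng = rng or np.random.default_rng()
    # Atmospheric scattering: I = J * t + A * (1 - t), with t = exp(-beta * depth).
    t = np.exp(-beta * depth)[..., None]
    hazy = img * t + airlight * (1.0 - t)
    # Low light: scale the illumination component (Retinex-style global dimming).
    dark = hazy * illum_gain
    # Sensor noise: additive Gaussian as a simple stand-in for shot/read noise.
    noisy = dark + rng.normal(0.0, noise_sigma, dark.shape)
    return np.clip(noisy, 0.0, 1.0)
```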
| Framework | Degradation Modeling | Modulation Mechanism | Notable SSL/Fusion Innovations |
|---|---|---|---|
| DeFusion++ (Liang et al., 2024) | Data-driven SSL with synthetic noise/degradation | CUD + MFM (SSL tasks) | Generalizable fused features, no hand-crafted rules |
| DSPFusion (Tang et al., 30 Mar 2025) | Explicit prior embedding | Dual-prior (degradation + semantic) | Fast diffusion in latent, prior-guided fusion |
| ControlFusion (Tang et al., 30 Mar 2025) | Physics-driven, promptable | CLIP/SFVA prompt, FiLM | Prompt-modulated fusion, adaptive loss |
These approaches both outperform static and supervised fusion methods and substantially improve metric scores under degradation—a critical requirement for deployment in uncontrolled or user-driven settings.
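A minimal sketch of the FiLM-style prompt modulation used in the prompt-driven frameworks above is given below: a prompt embedding (e.g., from a CLIP text encoder) is projected to per-channel scale and shift parameters that modulate the fusion features. The projection head, dimensions, and residual formulation are assumptions, not the ControlFusion network.

```python
# Minimal sketch of FiLM-style modulation of fusion features by a prompt
# embedding (e.g., produced by a CLIP text encoder). The projection head and
# dimensions are illustrative assumptions, not the ControlFusion network.
import torch
import torch.nn as nn

class PromptFiLM(nn.Module):
    def __init__(self, prompt_dim=512, feat_ch=64):
        super().__init__()
        # One linear head predicts per-channel scale (gamma) and shift (beta).
        self.to_film = nn.Linear(prompt_dim, 2 * feat_ch)

    def forward(self, feat, prompt_emb):
        # feat: (B, C, H, W) fusion features; prompt_emb: (B, prompt_dim)
        gamma, beta = self.to_film(prompt_emb).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)   # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1.0 + gamma) * feat + beta          # residual-style modulation

# Usage: features are rescaled/shifted per channel according to the
# degradation described in the prompt ("dense haze", "strong blur", ...).
feat = torch.randn(2, 64, 32, 32)
prompt_emb = torch.randn(2, 512)   # placeholder for a CLIP text embedding
modulated = PromptFiLM()(feat, prompt_emb)
```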
3. Joint Learning of Restoration and Fusion
Integrated frameworks such as DDRF-Net (Fang et al., 2021) and the infrared-assisted single-stage approach (Li et al., 2024) embed restoration and fusion in a single end-to-end optimization. DDRF-Net uses dynamic degradation kernels to explicitly simulate and adapt to spatio-temporal variations in blur and noise, while its dynamic fusion convolutions learn spatially- and temporally-adaptive fusion weights. The model's loss combines branch-specific restoration errors with fusion consistency losses, ensuring that the two modules reinforce each other. Similarly, the infrared-assisted method employs prompt-based compensatory features, haze-density estimation (via dark channel prior refinement), and multi-stage prompt-fusion operations to simultaneously restore and fuse IR-VIS images under haze. Ablation studies confirm that mutual regularization and joint optimization (restoration driving fusion and vice versa) yield superior performance and robustness compared to cascaded or independently trained modules.
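To make the structure of such a joint objective concrete, the sketch below combines per-branch restoration losses with a fusion consistency term that asks the fused image to preserve the stronger of the two restored branches' gradients at each location. The specific terms (gradient-max consistency) and weights are illustrative assumptions, not the DDRF-Net loss.

```python
# Sketch of a joint restoration + fusion objective: per-branch restoration
# losses plus a fusion consistency term encouraging the fused image to keep
# the strongest gradients of either restored source. The particular terms
# and weights are illustrative assumptions, not the DDRF-Net loss.
import torch
import torch.nn.functional as F

def spatial_gradients(x):
    gx = x[..., :, 1:] - x[..., :, :-1]   # horizontal finite differences
    gy = x[..., 1:, :] - x[..., :-1, :]   # vertical finite differences
    return gx, gy

def joint_loss(restored_a, restored_b, clean_a, clean_b, fused,
               w_restore=1.0, w_fuse=0.5):
    # Branch-specific restoration errors keep each restored source faithful.
    l_restore = F.l1_loss(restored_a, clean_a) + F.l1_loss(restored_b, clean_b)
    # Fusion consistency: fused gradients should match the element-wise
    # maximum-magnitude gradients of the two restored branches.
    gax, gay = spatial_gradients(restored_a)
    gbx, gby = spatial_gradients(restored_b)
    fx, fy = spatial_gradients(fused)
    tx = torch.where(gax.abs() >= gbx.abs(), gax, gbx)
    ty = torch.where(gay.abs() >= gby.abs(), gay, gby)
    l_fuse = F.l1_loss(fx, tx) + F.l1_loss(fy, ty)
    return w_restore * l_restore + w_fuse * l_fuse
```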
4. Advanced Feature Fusion Schemes
Recent architectures depart from fixed-weight or naive concatenation strategies by deploying modular, adaptive, or biologically inspired fusion mechanisms. CMFNet (Fan et al., 2022) employs a three-branch U-Net backbone inspired by the functional segregation of retinal ganglion cells (pixel, channel, and spatial attention), followed by a learnable mixed skip connection and concatenation-based fusion. KBNet (Zhang et al., 2023) introduces kernel basis attention (KBA), which linearly fuses learned convolutional bases with per-pixel predicted coefficients, alongside a multi-axis block for channel, spatially-invariant, and pixel-adaptive aggregation. 3DCF (Wu et al., 2016) uses 3D convolutional fusion to learn joint spatial and cross-method patterns in the stacked outputs of multiple restoration backbones. These schemes enable localized, context-aware selection of information, outperforming both simple averaging and purely transformer-based attention on several restoration/fusion benchmarks.
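The following sketch shows the kernel-basis idea in its simplest form: a small bank of learned convolutions whose outputs are mixed per pixel by predicted coefficients (equivalent, by linearity, to applying a per-pixel mixed kernel). This is a deliberate simplification of the concept behind KBNet's KBA module, not its actual implementation.

```python
# Minimal sketch of kernel-basis-style adaptive aggregation: a small bank of
# learned convolution bases whose outputs are mixed per pixel by predicted
# coefficients. A simplification of the idea behind KBNet's kernel basis
# attention, not its actual module.
import torch
import torch.nn as nn

class KernelBasisFusion(nn.Module):
    def __init__(self, ch=64, n_bases=8):
        super().__init__()
        # Fixed-size bank of learnable 3x3 convolution bases.
        self.bases = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=1, bias=False) for _ in range(n_bases)]
        )
        # Lightweight head predicts one mixing coefficient per basis per pixel.
        self.coeff = nn.Sequential(nn.Conv2d(ch, n_bases, 1), nn.Softmax(dim=1))

    def forward(self, x):
        coeffs = self.coeff(x)                          # (B, n_bases, H, W)
        out = torch.zeros_like(x)
        for i, basis in enumerate(self.bases):
            out = out + basis(x) * coeffs[:, i:i + 1]   # per-pixel weighted sum
        return out
```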
5. Multiscale, Frequency-Domain, and Structured Priors
Broadening fusion beyond the spatial domain, hybrid methods deploy frequency-domain analysis, Laplacian pyramids, or patch-based priors. HDFT (Shang et al., 2023) applies holistic frequency attention and dynamic frequency filtering in the Fourier domain within a Laplacian pyramid decomposition/reconstruction framework, optimizing multi-band fusion and restoration jointly. In the classical setting, Brovey–wavelet fusion (Shahdoosti, 2017) combines the spatial detail of Brovey PAN sharpening with the spectral fidelity of multiresolution à trous wavelet decomposition, outperforming standard approaches in pan-sharpening. For single-image restoration in participating media, contrast and color priors are fused to construct robust composite transmission maps, yielding generalizable results across underwater, foggy, and turbid conditions (Gaya et al., 2016). Multiband fusion for remote sensing incorporates nonlocal patch regularization (NLPR) and efficient ADMM solvers, exploiting long-range correlations and texture in a globally convex variational setting (S. et al., 2022).
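A minimal sketch of per-band frequency-domain filtering inside a Laplacian pyramid, in the spirit of the dynamic frequency filtering described above, is shown below. The learnable per-band spectral mask and the simple average-pooling pyramid are assumptions for illustration, not the HDFT architecture.

```python
# Minimal sketch of frequency-domain filtering applied per band of a
# Laplacian pyramid, in the spirit of dynamic frequency filtering. The
# learnable per-band spectral mask is an illustrative assumption, not HDFT.
import torch
import torch.nn as nn
import torch.nn.functional as F

def laplacian_pyramid(x, levels=3):
    bands, cur = [], x
    for _ in range(levels):
        down = F.avg_pool2d(cur, 2)
        up = F.interpolate(down, size=cur.shape[-2:], mode="bilinear",
                           align_corners=False)
        bands.append(cur - up)   # high-frequency residual at this scale
        cur = down
    bands.append(cur)            # low-frequency residual
    return bands

class SpectralBandFilter(nn.Module):
    """Multiply one band's spectrum by a learnable mask, then invert.
    Instantiate one filter per pyramid level, since band sizes differ."""
    def __init__(self, h, w):
        super().__init__()
        self.mask = nn.Parameter(torch.ones(1, 1, h, w // 2 + 1))

    def forward(self, band):
        spec = torch.fft.rfft2(band, norm="ortho")
        spec = spec * self.mask                    # broadcast over batch/channel
        return torch.fft.irfft2(spec, s=band.shape[-2:], norm="ortho")

def reconstruct(bands):
    cur = bands[-1]
    for band in reversed(bands[:-1]):
        cur = F.interpolate(cur, size=band.shape[-2:], mode="bilinear",
                            align_corners=False) + band
    return cur
```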
6. Alignment, Attention, and Guided Fusion in Complex Inputs
When fusion must occur across inputs with spatial misalignment, variable dynamic range, or distinct signal characteristics (e.g., burst imaging, high dynamic range), advanced schemes are necessary. Burstormer (Dudhane et al., 2023) employs transformer-based burst feature attention, multi-scale deformable alignment, cyclic burst sampling, and reference-based enrichment, achieving state-of-the-art denoising, super-resolution, and low-light enhancement. SFIGF (Liu et al., 2023) generalizes guided filter (GF) mechanisms to both image- and feature-level domains: a GF-inspired cross-attention module and a learnable GF-form output layer enable simultaneous recovery of global context and fine details across a spectrum of guided image restoration tasks (pan-sharpening, depth super-resolution, multi-focus fusion). Mixed attention (e.g., motion, scale, saturation) modules further act as fine-grained controls in progressive neural texture fusion for HDR restoration (Chen et al., 2021).
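For context on the guided-filter mechanism that SFIGF generalizes, the sketch below implements the classic closed-form guided filter with box (average-pooling) filtering: the output is a local linear function of the guide, q = a·I + b, with a and b computed from local statistics of the guide and target. This is the standard formulation included for reference; SFIGF's learnable GF-form layer is not reproduced here.

```python
# Minimal differentiable guided filter built from box (average-pooling)
# filtering: q = a * I + b, with a and b from local statistics of guide I
# and target p. This is the classic closed-form version, included for
# reference; SFIGF's learnable GF-form output layer generalizes it.
import torch
import torch.nn.functional as F

def box_filter(x, radius):
    k = 2 * radius + 1
    return F.avg_pool2d(x, k, stride=1, padding=radius, count_include_pad=False)

def guided_filter(guide, target, radius=4, eps=1e-4):
    mean_i = box_filter(guide, radius)
    mean_p = box_filter(target, radius)
    corr_ip = box_filter(guide * target, radius)
    corr_ii = box_filter(guide * guide, radius)
    cov_ip = corr_ip - mean_i * mean_p
    var_i = corr_ii - mean_i * mean_i
    a = cov_ip / (var_i + eps)      # local linear coefficient
    b = mean_p - a * mean_i
    # Average the coefficients over each window before applying them.
    return box_filter(a, radius) * guide + box_filter(b, radius)
```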
7. Quantitative Performance and Application Domains
State-of-the-art fusion-restoration methods are evaluated across diverse dataset regimes, including multi-exposure fusion (SICE, MEFB), multi-focus fusion (RealMFF), infrared-visible fusion (MSRS, TNO), and domain-specific tasks such as deformation measurement under extreme heat (Guan et al., 19 Jan 2026). DeFusion++ (Liang et al., 2024) yields MEF-SSIM = 0.8407, no-reference PSNR/SSIM = 33.41/0.9659 on RealMFF, and mIoU = 78.01% on MSRS segmentation. DSPFusion (Tang et al., 30 Mar 2025) achieves EN = 6.695, MI = 4.736, and VIF = 1.044 on MSRS, with a latency of 0.119 s/image. M2Restore (Wang et al., 9 Jun 2025), a CLIP-guided mixture-of-experts (MoE) dual-branch Mamba-CNN framework, attains PSNR = 32.01/31.20/31.73 and SSIM = 0.955/0.911/0.943 on Outdoor-Rain, Snow100K-L, and Raindrop, respectively.
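For reference, the full-reference PSNR/SSIM scores reported above are typically computed with standard routines such as those in scikit-image; the snippet below is a generic example and does not reproduce the exact evaluation protocol (crop, color space, and data-range conventions) of any cited paper.

```python
# Generic example of computing full-reference PSNR/SSIM with scikit-image;
# not the exact evaluation protocol of any cited paper (crop, color space,
# and data-range conventions vary across benchmarks).
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def score(pred, gt):
    """pred, gt: float images in [0, 1], shaped HxWx3."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, data_range=1.0, channel_axis=-1)
    return psnr, ssim
```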
These advances have immediate impact in surveillance (low-light and infrared fusion), autonomous vehicles (fog/rain/haze restoration and fusion), remote sensing (pan-sharpening, hyperspectral fusion), high-temperature deformation monitoring, and user-controllable restoration applications.
References:
- DeFusion++: "Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond" (Liang et al., 2024)
- DSPFusion: "Image Fusion via Degradation and Semantic Dual-Prior Guidance" (Tang et al., 30 Mar 2025)
- ControlFusion: "A Controllable Image Fusion Framework with Language-Vision Degradation Prompts" (Tang et al., 30 Mar 2025)
- Clarity ChatGPT: "An Interactive and Adaptive Processing System for Image Restoration and Enhancement" (Wei et al., 2023)
- CMFNet: "Compound Multi-branch Feature Fusion for Real Image Restoration" (Fan et al., 2022)
- KBNet: "Kernel Basis Network for Image Restoration" (Zhang et al., 2023)
- Burstormer: "Burst Image Restoration and Enhancement Transformer" (Dudhane et al., 2023)
- Single-Image Fusion for Participating Media: "Single Image Restoration for Participating Media Based on Prior Fusion" (Gaya et al., 2016)
- NLPR for Multiband Fusion: "Guided Nonlocal Patch Regularization and Efficient Filtering-Based Inversion for Multiband Fusion" (S. et al., 2022)
- SFIGF: "Guided Image Restoration via Simultaneous Feature and Image Guided Fusion" (Liu et al., 2023)
- APNT-Fusion: "Attention-Guided Progressive Neural Texture Fusion for High Dynamic Range Image Restoration" (Chen et al., 2021)
- M2Restore: "Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration" (Wang et al., 9 Jun 2025)
- HDFT: "Holistic Dynamic Frequency Transformer for Image Fusion and Exposure Correction" (Shang et al., 2023)
- Brovey–Wavelet: "MS and PAN image fusion by combining Brovey and wavelet methods" (Shahdoosti, 2017)