MDF: Multi-Defocus Fusion Methods

Updated 5 June 2026

Multi-Defocus Fusion (MDF) fuses multiple images through weighted selection maps to produce a sharp, all-in-focus output.
MDF methods use boundary-aware cascades, α-matte modeling, and GAN-based approaches to handle defocus spread and ambiguous focus boundaries.
Realistic synthetic datasets and quantitative metrics are used to train and evaluate these methods, with reported gains in edge fidelity and fewer artifacts.

Multi-Defocus Fusion (MDF) methods are a diverse class of algorithms and architectures that combine multiple images (or image stacks) acquired at different focal settings into a single all-in-focus image or enhanced representation. MDF addresses the physical and algorithmic challenges of limited depth of field (DOF), defocus spread, and boundary ambiguity, with applications ranging from microscopy (including electron microscopy) to general photographic imaging. Contemporary MDF solutions span deep learning, optimization, decision-map, variational, and diffusion-based strategies, and recent approaches explicitly model boundary phenomena and emphasize dataset realism.

1. Formal Problem Definition and Mathematical Principles

Let $I_k(x,y)$, $k = 1,\dots,N$, denote $N$ perfectly registered source images, each acquired with a distinct focal setting. The objective is to generate a fused image $F(x,y)$ that is locally identical to the sharpest available source at each position:

$F(x,y) = \sum_{k=1}^N W_k(x,y)\,I_k(x,y),$

where $W_k(x,y)$ is a selection or soft attention map, typically constrained by $\sum_k W_k(x,y) = 1$ and $W_k(x,y) \in [0,1]$. In two-image fusion, this often reduces to a binary decision map $DM(x,y)\in\{0,1\}$ or a continuous focus score $FS(x,y)\in[0,1]$; fusion then is

$F(x,y) = DM(x,y) I_1(x,y) + (1-DM(x,y)) I_2(x,y).$

Defocus measurement, sharpness criteria, or learned focus-confidence are core to MDF. Recent frameworks explicitly address focus/defocus boundaries (FDBs) and model the defocus spread effect, where blur extends beyond object edges due to lens PSF and occlusions (Ma et al., 2019, Ma et al., 2019, Wang et al., 2020).
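
The two fusion formulas above can be illustrated with a minimal NumPy sketch. The focus measure used here (a locally averaged squared discrete Laplacian) is an assumption chosen for simplicity and is not taken from any of the cited methods; the helper names `box_mean`, `focus_score`, `fuse_soft`, and `fuse_binary` are hypothetical.

```python
import numpy as np

def box_mean(x, radius):
    """Mean over a (2*radius+1)^2 window, with edge padding at the borders."""
    k = 2 * radius + 1
    h, w = x.shape
    padded = np.pad(x, radius, mode="edge")
    total = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            total += padded[dy:dy + h, dx:dx + w]
    return total / (k * k)

def focus_score(img, radius=2):
    """Assumed focus measure: locally averaged squared discrete Laplacian."""
    p = np.pad(np.asarray(img, dtype=np.float64), 1, mode="edge")
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    return box_mean(lap ** 2, radius)

def fuse_soft(images, eps=1e-12):
    """F = sum_k W_k * I_k with W_k in [0, 1] and sum_k W_k = 1 per pixel."""
    stack = np.stack([np.asarray(im, dtype=np.float64) for im in images])
    scores = np.stack([focus_score(im) for im in stack]) + eps
    weights = scores / scores.sum(axis=0, keepdims=True)
    return (weights * stack).sum(axis=0), weights

def fuse_binary(i1, i2):
    """F = DM * I_1 + (1 - DM) * I_2 with a binary decision map DM."""
    i1 = np.asarray(i1, dtype=np.float64)
    i2 = np.asarray(i2, dtype=np.float64)
    dm = (focus_score(i1) >= focus_score(i2)).astype(np.float64)
    return dm * i1 + (1.0 - dm) * i2, dm
```

Normalizing the scores guarantees the sum-to-one constraint by construction; learned methods replace `focus_score` with a network output but keep the same weighted-sum form.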

2. Algorithmic Architectures and Boundary-Awareness

The modern MDF landscape includes architectures purpose-built for the focus/defocus boundary problem:

Boundary-Aware Cascades: Two-stage models deploy an initial fusion network to estimate global focus and a dedicated boundary-refinement network for ambiguous, mixed-focus pixels. For example, ResNet-56 is trained as both an “Initial Fusion Net” (global) and “Boundary Net” (localized only to boundary patches), with pixel classification into near- and far-FDB regions via running window averages of focus scores (Ma et al., 2019).
α-Matte Based Boundary Modeling: These methods synthesize data and networks under physically plausible layered image formation models, where defocus spreads are modeled by blurred alpha mattes (e.g., Gaussian kernels convolved with binary transmission layers). Networks first produce a soft “guidance map” for boundaries, with residual refinement strictly on the boundary band by a specialized sub-net; output fusion is weighted according to these maps (Ma et al., 2019).
Decision Map with Deep Feature Calibration: MDF methods such as GACN introduce a cascade that simultaneously predicts soft decision maps using deep spatial-frequency activations and produces fused images, eschewing empirical postprocessing. Decision-map calibration analytically refines ambiguous boundary regions using guided filtering or boundary masking, ensuring pixel-accurate assignment at boundaries (Ma et al., 2020).
GAN and Diffusion Approaches: Generative models (MFIF-GAN, ReDiffuse) address defocus spread directly. MFIF-GAN is trained on α-matte synthesized pairs with adversarial and gradient-aware losses, while ReDiffuse incorporates rotation-group equivariant U-Nets to preserve geometric structure when fusing symmetric or repetitive patterns (Wang et al., 2020, Li et al., 22 Mar 2026).
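
The boundary-aware cascade idea can be sketched as follows. This is an illustrative simplification, not the cited architecture: the window radius and tolerance are assumed values, the "refined" decision map stands in for the output of a boundary network, and the names `near_fdb_mask` and `cascade_fuse` are hypothetical.

```python
import numpy as np

def box_mean(x, radius):
    """Running window average with edge padding."""
    k = 2 * radius + 1
    h, w = x.shape
    padded = np.pad(np.asarray(x, dtype=np.float64), radius, mode="edge")
    total = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            total += padded[dy:dy + h, dx:dx + w]
    return total / (k * k)

def near_fdb_mask(focus_score, radius=3, tol=0.05):
    """Mark pixels whose window-averaged focus score is mixed (neither ~0 nor ~1).

    Such pixels are treated as lying near a focus/defocus boundary (FDB).
    """
    mean = box_mean(focus_score, radius)
    return (mean > tol) & (mean < 1.0 - tol)

def cascade_fuse(i1, i2, initial_dm, refined_dm, radius=3, tol=0.05):
    """Use the refined decision map only inside the near-FDB band."""
    mask = near_fdb_mask(initial_dm, radius, tol)
    dm = np.where(mask, refined_dm, initial_dm)
    return dm * i1 + (1.0 - dm) * i2, mask
```

Restricting refinement to the band keeps the global decision untouched in unambiguous regions, which is the design motivation described for the two-stage models above.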

3. Dataset Generation and Simulation Fidelity

Robust MDF depends crucially on the availability of realistic, diverse, labeled datasets.

Synthetic Data with Realistic Boundaries: High-fidelity dataset generation involves combining matting cutouts with complex backgrounds, applying depth-dependent blurs only to occluded regions, and enforcing layered occlusion/blur logic. For instance, compositing each image of a source pair from the same foreground, matte, and background layers, with the blur applied to a different layer in each image, closely mimics optical DOF behavior around boundaries (Ma et al., 2019, Ma et al., 2019).

Domain-Specific Simulation: In HAADF-STEM, MDF simulates multiple defocus images spanning the support thickness under multislice physics, considering probe convergence and collection angles to maximize elemental separability for atomic identification (Li et al., 7 May 2025).
Large-Scale Blender/Cycles Datasets: For high-resolution fusion, datasets like MattingMFIF utilize Blender-rendered 4K scenes with optically plausible DOF, realistic object placement, and associated all-in-focus ground truth (Piano et al., 22 Oct 2025).

Empirical results demonstrate that dataset realism significantly lowers boundary classification error and enhances fusion fidelity, especially in the vicinity of challenging FDB regions.
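
A layered α-matte pair generator might look like the sketch below. The exact compositing formulas of the cited works are not reproduced in this text, so the two equations in `synth_pair` are an assumption: a common layered model in which the background-focused image blurs both the premultiplied foreground and its matte, letting the defocus spread past the object edge. The names `gaussian_blur` and `synth_pair` and the default `sigma` are hypothetical.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding (kernel radius about 3*sigma)."""
    img = np.asarray(img, dtype=np.float64)
    radius = max(1, int(round(3 * sigma)))
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    h, w = img.shape
    p = np.pad(img, radius, mode="edge")
    rows = sum(kv * p[:, i:i + w] for i, kv in enumerate(kernel))
    return sum(kv * rows[i:i + h, :] for i, kv in enumerate(kernel))

def synth_pair(fg, bg, alpha, sigma=2.0):
    """Return (foreground-focused, background-focused, all-in-focus) images.

    Assumed layered model, for illustration only:
      near = alpha * fg + (1 - alpha) * blur(bg)
      far  = blur(alpha * fg) + (1 - blur(alpha)) * bg
    """
    near = alpha * fg + (1.0 - alpha) * gaussian_blur(bg, sigma)
    alpha_blur = gaussian_blur(alpha, sigma)
    far = gaussian_blur(alpha * fg, sigma) + (1.0 - alpha_blur) * bg
    all_in_focus = alpha * fg + (1.0 - alpha) * bg
    return near, far, all_in_focus
```

In the background-focused image the blurred matte is nonzero outside the object's true silhouette, which is the defocus spread effect that naive "blur one region" synthesis fails to reproduce.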

4. Quantitative Metrics and Benchmarking

MDF evaluation typically relies on metrics that capture both global and boundary-region fusion quality. Established criteria include:

Metric	Purpose	Higher-is-Better?
Q_NMI, Q_MI	Mutual information with input(s)	Yes
Q_G	Gradient-based sharpness consistency	Yes
Q_Y, Q_y	Structural similarity or SSIM variants	Yes
Q_CB	Human visual system–based assessment	Yes
MS-SSIM, EN	Multi-scale or entropy-based	Yes
MOS	Mean Opinion Score (human rating)	Yes
Edge/Frequency MI	Feature mutual information (edges/DCT)	Yes

Boundary-aware methods consistently demonstrate improvements over classical transforms (NSCT, SR, DSIFT), often yielding best-in-class metric scores both globally and specifically at the FDB (Ma et al., 2019, Ma et al., 2019, Ma et al., 2020).
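
Two of the simpler criteria in the table, entropy (EN) and a mutual-information score, can be computed with histogram estimates as sketched below. Normalization conventions for Q_MI and Q_NMI differ between papers, so this unnormalized sum over sources is an assumption rather than the exact definition used in the cited benchmarks; images are assumed to be scaled to [0, 1], and the function names are hypothetical.

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (bits) of the intensity histogram; img assumed in [0, 1]."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(a, b, bins=64):
    """Histogram estimate of mutual information (bits) between two images."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins,
                                 range=[[0.0, 1.0], [0.0, 1.0]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def fusion_mi(fused, sources, bins=64):
    """Unnormalized MI score: sum of MI between the fused image and each source."""
    return sum(mutual_information(fused, s, bins) for s in sources)
```

Both estimates depend on the bin count, so scores are only comparable across methods when the same binning is used.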

5. Extensions: Multi-Image, Unsupervised, Physical and Domain-Specific Fusion

Multi-Image Fusion and Decision Volume Calibration: MDF extensions to more than two inputs select among $N > 2$ sources at each pixel using an analytically constructed “decision volume” built from pairwise decision maps and calibration formulas (e.g., defined as a function of focus probabilities), yielding efficient and scalable multi-stack fusion (Ma et al., 2020, Piano et al., 22 Oct 2025).
Unsupervised and Optimization-Based Strategies: Techniques such as MFNet bypass curated ground truth by maximizing local SSIM in a sliding window, selectively matching fused output to the most in-focus patch from multiple sources. Gradient-based optimization frameworks (as in MFF-SSIM) directly maximize a patchwise fusion quality index, robustly reducing halos in strong defocus spread regimes (Yan et al., 2018, Xu et al., 2020).
Physical/Electron Microscopy Applications: In atomic-resolution STEM, MDF recovers Z-contrast by per-pixel maximum-intensity fusion over a defocus series and combines with LoG-based detection and Gaussian-mixture modeling to classify elements, achieving <5% classification error compared to ~50% for single-defocus frames (Li et al., 7 May 2025).
Depth-Map Guided MDF: By incorporating explicit depth sensing, MDF can segment the scene into DOF-compliant regions, assigning each block to its optimal focal plane based on the closest match to camera focus/distance, enabling artifact-free, order-of-magnitude faster real-time all-in-focus imaging (Liu et al., 2018).
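
Two of the stack-level operations above admit a compact sketch. The decision volume here is a plain per-pixel argmax over focus scores, which is a stand-in for (not a reproduction of) the pairwise-map calibration described in the cited work; the per-pixel maximum mirrors the max-intensity fusion described for defocus series in STEM. All function names are hypothetical.

```python
import numpy as np

def argmax_decision_volume(focus_scores):
    """One-hot decision volume of shape (N, H, W) from per-image focus scores.

    Simplifying assumption: the winner at each pixel is the source with the
    highest focus score; ties go to the lowest index.
    """
    scores = np.asarray(focus_scores, dtype=np.float64)
    winner = scores.argmax(axis=0)                        # (H, W)
    index = np.arange(scores.shape[0])[:, None, None]     # (N, 1, 1)
    return (index == winner[None]).astype(np.float64)

def fuse_stack(images, decision_volume):
    """F = sum_k W_k * I_k with W given by the decision volume."""
    stack = np.asarray(images, dtype=np.float64)
    return (decision_volume * stack).sum(axis=0)

def max_intensity_fusion(images):
    """Per-pixel maximum over a defocus series."""
    return np.asarray(images, dtype=np.float64).max(axis=0)
```

Because the decision volume is one-hot, it satisfies the same sum-to-one constraint as the two-image decision map, so `fuse_stack` is the $N$-source form of the weighted sum in Section 1.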

6. Limitations, Scalability, and Research Outlook

Boundary Fragility and Spread: Despite advances, MDF remains sensitive to the accuracy of boundary delineation and defocus spread modeling; performance may degrade with complex FDBs, wide PSFs, or imperfect registration.
Registration and Occlusion: The efficacy of latent-space or deep fusion approaches (e.g., VAEEDOF) assumes near-perfect alignment; methods may need further adaptation for dynamic or misaligned input bursts (Piano et al., 22 Oct 2025).
Efficiency Considerations: Cascade and decision volume approaches report runtime reductions of 30–50%, and high-resolution diffusion models use weight-sharing for further acceleration (Ma et al., 2020, Li et al., 22 Mar 2026).
Potential Extensions: Future advances include the integration of rotation-equivariant group convolutions (for symmetry preservation), non-local attention to reinforce defocus-prone structures, light-field augmentation for better occlusion handling, and joint optimization of synthetic-real domain transfer.

7. Representative Results and State-of-the-Art Standing

Several MDF methods consistently outperform prior baselines across quantitative benchmarks, particularly in boundary fidelity, artifact suppression, and visual quality:

Method	Main Feature	Perfect Boundary Score	Notable Advantages	Ref
Boundary Net MDF	2-channel ResNet + FDB refine	Yes	Top in Q_NMI, Q_G, Q_Y, Q_CB	(Ma et al., 2019)
α-Matte MMF-Net	Cascaded boundary fusion	Yes	Superior edges, no halos	(Ma et al., 2019)
GACN Cascade	End-to-End Decision Map	Yes	30–50% speedup, DM calibration	(Ma et al., 2020)
MFIF-GAN	α-matte GAN, DSE-matching	Yes	Pixel-accurate mask, SOTA metrics	(Wang et al., 2020)
VAEEDOF	Latent-space multi-image fusion	Yes (on synthetic)	Seamless 4K fusion, generative fill	(Piano et al., 22 Oct 2025)
Physical MDF (STEM)	Pixelwise max-intensity	N/A	<5% atom class error	(Li et al., 7 May 2025)
ReDiffuse	Rot.-equiv. diffusion (U-Net)	Yes (rotation)	Consistency for symmetric detail	(Li et al., 22 Mar 2026)

In summary, MDF encompasses a spectrum of mathematically grounded, boundary-sensitive, and increasingly domain-adaptive techniques that define the current state of the art for multi-focus and multi-defocus fusion problems across disciplines.