- The paper surveys deep learning methods for single-image reflection removal, reviewing approaches, datasets, metrics, and future directions across research published between 2017 and 2025.
- It details mathematical models for reflection synthesis (linear/non-linear) and categorizes deep learning architectures into single-stage, two-stage, and multi-stage approaches.
- The survey highlights the critical need for larger, high-quality real-world datasets and a clearer task definition as major challenges for future research.
This paper is a survey on single-image reflection removal (SIRR) using deep learning techniques. It focuses on works published in top computer vision and AI venues like CVPR, ICCV, ECCV, NeurIPS, TPAMI, TIP, and Applied Intelligence between 2017 and 2025. The authors aim to provide a comprehensive review, outlining task hypotheses, current deep learning techniques, datasets, evaluation metrics, and future research directions.
Introduction
The introduction defines SIRR and highlights its importance, noting the limitations of traditional non-learning methods that rely on priors (sparsity, smoothness) which often fail to generalize to complex real-world scenarios. These traditional methods often operate under the assumption that the observed image is a linear combination of the transmission and reflection layers, an assumption that doesn't always hold true in reality. The survey emphasizes the recent shift towards data-driven deep learning approaches to overcome these limitations. It positions itself as a more thorough and up-to-date review compared to existing surveys, particularly those by Wan et al. and Amanlou et al.
Methodology
The survey methodology targets key conferences and journals known for high-impact computer vision and AI research. A specific search query is used to identify relevant papers, excluding non-research articles and applying a time filter (2017-2025). The resulting 28 papers form the basis of the literature review.
Mathematical Hypothesis
This section details the mathematical models used to represent the image formation process in the presence of reflections, categorizing them into linear and non-linear hypotheses:
- Linear Hypothesis: The simplest model assumes the captured image is a direct superposition (sum) of the transmission and reflection layers: I = T + R. Variations introduce blending scalars α and β to weight the contribution of each layer: I = αT + βR. The survey notes that blending two images with constant weights fails to capture the complexity of real-world reflections.
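As an illustration, the linear hypothesis can be sketched in a few lines of NumPy; the blending scalars used below are illustrative values, not ones prescribed by any particular paper.

```python
import numpy as np

def linear_synthesis(T, R, alpha=0.8, beta=0.4):
    """Blend a transmission layer T and a reflection layer R
    under the linear hypothesis I = alpha*T + beta*R."""
    I = alpha * T + beta * R
    return np.clip(I, 0.0, 1.0)  # keep the result a valid image in [0, 1]

# Toy example: two random "images" with values in [0, 1]
rng = np.random.default_rng(0)
T = rng.random((4, 4, 3))
R = rng.random((4, 4, 3))
I = linear_synthesis(T, R)
```

In practice T and R would be real photographs; the clipping step mimics sensor saturation when the weighted sum exceeds the valid range.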
- Non-Linear Hypothesis: This more complex modeling leverages the power of deep learning to incorporate learned priors. The synthesis process is modeled using alpha matting: I = W ∘ T + R, or I = W ∘ T + (1 - W) ∘ R, where W is an alpha blending mask and ∘ denotes element-wise multiplication. Other models introduce refractive and reflective amplitude coefficient maps (Ω, Φ), or apply degradation functions to the transmission and reflection layers: I = g(T) + f(R). A more general formulation, I = T + R + Φ(T, R), is also presented to account for residual errors.
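The alpha-matting formulation above is easy to make concrete; in the sketch below the mask W is random for illustration, whereas in the surveyed methods it is learned or derived from physical models.

```python
import numpy as np

def matting_synthesis(T, R, W):
    """Non-linear synthesis with a spatially varying alpha mask W:
    I = W o T + (1 - W) o R, where 'o' is element-wise multiplication."""
    return W * T + (1.0 - W) * R

rng = np.random.default_rng(1)
T = rng.random((4, 4, 3))
R = rng.random((4, 4, 3))
W = rng.random((4, 4, 1))  # per-pixel blending mask, broadcast over channels
I = matting_synthesis(T, R, W)
```

Because each pixel of I is a convex combination of the corresponding pixels in T and R, the mask lets the blend vary spatially, unlike the constant scalars of the linear hypothesis.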
Reflection Removal Approaches
The survey categorizes deep learning-based SIRR approaches into single-stage, two-stage, and multi-stage architectures.
- Single-Stage Approaches: These approaches use a single network to directly decompose the input image into transmission and/or reflection layers. Examples mentioned are ERRNet, RobustSIRR, Zhang et al., and YTMT. RobustSIRR uses multi-resolution input, while YTMT uses an interactive "Your Trash is My Treasure" strategy.
- Two-Stage Approaches: These methods use a cascaded architecture, first estimating an intermediate output (reflection layer, coarse transmission layer, edge map, absorption effect) and then reconstructing the final transmission or reflection layer. Examples include CoRRN, DMGN, RAGNet, CEILNet, DSRNet, SP-net BT-net, Wan et al., Zheng et al., Zhu et al., and Language-Guided. Feature fusion techniques like convolutional fusion and concatenation are also noted.
- Multi-Stage Approaches: These approaches extend the two-stage design with multiple cascaded stages in a recurrent fashion. Examples are BDN, IBCLN, Chang et al., LANet, and V-DESIRR. They also employ techniques like convolutional fusion, concatenation, and recurrent connections.
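To make the cascaded data flow of the two-stage design concrete, here is a deliberately simplified pipeline with no learned components: stage one estimates a coarse reflection layer with a hand-crafted low-pass "estimator", and stage two reconstructs the transmission from the input and that intermediate estimate. In the surveyed methods both stages are deep networks; the box blur and the 0.3 weight are stand-in assumptions for illustration only.

```python
import numpy as np

def box_blur(img, k=3):
    """Simple box blur standing in for a learned stage-one estimator."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def two_stage_removal(I):
    """Stage 1: estimate an intermediate reflection layer.
    Stage 2: reconstruct the transmission from the input and the estimate."""
    R_hat = 0.3 * box_blur(I)             # coarse reflection estimate
    T_hat = np.clip(I - R_hat, 0.0, 1.0)  # transmission reconstruction
    return T_hat, R_hat

rng = np.random.default_rng(2)
I = rng.random((8, 8, 3))
T_hat, R_hat = two_stage_removal(I)
```

A multi-stage method would wrap the second stage in a loop, feeding each refined estimate back in as input, which is the recurrent pattern used by methods such as IBCLN.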
The section also includes a discussion of commonly used loss functions: Reconstruction Loss (L1, L2), Gradient Consistency Loss, Perceptual Loss, Adversarial Loss, Exclusion Loss, Total Variation Loss, and Contextual Loss. It notes that evaluation metrics like PSNR and SSIM are sometimes used directly as loss terms.
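The reconstruction and gradient consistency losses mentioned above are straightforward to write down; this sketch uses NumPy for clarity, whereas the surveyed methods implement them as differentiable loss terms inside a deep learning framework, and the 0.5 weight is an illustrative choice.

```python
import numpy as np

def l1_reconstruction_loss(pred, target):
    """Mean absolute error between predicted and ground-truth layers."""
    return float(np.mean(np.abs(pred - target)))

def gradient_consistency_loss(pred, target):
    """Penalize differences between image gradients (finite differences
    along height and width), encouraging consistent edges."""
    gy = np.abs(np.diff(pred, axis=0) - np.diff(target, axis=0))
    gx = np.abs(np.diff(pred, axis=1) - np.diff(target, axis=1))
    return float(np.mean(gy) + np.mean(gx))

rng = np.random.default_rng(3)
target = rng.random((8, 8, 3))
pred = target + 0.05 * rng.standard_normal((8, 8, 3))
total = l1_reconstruction_loss(pred, target) \
    + 0.5 * gradient_consistency_loss(pred, target)
```

Perceptual, adversarial, and exclusion losses follow the same pattern of weighted summation but require pretrained or jointly trained networks, so they are omitted here.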
Datasets and Evaluation Metrics
- Data Acquisition: This section categorizes datasets as synthetic and real-world and describes techniques for creating each.
- Synthetic datasets are created with image mixing, reflection blur, brightness adjustment, and physics-based rendering.
- Real-world datasets are created with manual glass removal, raw data subtraction, flash/no-flash pairs, polarization, and controlled environments.
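A minimal sketch of the synthetic pipeline described above, combining image mixing, reflection blur, and brightness adjustment into a (mixed, transmission) training pair; the blur kernel and scaling factors are illustrative assumptions, not values taken from any specific dataset.

```python
import numpy as np

def blur_1d(img, kernel, axis):
    """Convolve each row/column of an H x W x C image with a 1-D kernel."""
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, img)

def make_synthetic_pair(T, R, alpha=0.8, beta=0.35, brightness=0.9):
    """Create a (mixed, transmission) training pair: blur the reflection
    to mimic out-of-focus glass reflections, darken it slightly, and mix."""
    kernel = np.array([0.25, 0.5, 0.25])  # small separable blur kernel
    R_blur = blur_1d(blur_1d(R, kernel, 0), kernel, 1)
    I = np.clip(alpha * T + beta * brightness * R_blur, 0.0, 1.0)
    return I, T

rng = np.random.default_rng(4)
T = rng.random((8, 8, 3))
R = rng.random((8, 8, 3))
I, T_gt = make_synthetic_pair(T, R)
```

Physics-based rendering replaces this simple mixing with a simulation of light transport through glass, at much higher computational cost but with greater realism.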
- Current Public Datasets: Lists and compares key datasets (CEIL, RID, SIR2, Real, Nature, CDR, SIR2+, CID, RRW) along with their usage (train/test), number of image pairs, average resolution, and whether they are real or synthetic. RRW is noted as the largest and most recent dataset.
- Evaluation Metrics: Describes quantitative (PSNR, SSIM, MSE, Local MSE, NCC, Structure Index (SI)) and qualitative (perceptual user studies) metrics used for evaluating SIRR algorithms.
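MSE and PSNR are simple to compute directly; the sketch below assumes images normalized to [0, 1] (so the peak value is 1.0). SSIM is more involved (local windows and luminance, contrast, and structure terms) and is usually taken from a library such as scikit-image rather than reimplemented.

```python
import numpy as np

def mse(pred, target):
    """Mean squared error between two images."""
    return float(np.mean((pred - target) ** 2))

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, peak]."""
    err = mse(pred, target)
    if err == 0:
        return float("inf")  # identical images: infinite PSNR
    return float(10.0 * np.log10(peak ** 2 / err))

rng = np.random.default_rng(5)
target = rng.random((8, 8, 3))
pred = np.clip(target + 0.01 * rng.standard_normal((8, 8, 3)), 0.0, 1.0)
score = psnr(pred, target)  # higher is better
```

Local MSE applies the same error measure over sliding windows to localize artifacts, and NCC measures normalized correlation between the recovered and ground-truth layers.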
Discussion
This section presents a critical evaluation of SIRR research, highlighting key challenges and potential future directions:
- Challenges: The biggest challenge is the lack of large, high-quality training and test datasets that represent diverse real-world reflection scenarios; this data scarcity also limits the exploration of more complex network architectures. A second challenge is the ill-defined nature of the task itself (reflection removal versus background reconstruction).
- Future Directions: Emphasizes the need for large-scale, high-quality datasets, the integration of advanced AIGC models, and the clarification of the SIRR task definition.
Conclusion
The paper concludes by summarizing the current state of SIRR research, emphasizing that clearer definitions and larger datasets are needed to unlock the full potential of deep learning methods.