
Split then Refine: Stacked Attention-guided ResUNets for Blind Single Image Visible Watermark Removal (2012.07007v1)

Published 13 Dec 2020 in cs.CV and eess.IV

Abstract: Digital watermark is a commonly used technique to protect the copyright of medias. Simultaneously, to increase the robustness of watermark, attacking technique, such as watermark removal, also gets the attention from the community. Previous watermark removal methods require to gain the watermark location from users or train a multi-task network to recover the background indiscriminately. However, when jointly learning, the network performs better on watermark detection than recovering the texture. Inspired by this observation and to erase the visible watermarks blindly, we propose a novel two-stage framework with a stacked attention-guided ResUNets to simulate the process of detection, removal and refinement. In the first stage, we design a multi-task network called SplitNet. It learns the basis features for three sub-tasks altogether while the task-specific features separately use multiple channel attentions. Then, with the predicted mask and coarser restored image, we design RefineNet to smooth the watermarked region with a mask-guided spatial attention. Besides network structure, the proposed algorithm also combines multiple perceptual losses for better quality both visually and numerically. We extensively evaluate our algorithm over four different datasets under various settings and the experiments show that our approach outperforms other state-of-the-art methods by a large margin. The code is available at http://github.com/vinthony/deep-blind-watermark-removal.

Citations (39)

Summary

  • The paper introduces a two-stage approach that stacks two attention-guided ResUNets, SplitNet and RefineNet, to detect a visible watermark, remove it, and refine the restored region without being given the watermark location.
  • A multi-task SplitNet produces a watermark mask and a coarse restored image; RefineNet then uses mask-guided spatial attention to recover detail in the watermarked region and reduce artifacts.
  • Evaluations on four synthetic datasets show the method outperforming prior approaches, with reported PSNR values above 40 dB.

An Expert Overview of Stacked Attention-guided ResUNets for Blind Single Image Visible Watermark Removal

The paper "Split then Refine: Stacked Attention-guided ResUNets for Blind Single Image Visible Watermark Removal" by Xiaodong Cun and Chi-Man Pun addresses the task of removing visible watermarks from a single image without prior knowledge of the watermark location and without user intervention. The proposed method frames watermark removal as a two-stage process of detection, removal, and refinement, implemented as a stack of Residual U-Nets (ResUNets) guided by channel and spatial attention.

Methodological Framework

SplitNet

The authors propose a two-stage framework to remove watermarks. The first stage employs SplitNet, a multi-task network that handles detection, removal, and recovery of watermarks within a single network. SplitNet is based on the ResUNet architecture, which combines deep residual learning with the encoder-decoder structure of U-Net. This configuration captures multi-scale features, which are needed to pick out subtle watermark characteristics.

SplitNet distinguishes itself by adding task-specific attention to a shared multi-task backbone. The basis features for the three sub-tasks are learned together in shared layers, while each sub-task obtains its own features through separate channel attention modules. This addresses a weakness the authors observe in earlier multi-task models: when the tasks are learned jointly on fully shared features, the network performs better on watermark detection than on recovering texture.
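The idea of shared features gated separately per sub-task can be sketched as follows. This is a minimal illustration, assuming a squeeze-and-excitation-style channel gate; the function name `channel_attention`, the weight shapes, the random weights, and the task labels are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def channel_attention(features, w_reduce, w_expand):
    """Gate each channel of `features` (C, H, W) with a weight in [0, 1].

    Assumed squeeze-and-excitation form, not necessarily the paper's module.
    """
    squeezed = features.mean(axis=(1, 2))               # global average pool -> (C,)
    hidden = np.maximum(w_reduce @ squeezed, 0.0)       # ReLU bottleneck -> (C_r,)
    gate = 1.0 / (1.0 + np.exp(-(w_expand @ hidden)))   # sigmoid -> (C,)
    return features * gate[:, None, None], gate

rng = np.random.default_rng(0)
channels, height, width, reduced = 8, 4, 4, 2

# Stand-in for the basis features produced by the shared layers.
shared = rng.standard_normal((channels, height, width))

# One independent gate per sub-task; the labels are illustrative only.
task_outputs = {}
for task in ("detection", "removal", "recovery"):
    w_reduce = rng.standard_normal((reduced, channels))
    w_expand = rng.standard_normal((channels, reduced))
    task_outputs[task] = channel_attention(shared, w_reduce, w_expand)
```

Each sub-task sees the same shared tensor but rescales its channels differently, so a channel that matters for mask prediction can be suppressed for texture recovery and vice versa.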

RefineNet

The second stage is RefineNet, which takes SplitNet's initial predictions (the coarse restored image and the watermark mask) and refines them. RefineNet uses mask-guided spatial attention, so that the refinement concentrates on the region the predicted mask marks as watermarked. This matters because that region is where the coarse restoration is least reliable; smoothing it under mask guidance yields final outputs with fewer artifacts and textures closer to the unwatermarked image.
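A simplified way to picture mask guidance is a per-pixel blend in which refined values are used only where the predicted mask is high. The sketch below assumes this blending form purely for illustration; RefineNet applies its attention to learned features inside the network, and `mask_guided_blend` and the constant test images are hypothetical.

```python
import numpy as np

def mask_guided_blend(coarse, refined, mask):
    """Use `refined` where mask is 1 and keep `coarse` where mask is 0.

    coarse, refined: (C, H, W) images; mask: (H, W) soft mask.
    Illustrative assumption, not the paper's attention module.
    """
    mask = np.clip(mask, 0.0, 1.0)[None, :, :]  # broadcast over channels
    return mask * refined + (1.0 - mask) * coarse

coarse = np.full((3, 4, 4), 0.5)    # stand-in for the stage-one output
refined = np.full((3, 4, 4), 0.8)   # stand-in for refined predictions
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                # predicted watermark region

output = mask_guided_blend(coarse, refined, mask)
```

Pixels outside the mask pass through unchanged, which is the property that keeps the refinement from disturbing background regions that were never watermarked.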

Numerical Experiments and Results

In evaluations on four synthesized datasets (LOGO-H, LOGO-L, LOGO-Gray, and LOGO-30K), the authors report improvements over existing methods such as BVMR, UNet, and SIRF. The comparison uses PSNR, SSIM, and LPIPS, and the proposed method outperforms the alternatives on these metrics. For instance, the model reached PSNR values over 40 dB across various dataset configurations, including challenging cases with large, high-opacity watermarks.
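To put the 40 dB figure in context, PSNR is defined as 10·log10(MAX² / MSE). The sketch below uses the standard definition with an assumed 8-bit peak value of 255; the helper name `psnr` and the constant test images are illustrative and do not come from the paper's evaluation code.

```python
import numpy as np

def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two same-shaped images."""
    reference = np.asarray(reference, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((reference - restored) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

reference = np.zeros((8, 8))

# A uniform error of one gray level gives MSE = 1.
psnr_one_level = psnr(reference, reference + 1.0)

# A uniform error of 2.55 gray levels (1% of the 8-bit range) gives MSE = 6.5025.
psnr_one_percent = psnr(reference, reference + 2.55)
```

Under this definition, 40 dB corresponds to a root-mean-square error of 2.55 gray levels on an 8-bit scale, and an error of a single gray level everywhere corresponds to about 48.13 dB.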

Implications and Future Work

The method is relevant to digital media security and rights management, where removal attacks are used to assess how robust a visible watermark is. The results suggest practical applicability, since the algorithm generalizes across different watermark patterns and complexities without additional annotation or user interaction.

The mask-guided spatial attention could also benefit related tasks, such as shadow removal and image harmonization. Future work could extend the model to video frames, or add adaptive learning that adjusts automatically to varying watermarking schemes.

Conclusion

Cun and Pun's work advances image processing and media security by addressing visible watermark removal with a well-founded architecture. The structured two-stage framework and the use of task-specific attention in a multi-task network are insights that should inform further studies in automated image restoration. The work raises the standard for end-to-end systems that restore watermarked media and could support practical applications in digital rights management.