- The paper introduces reverse attention to dynamically erase predicted regions and focus on refining overlooked object details.
- It leverages residual learning to incrementally enhance low-resolution saliency maps while significantly reducing model complexity.
- Extensive experiments on six benchmarks demonstrate competitive F-measure scores and runtime efficiency, achieving 45 FPS.
Reverse Attention for Salient Object Detection: A Comprehensive Overview
The paper by Chen et al. introduces a novel approach to salient object detection (SOD) that addresses key challenges in existing models, specifically low-resolution outputs and heavy computational demands. The proposed solution involves a compact deep learning architecture leveraging residual learning and an innovative reverse attention mechanism.
Key Innovations
- Residual Learning for Saliency Refinement: The authors build on the HED architecture and use residual learning to enhance the resolution of saliency maps, treating saliency refinement as a super-resolution reconstruction problem. By learning side-output residual features, the network incrementally refines the low-resolution saliency maps produced by the deep layers. This requires far fewer parameters than methods such as DSS while maintaining comparable performance.
- Reverse Attention Mechanism: Reverse attention is introduced to better capture salient object details. By erasing the currently predicted salient regions before learning each residual, the network is guided toward undiscovered object parts and boundaries, improving both accuracy and resolution. This top-down processing refines the saliency maps stage by stage, leading to efficient detection (see the sketch after this list).
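The two ideas work together in a single top-down refinement stage: the coarse prediction from the deeper layer is upsampled, its already-predicted regions are suppressed by reverse attention, and a small convolutional branch learns only the residual correction. Below is a minimal PyTorch sketch of one such stage; the class name, channel widths, and convolution depths are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReverseAttentionStage(nn.Module):
    """One top-down refinement stage (illustrative, not the paper's exact design):
    erase already-predicted regions, then learn a residual correction."""

    def __init__(self, in_channels, mid_channels=64):
        super().__init__()
        # Lightweight residual branch: predicts a 1-channel correction map.
        self.residual_branch = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, 1, kernel_size=3, padding=1),
        )

    def forward(self, side_features, coarse_logits):
        # Upsample the coarser saliency logits to this stage's resolution.
        coarse_up = F.interpolate(coarse_logits, size=side_features.shape[2:],
                                  mode="bilinear", align_corners=False)
        # Reverse attention: 1 - sigmoid(prediction) suppresses regions that are
        # already predicted salient, steering the residual branch toward missed
        # object parts and boundaries.
        reverse_attention = 1.0 - torch.sigmoid(coarse_up)
        attended = side_features * reverse_attention
        # Residual learning: refine the coarse map rather than re-predict it.
        return coarse_up + self.residual_branch(attended)
```

Applied stage by stage from deep to shallow layers, each refinement only has to learn a small correction to the upsampled map, which is what keeps the parameter count low.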
Experimental Results
The method was evaluated against state-of-the-art approaches on six benchmarks: MSRA-B, HKU-IS, ECSSD, PASCAL-S, SOD, and DUT-OMRON. It achieved competitive accuracy while running at 45 FPS with a model size of only 81 MB, striking an efficient balance between accuracy and computational cost.
- Performance Metrics: The proposed method consistently achieved high F-measure scores, demonstrating robust detection, particularly on the challenging DUT-OMRON dataset (the metric itself is sketched after this list).
- Execution Time: The approach ran in 0.022 seconds per image on ECSSD (roughly 45 FPS), outperforming comparable models.
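For context, the F-measure used on these benchmarks combines precision and recall with β² = 0.3, the standard weighting in SOD evaluation that emphasizes precision. The sketch below uses a fixed binarization threshold for simplicity; actual benchmark protocols often sweep thresholds or derive an adaptive one, so treat that detail as an assumption.

```python
import numpy as np

def f_measure(pred, gt, beta_sq=0.3, threshold=0.5):
    """Weighted F-measure for saliency maps.

    pred: float array in [0, 1]; gt: binary ground-truth mask.
    beta_sq = 0.3 is the conventional choice in SOD benchmarks;
    the fixed threshold is a simplification of common protocols.
    """
    binary = pred >= threshold
    gt = gt.astype(bool)
    tp = np.logical_and(binary, gt).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / (gt.sum() + 1e-8)
    return ((1 + beta_sq) * precision * recall
            / (beta_sq * precision + recall + 1e-8))
```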
Contributions and Implications
The paper makes several key contributions:
- It pioneers a compact, efficient network that addresses the limitations of existing SOD frameworks.
- The integration of reverse attention significantly enhances the model's ability to accurately detect and segment salient objects by dynamically directing the learning process away from already discovered salient regions.
- The methodological advancements could potentially extend to other pixel-level prediction tasks, fostering improvements in related fields such as semantic segmentation and edge detection.
Future Prospects
Looking forward, this research opens several avenues for further exploration:
- Model Optimization: Future work could focus on reducing the redundancy in the global saliency branch and the backbone network, possibly through integrating handcrafted saliency priors or training from scratch.
- Broader Applications: The principles of reverse attention and residual learning may be applied to various computer vision tasks that require high accuracy with limited computational resources.
In conclusion, Chen et al.'s paper makes a substantial contribution to the field of salient object detection by combining reverse attention with residual learning, offering a robust and efficient framework that holds promise for advancing both theoretical research and practical applications in computer vision.