- The paper introduces deep uncertain convolutional features (UCF) with R-dropout to enhance model robustness in saliency detection.
- It proposes a hybrid upsampling method that minimizes checkerboard artifacts while improving spatial reconstruction.
- The unified FCN architecture eliminates the need for post-processing and achieves state-of-the-art results on standard saliency benchmarks.
Overview of "Learning Uncertain Convolutional Features for Accurate Saliency Detection"
In "Learning Uncertain Convolutional Features for Accurate Saliency Detection," the authors propose a new approach to saliency detection using deep convolutional neural networks (CNNs). The paper introduces three main contributions: the concept of deep uncertain convolutional features (UCF), a reformulated dropout (R-dropout) technique, and a novel hybrid upsampling method. These contributions are designed to enhance the robustness and accuracy of saliency detection models, which are crucial in computer vision tasks such as object retargeting, scene classification, semantic segmentation, and visual tracking.
The proposed model adopts a fully convolutional network (FCN) architecture with encoder and decoder components. The encoder learns hierarchical features from raw images, while the decoder progressively restores spatial resolution for pixel-wise prediction. Because the network is convolutional throughout, it avoids the fully connected layers of traditional models, improving computational efficiency and preserving the spatial layout of the input images.
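As a minimal illustration of this encoder-decoder layout (a toy sketch with assumed stage counts and strides, not the paper's actual layer configuration), the following traces how spatial dimensions shrink through the encoder and are restored by the decoder:

```python
def encoder_decoder_shapes(h, w, depth=4):
    """Trace spatial sizes through a toy FCN: the encoder halves the
    resolution at each stage, the decoder doubles it back, so the final
    saliency map matches the input pixel grid.  With no fully connected
    layers, the same network accepts inputs of arbitrary size."""
    shapes = [(h, w)]
    for _ in range(depth):            # encoder: stride-2 downsampling
        h, w = h // 2, w // 2
        shapes.append((h, w))
    for _ in range(depth):            # decoder: 2x upsampling
        h, w = h * 2, w * 2
        shapes.append((h, w))
    return shapes

# A 256x256 input bottlenecks at 16x16 and returns to 256x256.
shapes = encoder_decoder_shapes(256, 256)
```

The symmetry is what lets the model emit a dense, per-pixel saliency map directly, rather than a fixed-length vector as a fully connected head would.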
Key Contributions
- Deep Uncertain Convolutional Features (UCF): The authors introduce R-dropout, an extension of conventional dropout adapted for convolutional layers. The technique turns the internal feature units into an implicit ensemble, enabling probabilistic feature learning that improves the model's robustness and accuracy. The resulting uncertain features are crucial for accurately inferring the boundaries of salient objects, and they add a probabilistic dimension to deep models that are otherwise treated as 'black boxes'.
- Hybrid Upsampling Method: Deconvolution often produces checkerboard artifacts because the filter windows overlap unevenly during upsampling. The authors address this with a hybrid technique that combines deconvolution using constrained filter sizes with interpolation-based upsampling. Separating resolution enhancement from feature extraction in this way significantly reduces artifacts and improves the quality of the output saliency maps.
- Enhanced Architecture for Saliency Detection: By incorporating these contributions within an FCN architecture, the model unifies uncertain feature extraction and saliency detection. The entire architecture is trained end-to-end, with weights and biases optimized across all layers through gradient descent. This eliminates the need for post-processing steps and yields strong results on datasets including ECSSD, DUT-OMRON, and HKU-IS.
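One plausible reading of the R-dropout idea can be sketched as follows. The sample-wise renormalization here (scaling by the actual fraction of retained units in each mask, rather than the fixed expectation 1 - p of standard inverted dropout) is an assumption of this sketch, not a line-by-line reproduction of the paper's formulation:

```python
import numpy as np

def r_dropout(feature_maps, p=0.5, rng=None):
    """Dropout over convolutional feature maps with sample-wise
    renormalization: surviving activations are rescaled by the fraction
    actually kept under this particular mask, not the expected 1 - p."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(feature_maps.shape) >= p   # keep each unit w.p. 1 - p
    kept = mask.sum()
    if kept == 0:                                # degenerate mask: all dropped
        return np.zeros_like(feature_maps)
    # total/kept varies from pass to pass, so each forward pass yields a
    # slightly different feature map -- the "uncertain" feature ensemble.
    return feature_maps * mask * (feature_maps.size / kept)

# Repeated stochastic passes give an ensemble whose per-unit variance can
# be read as uncertainty, e.g. over salient-object boundaries.
fmap = np.ones((2, 4, 4))                        # toy (channels, H, W) features
samples = np.stack([r_dropout(fmap, p=0.5, rng=np.random.default_rng(i))
                    for i in range(100)])
uncertainty = samples.std(axis=0)                # high where masks disagree
```

Note that with this renormalization the mean activation of each stochastic pass is preserved exactly, not just in expectation, which is one way an ensemble of feature units can stay well-scaled inside a deep network.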
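The uneven-overlap problem behind checkerboard artifacts, and the interpolate-then-convolve remedy, can be illustrated with a toy 1-D contribution count and a simple hybrid upsampler (an illustrative sketch; the specific filter sizes and nearest-neighbour interpolation are assumptions, not the paper's exact configuration):

```python
import numpy as np

def deconv_contributions(n_in, kernel, stride):
    """Count how many filter taps land on each output position of a 1-D
    transposed convolution; uneven counts appear as checkerboard artifacts."""
    counts = np.zeros((n_in - 1) * stride + kernel)
    for i in range(n_in):
        counts[i * stride : i * stride + kernel] += 1
    return counts

def hybrid_upsample(x, scale, kernel):
    """Hybrid scheme sketch: interpolation handles the resolution increase,
    then a plain convolution handles feature extraction, so every output
    pixel receives the same number of filter taps."""
    up = np.repeat(np.repeat(x, scale, axis=0), scale, axis=1)  # nearest-neighbour
    k = np.ones((kernel, kernel)) / kernel ** 2                 # toy smoothing filter
    pad = kernel // 2
    padded = np.pad(up, pad, mode="edge")
    out = np.empty_like(up, dtype=float)
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (padded[i:i + kernel, j:j + kernel] * k).sum()
    return out

# kernel=3, stride=2: interior counts alternate 1, 2, 1, 2 -> checkerboard.
uneven = deconv_contributions(8, kernel=3, stride=2)
# kernel=4, stride=2 (kernel divisible by stride): uniform interior coverage.
even = deconv_contributions(8, kernel=4, stride=2)
```

Constraining the filter size so the stride divides it evenly removes the alternating coverage pattern, and interpolating before convolving sidesteps the overlap issue entirely, which is the intuition behind splitting resolution enhancement from feature extraction.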
Performance and Implications
The model outperforms state-of-the-art saliency detection methods in terms of F-measure and mean absolute error (MAE). Extensive experiments show that it delineates object boundaries more precisely, a significant improvement. Beyond direct saliency detection, the approach holds potential value for related pixel-wise tasks such as semantic segmentation and eye fixation prediction, and UCF can extend probabilistic reasoning to other convolutional network-based vision tasks, promising a broader impact.
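For reference, the two reported metrics can be computed as below. The fixed 0.5 threshold and the beta^2 = 0.3 weighting follow common conventions in the saliency literature and are assumptions of this sketch rather than details taken from the paper:

```python
import numpy as np

def mae(saliency, gt):
    """Mean absolute error between a real-valued saliency map in [0, 1]
    and a binary ground-truth mask."""
    return np.abs(saliency - gt).mean()

def f_measure(saliency, gt, beta2=0.3, threshold=0.5):
    """F-measure at a fixed binarization threshold; beta^2 = 0.3 weights
    precision above recall, as is conventional for saliency evaluation."""
    pred = saliency >= threshold
    tp = np.logical_and(pred, gt > 0).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max((gt > 0).sum(), 1)
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)

gt = np.array([[0, 1], [1, 0]])
perfect = gt.astype(float)        # a perfect map: MAE 0, F-measure 1
```

In practice benchmarks often sweep the threshold and report the maximum or mean F-measure; the fixed-threshold version above keeps the sketch minimal.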
Future Directions
The implications of this research stretch beyond saliency detection. Probabilistic techniques like R-dropout offer valuable insights for developing more transparent and interpretable deep learning models. Further work could adapt these probabilistic models for real-time applications and extend uncertainty analysis to dynamically changing visual inputs. Moreover, the hybrid upsampling method could be refined or adapted for tasks requiring high-fidelity image reconstruction.
In conclusion, the paper provides substantial advancements in the saliency detection domain through methodological innovations that leverage probabilistic modeling. These contributions not only improve current benchmark performance but also open avenues for further research and application across various computer vision tasks.