- The paper introduces a multiscale deep CNN framework that extracts a discriminative deep contrast feature and combines it with low-level handcrafted features for robust saliency detection.
- The approach outperforms existing methods, yielding up to a 10% increase in F-measure and a 35.3% decrease in mean absolute error on standard benchmarks.
- The work also presents HKU-IS, a new dataset of 4,447 images, offering a challenging benchmark to further advance visual saliency research.
Visual Saliency Detection Based on Multiscale Deep CNN Features
The paper "Visual Saliency Detection Based on Multiscale Deep CNN Features" by Li and Yu presents a deeply intricate approach to the field of visual saliency detection by leveraging deep convolutional neural networks (CNNs). The authors root their work in the idea that high-quality visual saliency can be determined effectively by extracting features at multiple scales using CNNs, an approach notable for its success in visual recognition tasks. The architecture proposed integrates fully connected neural network layers atop CNNs, analyzing data at three distinct scales to produce what is termed as the deep contrast feature.
A primary innovation of this research is the combination of high-level deep features with low-level handcrafted features, which improves the robustness and discriminative power of the saliency model. The resulting model achieves state-of-the-art performance, exceeding prior methods in both F-measure and mean absolute error on datasets such as DUT-OMRON, with improvements of up to 10% in F-measure and reductions of up to 35.3% in mean absolute error.
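For reference, both reported metrics are standard in the saliency literature and simple to compute per image. The sketch below uses the conventional β² = 0.3 weighting for the F-measure and a common adaptive threshold (twice the mean saliency); it is an illustration, not the authors' evaluation code.

```python
import numpy as np

def saliency_metrics(sal_map, gt_mask, beta_sq=0.3):
    """F-measure at an adaptive threshold, plus MAE, for one image.

    sal_map: float array in [0, 1]; gt_mask: binary ground-truth mask.
    beta_sq = 0.3 weights precision more heavily than recall, the
    convention in saliency benchmarking.
    """
    # Mean absolute error between the saliency map and the ground truth.
    mae = np.abs(sal_map - gt_mask).mean()

    pred = sal_map >= min(2 * sal_map.mean(), 1.0)  # adaptive threshold
    tp = np.logical_and(pred, gt_mask).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max(gt_mask.sum(), 1)
    f_measure = ((1 + beta_sq) * precision * recall
                 / max(beta_sq * precision + recall, 1e-8))
    return f_measure, mae
```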
The paper also introduces HKU-IS, a new dataset of 4,447 images, intended as a challenging benchmark for evaluating visual saliency models. The dataset addresses limitations in complexity and annotation quality found in existing datasets.
Key findings include:
- Multiscale feature extraction captures semantic contrasts that are not represented in low-level features, overcoming a key limitation of methods that rely solely on local or handcrafted cues.
- The deep contrast feature serves as a discriminative high-level cue that significantly boosts detection accuracy when fused with low-level features (see the fusion sketch after this list).
- Validation on public benchmarks shows superior performance across diverse and challenging datasets.
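The second finding, the fusion of learned and handcrafted cues, can be illustrated with a generic concatenate-and-classify sketch. The feature names and dimensions below are hypothetical, and the simple logistic-regression fusion is a stand-in for the model described in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical per-region descriptors: `deep_contrast` from the CNN
# branch and `handcrafted` holding low-level cues (e.g., color and
# texture contrast). Shapes and values are placeholders.
deep_contrast = rng.normal(size=(1000, 300))
handcrafted = rng.normal(size=(1000, 40))
labels = rng.integers(0, 2, size=1000)  # salient / non-salient regions

# Fusion by concatenation, followed by a simple linear classifier.
fused = np.concatenate([deep_contrast, handcrafted], axis=1)
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
```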
The implications of this research are significant: it provides a framework that effectively marries deep learning with traditional feature extraction to strengthen visual attention models. Its multiscale analysis and multi-level segmentation are a step toward extracting more meaningful hierarchical structure and semantics from images.
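Multi-level segmentation can be illustrated with a small sketch: segment the image at several granularities, score each region, and fuse the resulting per-level saliency maps. The pixelwise average below is a simplified stand-in for the aggregation described in the paper.

```python
import numpy as np

def aggregate_levels(label_maps, region_scores):
    """Fuse region-level saliency across segmentation granularities.

    label_maps:    list of (H, W) int arrays, one segmentation per level,
                   where each pixel holds its region index.
    region_scores: list of 1-D arrays giving a saliency score per region
                   at the matching level.
    Returns the pixelwise mean of the per-level saliency maps.
    """
    per_level = [scores[labels]  # broadcast region scores onto pixels
                 for labels, scores in zip(label_maps, region_scores)]
    return np.mean(per_level, axis=0)
```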
In terms of future development, the methodology presents several avenues for exploration. The authors themselves suggest integrating spatial pyramid pooling networks to improve the computational efficiency of feature extraction, an essential consideration for scaling to larger applications. Furthermore, the deep contrast feature could be extended to other computer vision problems, such as depth prediction from monocular images and object proposal generation, suggesting wide-reaching implications for fields that rely on image recognition and interpretation.
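The efficiency argument behind spatial pyramid pooling is that one full-image convolutional pass can be shared across many regions, with each region pooled to a fixed-length vector regardless of its size. A minimal sketch of the pooling step itself, illustrative rather than the proposed system:

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a conv feature map over grids of several sizes and
    concatenate the results into one fixed-length vector per sample.

    feature_map: (B, C, H, W) tensor from a convolutional backbone.
    Returns a (B, C * sum(n * n for n in levels)) tensor whose length
    does not depend on H or W.
    """
    pooled = [F.adaptive_max_pool2d(feature_map, n).flatten(1)
              for n in levels]
    return torch.cat(pooled, dim=1)
```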
In summary, Li and Yu make a substantial contribution to visual saliency detection, enhancing both its theoretical framework and its practical applications. The combination of multiscale deep CNN features with handcrafted elements marks a clear advance, promising improvements in the many tasks where visual saliency is a pivotal component.