Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective (1803.10910v1)

Published 29 Mar 2018 in cs.CV

Abstract: The success of current deep saliency detection methods heavily depends on the availability of large-scale supervision in the form of per-pixel labeling. Such supervision, while labor-intensive and not always possible, also tends to hinder the generalization ability of the learned models. By contrast, traditional unsupervised saliency detection methods based on handcrafted features, even though they have been surpassed by deep supervised methods, are generally dataset-independent and can be applied in the wild. This raises a natural question: "Is it possible to learn saliency maps without using labeled data while improving the generalization ability?" To this end, we present a novel perspective on unsupervised saliency detection: learning from multiple noisy labels generated by "weak" and "noisy" unsupervised handcrafted saliency methods. Our end-to-end deep learning framework for unsupervised saliency detection consists of a latent saliency prediction module and a noise modeling module that work collaboratively and are optimized jointly. Explicit noise modeling enables us to deal with noisy saliency maps in a probabilistic way. Extensive experimental results on various benchmark datasets show that our model not only outperforms all unsupervised saliency methods by a large margin but also achieves performance comparable to recent state-of-the-art supervised deep saliency methods.

Citations (178)

Summary

  • The paper proposes a deep unsupervised framework for saliency detection that learns from multiple noisy labels generated by existing unsupervised methods, jointly optimizing saliency prediction and noise modeling.
  • The framework explicitly models the noise inherent in the outputs of these unsupervised methods, combining their complementary priors to learn robustly without any human annotations.
  • Experiments show that the framework outperforms traditional unsupervised methods by a large margin and approaches state-of-the-art supervised performance across benchmarks, enabling practical saliency detection without costly human labeling.

Deep Unsupervised Saliency Detection: A Novel Approach to Harnessing Multiple Noisy Labels

The paper "Deep Unsupervised Saliency Detection: A Multiple Noisy Labeling Perspective" presents a fresh methodology for saliency detection that eschews the reliance on large-scale, human-annotated datasets. Traditional deep saliency detection methods have heavily leaned on per-pixel labeled data, which, while effective, may hinder the generalization capability of models and are costly in terms of human labor. The authors propose an innovative approach to unsupervised saliency detection by leveraging the outputs from multiple unsupervised saliency methods that generate "noisy" labels.

The proposed end-to-end deep learning framework comprises two main components: a latent saliency prediction module and a noise modeling module, which are optimized jointly. The noise modeling module explicitly characterizes the noise inherent in the maps produced by the unsupervised methods, allowing the framework to handle this noise probabilistically. In doing so, the framework combines the priors embodied by conventional unsupervised techniques, such as the center prior, the global contrast prior, and the background connectivity prior, integrating their complementary strengths into the learning process.
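
A compact way to state this multiple-noisy-labeling view (with notation introduced here for illustration, not taken verbatim from the paper) is to treat each unsupervised map as the latent saliency map corrupted by independent zero-mean Gaussian noise:

```latex
% y^{(m)}_j : value of the m-th unsupervised saliency map at pixel j
% \bar{y}_j : latent "clean" saliency value predicted by the network
% n^{(m)}_j : per-method, per-pixel noise, modeled as zero-mean Gaussian
\[
    y^{(m)}_j = \bar{y}_j + n^{(m)}_j,
    \qquad
    n^{(m)}_j \sim \mathcal{N}\!\bigl(0,\, \sigma^{2}_{m,j}\bigr),
    \qquad m = 1,\dots,M .
\]
```

Under this view, the prediction module estimates the latent map while the noise module estimates the per-method variances, and the two are refined together during training.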

Experimental results across multiple benchmark datasets demonstrate that this framework not only surpasses traditional unsupervised saliency detection methods by a significant margin but also approaches the efficacy of state-of-the-art supervised methods. Its ability to perform competitively without laborious manual labeling highlights its cost-effectiveness and practical applicability.

The strength of the approach lies in modeling the prediction and the noise concurrently and optimizing both in tandem. The saliency predictor is a fully convolutional network (FCN) built on a ResNet backbone and trained with a cross-entropy prediction loss, so the framework learns the saliency map without any human annotations. The noise in each unsupervised map is modeled as zero-mean Gaussian with individually estimated variances; iterative refinement of the prediction and the noise estimates allows the latent saliency map to converge toward an accurate result.
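
As a concrete illustration, the sketch below shows one way such a joint objective could look in PyTorch. It is a minimal approximation under the assumptions stated in the comments, not the authors' exact loss; all names (`joint_loss`, `pred_logits`, `noisy_maps`, `log_var`) are illustrative.

```python
import torch
import torch.nn.functional as F

def joint_loss(pred_logits, noisy_maps, log_var):
    """Joint saliency-prediction / noise-modeling loss (illustrative sketch).

    pred_logits: (B, 1, H, W) output of the saliency FCN (e.g. ResNet-based).
    noisy_maps:  (B, M, H, W) maps from M unsupervised handcrafted methods, in [0, 1].
    log_var:     (B, M, H, W) learnable log-variance of the zero-mean Gaussian noise.
    """
    pred = torch.sigmoid(pred_logits)  # latent "clean" saliency map in [0, 1]

    # Prediction term: cross-entropy between the predicted map and each noisy map.
    ce = F.binary_cross_entropy(
        pred.expand_as(noisy_maps), noisy_maps, reduction="none"
    )

    # Noise term: fit the modeled per-method variance to the observed residuals
    # via the residuals' negative Gaussian log-likelihood (one common choice;
    # the paper itself uses a KL-divergence-based criterion).
    resid_sq = (noisy_maps - pred.detach()) ** 2
    nll = 0.5 * (resid_sq * torch.exp(-log_var) + log_var)

    return ce.mean() + nll.mean()


# Hypothetical usage with random tensors standing in for real data:
if __name__ == "__main__":
    B, M, H, W = 2, 4, 64, 64
    pred_logits = torch.randn(B, 1, H, W, requires_grad=True)
    noisy_maps = torch.rand(B, M, H, W)
    log_var = torch.zeros(B, M, H, W, requires_grad=True)
    loss = joint_loss(pred_logits, noisy_maps, log_var)
    loss.backward()
    print(float(loss))
```

The `detach()` on the prediction inside the noise term is one simple way to let the variance estimates track the residuals without the prediction collapsing to explain all disagreement as noise; the paper's actual training procedure may differ.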

A notable contribution is the use of the KL divergence to match the modeled noise distribution to the empirically observed variance of the prediction errors, which makes learning more robust in the presence of noise. This technique underscores the paper's central argument: unsupervised saliency detection can be cast as learning from multiple noisy labels, a perspective previously unexplored in the literature.
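
For reference, if both the modeled noise and the empirical prediction errors are treated as zero-mean Gaussians (symbols and the direction of the divergence are chosen here for illustration), the KL divergence between them has the standard closed form:

```latex
% \hat{\sigma}^2 : empirically observed variance of the prediction errors
% \sigma^2       : variance of the modeled zero-mean Gaussian noise
\[
    \mathrm{KL}\!\left(\mathcal{N}(0,\hat{\sigma}^{2})\;\middle\|\;\mathcal{N}(0,\sigma^{2})\right)
    = \log\frac{\sigma}{\hat{\sigma}}
    + \frac{\hat{\sigma}^{2}}{2\sigma^{2}}
    - \frac{1}{2},
\]
```

which is driven toward zero as the modeled variance approaches the observed error variance.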

The implications of this research are manifold. Practically, it opens the door to deploying saliency models in environments where annotated data are sparse or altogether unavailable. Theoretically, it invites exploration into new probabilistic methods for handling noise in unsupervised learning setups. Looking forward, this approach may inspire further advances in other areas of computer vision, such as semantic segmentation and depth estimation, where similar challenges regarding the availability of labeled data exist.

This paper makes significant strides in understanding the trade-off between reducing human annotation effort and maintaining high performance, and it represents an exciting direction for future research in deep unsupervised learning.