- The paper presents a systematic synthesis of RGB-D salient object detection models, contrasting handcrafted-feature methods with deep learning approaches.
- It evaluates diverse fusion techniques, including early, late, and multi-scale fusion, to improve the integration of RGB and depth cues.
- The study highlights the importance of attention mechanisms in focusing on salient regions and suggests innovative directions for future research.
An Overview of RGB-D Salient Object Detection: A Comprehensive Survey
The paper "RGB-D Salient Object Detection: A Survey" by Tao Zhou et al. presents a thorough examination of salient object detection (SOD) methodologies within RGB-D contexts. The authors elucidate the growing significance of depth maps, alongside RGB data, in enhancing the performance of SOD by mimicking human visual systems to identify the most attention-grabbing objects in various scenes.
Key Contributions
The paper's central contribution is a systematic synthesis of RGB-D SOD models, organized along several axes: traditional versus deep learning approaches, fusion strategy, model architecture (single-stream versus multi-stream), and attention mechanisms. This categorization makes it easier to trace the field's evolution and to compare the diverse methodologies used in RGB-D SOD.
- Traditional versus Deep Learning Models: This delineation traces the progression from handcrafted feature-based models, which rely on intrinsic image attributes, to deep models that use neural networks to extract high-level feature representations. The latter handle complex scene variations far better, though their performance still degrades when depth maps are of low quality.
- Fusion Techniques: The survey distinguishes early fusion, late fusion, and multi-scale fusion, each integrating RGB data with depth cues at a different point in the pipeline (the sketch after this list contrasts the three). Multi-scale fusion in particular exploits cross-modal correlations by fusing features at multiple layers, which tends to yield more robust and accurate SOD performance.
- Attention Mechanisms: Attention lets a network concentrate on salient regions, weight significant features, and suppress cluttered backgrounds (a minimal attention sketch follows the fusion example below). Co-attention strategies, which model inter-modality interactions, are an emerging direction for improving saliency prediction.
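To make the three fusion families concrete, below is a minimal PyTorch sketch contrasting them. The module names, layer sizes, and two-stage design are illustrative assumptions for this summary, not architectures from the survey.

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Concatenate RGB (3ch) and depth (1ch) at the input; one shared stream."""
    def __init__(self):
        super().__init__()
        self.stream = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),           # 1-channel saliency logits
        )

    def forward(self, rgb, depth):
        return self.stream(torch.cat([rgb, depth], dim=1))

class LateFusion(nn.Module):
    """Independent RGB and depth streams; predictions merged at the end."""
    def __init__(self):
        super().__init__()
        def make(c):
            return nn.Sequential(nn.Conv2d(c, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 1, 3, padding=1))
        self.rgb_stream, self.depth_stream = make(3), make(1)
        self.merge = nn.Conv2d(2, 1, 1)               # combine the two predictions

    def forward(self, rgb, depth):
        return self.merge(torch.cat([self.rgb_stream(rgb),
                                     self.depth_stream(depth)], dim=1))

class MultiScaleFusion(nn.Module):
    """Cross-modal fusion at every stage, so intermediate features interact."""
    def __init__(self):
        super().__init__()
        self.rgb1, self.dep1 = nn.Conv2d(3, 32, 3, padding=1), nn.Conv2d(1, 32, 3, padding=1)
        self.rgb2, self.dep2 = nn.Conv2d(32, 32, 3, padding=1), nn.Conv2d(32, 32, 3, padding=1)
        self.fuse1, self.fuse2 = nn.Conv2d(64, 32, 1), nn.Conv2d(64, 32, 1)
        self.head = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, rgb, depth):
        r, d = torch.relu(self.rgb1(rgb)), torch.relu(self.dep1(depth))
        f = torch.relu(self.fuse1(torch.cat([r, d], dim=1)))      # stage-1 fusion
        r, d = torch.relu(self.rgb2(r)), torch.relu(self.dep2(d))
        f = f + torch.relu(self.fuse2(torch.cat([r, d], dim=1)))  # stage-2 fusion
        return self.head(f)

rgb, depth = torch.rand(2, 3, 64, 64), torch.rand(2, 1, 64, 64)
for model in (EarlyFusion(), LateFusion(), MultiScaleFusion()):
    print(type(model).__name__, model(rgb, depth).shape)          # all (2, 1, 64, 64)
```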
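And here is a hedged sketch of how attention-based cross-modal weighting might be wired in: depth features produce a spatial gate that re-weights RGB features, suppressing cluttered background. `DepthGuidedAttention` and its layer sizes are hypothetical illustrations, not a module from any surveyed model.

```python
import torch
import torch.nn as nn

class DepthGuidedAttention(nn.Module):
    """Use depth features to spatially re-weight RGB features."""
    def __init__(self, channels=32):
        super().__init__()
        # 1x1 conv + sigmoid turns depth features into a [0, 1] spatial map
        self.gate = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())

    def forward(self, rgb_feat, depth_feat):
        attn = self.gate(depth_feat)        # (B, 1, H, W), broadcast over channels
        return rgb_feat * (1 + attn)        # residual form keeps the original signal

feats = torch.rand(2, 32, 16, 16)
print(DepthGuidedAttention()(feats, feats).shape)   # torch.Size([2, 32, 16, 16])
```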
The survey benchmarks a wide range of models on the prevalent RGB-D datasets, showing that deep learning approaches generally surpass traditional methods. Among the deep methods, models such as D3Net, JL-DCF, and UC-Net stand out for how effectively they fuse cross-modal, multi-source data for SOD tasks.
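For readers reproducing such benchmark comparisons, here is a minimal NumPy sketch of two widely used SOD metrics: mean absolute error (MAE) and the F-measure with beta² = 0.3. The adaptive threshold (twice the mean saliency) is a common convention assumed here, not taken from the paper.

```python
import numpy as np

def mae(sal, gt):
    """Both maps normalized to [0, 1]; lower is better."""
    return np.abs(sal - gt).mean()

def f_measure(sal, gt, beta2=0.3):
    """F-beta on a map binarized at 2x mean saliency (an adaptive threshold)."""
    pred = sal >= min(2 * sal.mean(), 1.0)
    tp = np.logical_and(pred, gt > 0.5).sum()
    precision = tp / max(pred.sum(), 1)
    recall = tp / max((gt > 0.5).sum(), 1)
    return (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-8)

gt = np.zeros((64, 64))
gt[20:40, 20:40] = 1.0                            # square ground-truth object
sal = 0.8 * gt + 0.1 * np.random.rand(64, 64)     # a decent fake prediction
print(round(mae(sal, gt), 3), round(f_measure(sal, gt), 3))
```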
Implications and Future Directions
Despite advancements, the authors note several challenges and propose exciting directions for future research:
- Data Quality and Incompleteness: The robustness of RGB-D methods can be hampered by low-quality or incomplete depth maps. Future work might improve depth reliability through refinement techniques (a simple hole-filling sketch follows this list) or through learning frameworks that tolerate partial modal inputs.
- Advanced Fusion Methods: Adversarial learning and attention-based fusion strategies could address the shortcomings of current fusion techniques, offering adaptive, robust ways to exploit multi-modal data.
- Scalability of Dataset Collection: Increasing the scale and complexity of RGB-D datasets and exploring domain-specific applications remain critical for developing models that generalize to varied real-world scenes.
- Algorithmic Innovations: Emphasizing efficient model designs suitable for real-time applications and exploring RGB-T SOD, where thermal data complements RGB visuals, could further advance the practical deployment of SOD technologies.
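As one concrete illustration of the depth-quality direction, the sketch below fills missing (zero-valued) depth pixels with the median of their valid neighbors. This is a deliberately simple heuristic baseline assumed for illustration; the survey points toward learned refinement and completion rather than this kind of rule.

```python
import numpy as np

def fill_depth_holes(depth, iterations=10):
    """depth: 2-D float array where 0 marks a missing measurement."""
    depth = depth.copy()
    for _ in range(iterations):
        holes = np.argwhere(depth == 0)
        if holes.size == 0:
            break
        for y, x in holes:
            patch = depth[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2]
            valid = patch[patch > 0]
            if valid.size:                       # propagate from valid neighbors
                depth[y, x] = np.median(valid)
    return depth

noisy = np.random.rand(64, 64)
noisy[20:30, 20:30] = 0                          # simulate a sensor hole
print((fill_depth_holes(noisy) == 0).sum())      # remaining holes, ideally 0
```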
In sum, the paper provides a valuable resource for understanding existing RGB-D SOD methodologies and highlights impactful areas for future exploration. By compiling models and datasets, and offering critical insights into the fusion of RGB and depth data, it sets a research agenda that could propel significant breakthroughs in fields relying on visual perception and object detection.