
All in One: RGB, RGB-D, and RGB-T Salient Object Detection (2311.14746v1)

Published 23 Nov 2023 in cs.CV

Abstract: Salient object detection (SOD) aims to identify the most attractive objects within an image. Depending on the type of input data, SOD can be categorized into various forms, including RGB, RGB-D (Depth), RGB-T (Thermal), and light field SOD. Previous research has focused on saliency detection for individual data types, and a model built for one type transfers poorly to another: an RGB-D SOD model forced to process RGB-T data will perform poorly. We propose an innovative model framework that provides a unified solution for salient object detection on three types of data (RGB, RGB-D, and RGB-T). All three types can be handled by one model (all in one) with the same weight parameters. In this framework, the three types of data are concatenated in an ordered manner within a single input batch, and features are extracted using a transformer network. Based on this framework, we propose an efficient lightweight SOD model, AiOSOD, which can process RGB, RGB-D, and RGB-T data at high speed (780 FPS for RGB data, 485 FPS for RGB-D or RGB-T data). Notably, with only 6.25M parameters, AiOSOD achieves excellent performance on RGB, RGB-D, and RGB-T datasets.
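The ordered batch concatenation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the ordering or any interface, so everything here is an assumption made for illustration: the function names `build_ordered_batch` and `split_features` are hypothetical, and the choice to place all RGB images first, followed by the depth or thermal images, is one plausible ordering rather than the paper's documented one. Images are treated as opaque objects; in practice they would be tensors stacked along the batch axis and passed through a single shared-weight encoder.

```python
from typing import Any, List, Optional, Tuple

Sample = Tuple[Any, Optional[Any]]  # (rgb_image, auxiliary_image or None)


def build_ordered_batch(samples: List[Sample]) -> Tuple[List[Any], List[int]]:
    """Assemble one batch from mixed RGB, RGB-D, and RGB-T samples.

    Assumed ordering (not confirmed by the source): every RGB image first,
    in sample order, then the auxiliary (depth or thermal) images of the
    samples that have one. Returns the batch and the indices of the
    samples that contributed an auxiliary image.
    """
    rgb_part = [rgb for rgb, _ in samples]
    aux_index = [i for i, (_, aux) in enumerate(samples) if aux is not None]
    aux_part = [samples[i][1] for i in aux_index]
    return rgb_part + aux_part, aux_index


def split_features(
    features: List[Any], n_samples: int, aux_index: List[int]
) -> List[Tuple[Any, Optional[Any]]]:
    """Undo the ordering: pair each sample's RGB feature with its
    auxiliary feature, or None for plain RGB samples."""
    aux_feats: List[Optional[Any]] = [None] * n_samples
    for k, i in enumerate(aux_index):
        aux_feats[i] = features[n_samples + k]
    return list(zip(features[:n_samples], aux_feats))


# Usage: strings stand in for image tensors.
samples = [("rgb0", None), ("rgb1", "depth1"), ("rgb2", "thermal2")]
batch, aux_index = build_ordered_batch(samples)
# A shared encoder would map `batch` to per-image features here;
# the identity mapping is used as a placeholder.
pairs = split_features(batch, len(samples), aux_index)
```

Because the position of each image in the batch is fixed by the ordering, the same encoder weights can serve all three data types, and the features can be regrouped per sample afterwards without any modality-specific branch. Under this assumed layout, an RGB-D or RGB-T sample contributes two images to the batch while an RGB sample contributes one.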

