CSDNet: Detect Salient Object in Depth-Thermal via A Lightweight Cross Shallow and Deep Perception Network (2403.10104v1)
Abstract: While we enjoy the richness and informativeness of multimodal data, it also introduces interference and redundancy. To achieve optimal domain interpretation with limited resources, we propose CSDNet, a lightweight \textbf{C}ross \textbf{S}hallow and \textbf{D}eep Perception \textbf{Net}work designed to integrate two modalities with less coherence, thereby discarding redundant information or even an entire modality. We implement CSDNet for the Salient Object Detection (SOD) task in robotic perception. The proposed method capitalises on spatial information prescreening and implicit coherence navigation across shallow and deep layers of the depth-thermal (D-T) modality, prioritising integration over fusion to maximise scene interpretation. To further refine the descriptive capability of the encoder for the less-explored D-T modality, we also propose SAMAEP to guide an effective feature mapping to the generalised feature space. Our approach is tested on the VDT-2048 dataset. Leveraging only the D-T modality, it outperforms SOTA methods that use RGB-T or RGB-D modalities for the first time, and it achieves performance comparable to the RGB-D-T triple-modality benchmark method while running 5.97 times faster and requiring only 0.0036 times the FLOPs. This demonstrates that the proposed CSDNet effectively integrates the information from the D-T modality. The code will be released upon acceptance.
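To make the cross shallow-and-deep integration idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: the class name `CrossShallowDeepBlock`, the channel sizes, and the sigmoid-gated "prescreening" scheme are illustrative assumptions; CSDNet's actual layer design and coherence-navigation mechanism are described in the paper itself.

```python
# Hypothetical sketch of integrating shallow and deep depth (D) and thermal (T)
# features with a spatial prescreening gate. All names and sizes are assumptions.
import torch
import torch.nn as nn

class CrossShallowDeepBlock(nn.Module):
    def __init__(self, shallow_ch: int, deep_ch: int):
        super().__init__()
        # Spatial prescreening: predict a single-channel gate from concatenated
        # shallow D-T features and use it to suppress redundant regions.
        self.prescreen = nn.Sequential(
            nn.Conv2d(shallow_ch * 2, 1, kernel_size=1), nn.Sigmoid())
        # Project deep features to the shallow channel width before integration.
        self.proj = nn.Conv2d(deep_ch, shallow_ch, kernel_size=1)
        self.fuse = nn.Conv2d(shallow_ch * 2, shallow_ch, kernel_size=3, padding=1)

    def forward(self, shallow_d, shallow_t, deep_d, deep_t):
        # Gate the combined shallow D-T features by the predicted spatial mask.
        mask = self.prescreen(torch.cat([shallow_d, shallow_t], dim=1))
        shallow = (shallow_d + shallow_t) * mask
        # Project and upsample the combined deep D-T features, then integrate.
        deep = self.proj(deep_d + deep_t)
        deep = nn.functional.interpolate(
            deep, size=shallow.shape[-2:], mode="bilinear", align_corners=False)
        return self.fuse(torch.cat([shallow, deep], dim=1))

# Usage with dummy feature maps: shallow 64ch at 96x96, deep 256ch at 24x24.
block = CrossShallowDeepBlock(shallow_ch=64, deep_ch=256)
sd, st = torch.randn(1, 64, 96, 96), torch.randn(1, 64, 96, 96)
dd, dt = torch.randn(1, 256, 24, 24), torch.randn(1, 256, 24, 24)
out = block(sd, st, dd, dt)  # -> torch.Size([1, 64, 96, 96])
```

The design choice illustrated here is integration rather than symmetric fusion: shallow features are filtered by a cheap spatial gate before deep context is folded in, which keeps the block lightweight.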
- Z. Xie, F. Shao, G. Chen, H. Chen, Q. Jiang, X. Meng, and Y.-S. Ho, “Cross-modality double bidirectional interaction and fusion network for rgb-t salient object detection,” IEEE Transactions on Circuits and Systems for Video Technology, 2023.
- W. Zhou, Y. Zhu, J. Lei, R. Yang, and L. Yu, “Lsnet: Lightweight spatial boosting network for detecting salient objects in rgb-thermal images,” IEEE Transactions on Image Processing, vol. 32, pp. 1329–1340, 2023.
- Z. Liu, X. Huang, G. Zhang, X. Fang, L. Wang, and B. Tang, “Scribble-supervised rgb-t salient object detection,” arXiv preprint arXiv:2303.09733, 2023.
- B. Wan, X. Zhou, Y. Sun, T. Wang, C. Lv, S. Wang, H. Yin, and C. Yan, “Mffnet: Multi-modal feature fusion network for vdt salient object detection,” IEEE Transactions on Multimedia, 2023.
- H. Zhou, C. Tian, Z. Zhang, C. Li, Y. Ding, Y. Xie, and Z. Li, “Position-aware relation learning for rgb-thermal salient object detection,” IEEE Transactions on Image Processing, 2023.
- Q. Zhang, Q. Qin, Y. Yang, Q. Jiao, and J. Han, “Feature calibrating and fusing network for rgb-d salient object detection,” IEEE Transactions on Circuits and Systems for Video Technology, 2023.
- T. Zhou, H. Fu, G. Chen, Y. Zhou, D.-P. Fan, and L. Shao, “Specificity-preserving rgb-d saliency detection,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 4681–4691.
- J. Zhang, D.-P. Fan, Y. Dai, X. Yu, Y. Zhong, N. Barnes, and L. Shao, “Rgb-d saliency detection via cascaded mutual information minimization,” in Proceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 4338–4347.
- X. Zhao, Y. Pang, L. Zhang, H. Lu, and X. Ruan, “Self-supervised pretraining for rgb-d salient object detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 3, 2022, pp. 3463–3471.
- K. Song, J. Wang, Y. Bao, L. Huang, and Y. Yan, “A novel visible-depth-thermal image dataset of salient object detection for robotic visual perception,” IEEE/ASME Transactions on Mechatronics, 2022.
- M. Feng, H. Lu, and E. Ding, “Attentive feedback network for boundary-aware salient object detection,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 1623–1632.
- W.-D. Jin, J. Xu, Q. Han, Y. Zhang, and M.-M. Cheng, “Cdnet: Complementary depth network for rgb-d salient object detection,” IEEE Transactions on Image Processing, vol. 30, pp. 3376–3390, 2021.
- N. Huang, Q. Jiao, Q. Zhang, and J. Han, “Middle-level feature fusion for lightweight rgb-d salient object detection,” IEEE Transactions on Image Processing, vol. 31, pp. 6621–6634, 2022.
- M. Song, W. Song, G. Yang, and C. Chen, “Improving rgb-d salient object detection via modality-aware decoder,” IEEE Transactions on Image Processing, vol. 31, pp. 6124–6138, 2022.
- H. Bi, R. Wu, Z. Liu, H. Zhu, C. Zhang, and T.-Z. Xiang, “Cross-modal hierarchical interaction network for rgb-d salient object detection,” Pattern Recognition, vol. 136, p. 109194, 2023.
- G. Chen, F. Shao, X. Chai, H. Chen, Q. Jiang, X. Meng, and Y.-S. Ho, “Cgmdrnet: Cross-guided modality difference reduction network for rgb-t salient object detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 9, pp. 6308–6323, 2022.
- A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” arXiv preprint arXiv:2304.02643, 2023.
- N. Bouhlel and S. Méric, “Maximum-likelihood parameter estimation of the product model for multilook polarimetric sar data,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 3, pp. 1596–1611, 2018.
- F. Perazzi, P. Krähenbühl, Y. Pritch, and A. Hornung, “Saliency filters: Contrast based filtering for salient region detection,” in 2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012, pp. 733–740.
- R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, “Frequency-tuned salient region detection,” in 2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, pp. 1597–1604.
- R. Margolin, L. Zelnik-Manor, and A. Tal, “How to evaluate foreground maps?” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 248–255.
- D.-P. Fan, C. Gong, Y. Cao, B. Ren, M.-M. Cheng, and A. Borji, “Enhanced-alignment measure for binary foreground map evaluation,” arXiv preprint arXiv:1805.10421, 2018.
- D.-P. Fan, M.-M. Cheng, Y. Liu, T. Li, and A. Borji, “Structure-measure: A new way to evaluate foreground maps,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 4548–4557.
- J. Hu, L. Shen, and G. Sun, “Squeeze-and-excitation networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 7132–7141.
- J. Wang, K. Song, Y. Bao, L. Huang, and Y. Yan, “Cgfnet: Cross-guided fusion network for rgb-t salient object detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 5, pp. 2949–2961, 2021.
- F. Huo, X. Zhu, L. Zhang, Q. Liu, and Y. Shu, “Efficient context-guided stacked refinement network for rgb-t salient object detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 5, pp. 3111–3124, 2021.
- Z. Tu, Z. Li, C. Li, and J. Tang, “Weakly alignment-free rgbt salient object detection with deep correlation network,” IEEE Transactions on Image Processing, vol. 31, pp. 3752–3764, 2022.
- X. Jin, K. Yi, and J. Xu, “Moadnet: Mobile asymmetric dual-stream networks for real-time and lightweight rgb-d salient object detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 11, pp. 7632–7645, 2022.
- Q. Chen, Z. Zhang, Y. Lu, K. Fu, and Q. Zhao, “3-d convolutional neural networks for rgb-d salient object detection and beyond,” IEEE Transactions on Neural Networks and Learning Systems, 2022.
- Z. Liu, Y. Tan, Q. He, and Y. Xiao, “Swinnet: Swin transformer drives edge-aware rgb-d and rgb-t salient object detection,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 7, pp. 4486–4497, 2021.
- C. Zhang, D. Han, Y. Qiao, J. U. Kim, S.-H. Bae, S. Lee, and C. S. Hong, “Faster segment anything: Towards lightweight sam for mobile applications,” arXiv preprint arXiv:2306.14289, 2023.
- C. Zhang, D. Han, S. Zheng, J. Choi, T.-H. Kim, and C. S. Hong, “Mobilesamv2: Faster segment anything to everything,” arXiv preprint arXiv:2312.09579, 2023.
- X. Zhao, W. Ding, Y. An, Y. Du, T. Yu, M. Li, M. Tang, and J. Wang, “Fast segment anything,” arXiv preprint arXiv:2306.12156, 2023.
- J. Wu, R. Fu, H. Fang, Y. Liu, Z. Wang, Y. Xu, Y. Jin, and T. Arbel, “Medical sam adapter: Adapting segment anything model for medical image segmentation,” arXiv preprint arXiv:2304.12620, 2023.
- C. Chen, J. Miao, D. Wu, Z. Yan, S. Kim, J. Hu, A. Zhong, Z. Liu, L. Sun, X. Li et al., “Ma-sam: Modality-agnostic sam adaptation for 3d medical image segmentation,” arXiv preprint arXiv:2309.08842, 2023.
- S. Gong, Y. Zhong, W. Ma, J. Li, Z. Wang, J. Zhang, P.-A. Heng, and Q. Dou, “3dsam-adapter: Holistic adaptation of sam from 2d to 3d for promptable medical image segmentation,” arXiv preprint arXiv:2306.13465, 2023.
- Y. Huang, C. Du, Z. Xue, X. Chen, H. Zhao, and L. Huang, "What makes multi-modal learning better than single (provably)," Advances in Neural Information Processing Systems, vol. 34, pp. 10944–10956, 2021.
- Z. Wu, S. Gobichettipalayam, B. Tamadazte, G. Allibert, D. P. Paudel, and C. Demonceaux, “Robust rgb-d fusion for saliency detection,” in 2022 International Conference on 3D Vision (3DV). IEEE, 2022, pp. 403–413.