A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection (2402.18922v1)

Published 29 Feb 2024 in cs.CV

Abstract: Camouflaged object detection (COD) and salient object detection (SOD) are two distinct yet closely related computer vision tasks that have been widely studied over the past decades. Although both aim to segment an image into binary foreground and background regions, they differ in their targets: COD focuses on concealed objects hidden in the image, while SOD focuses on the most prominent objects in it. Previous works achieved good performance by stacking hand-designed modules and multi-scale features, but these carefully designed, complex networks often perform well on one task and poorly on the other. In this work, we propose a simple yet effective network (SENet) based on the vision Transformer (ViT). By employing a simple asymmetric ViT-based encoder-decoder structure, SENet yields competitive results on both tasks, exhibiting greater versatility than meticulously crafted networks. Furthermore, to enhance the Transformer's ability to model local information, which is important for pixel-level binary segmentation tasks, we propose a local information capture module (LICM). We also propose a dynamic weighted (DW) loss based on binary cross-entropy (BCE) and intersection over union (IoU), which guides the network to pay more attention to smaller and more difficult-to-find target objects according to their size. Moreover, we explore joint training of SOD and COD, and propose a preliminary solution to the conflict that arises in joint training, further improving SOD performance. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our method. The code is available at https://github.com/linuxsino/SENet.
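The abstract describes LICM and the DW loss only at a high level, so the two snippets below are illustrative reconstructions, not the authors' implementations. First, a minimal sketch of a local-information module, assuming (hypothetically) that local modelling is added as a depthwise-convolution bypass over the ViT patch-token grid:

```python
import torch
import torch.nn as nn

class LICM(nn.Module):
    """Hedged sketch of a local information capture module (LICM).

    The paper only states that LICM strengthens the Transformer's local
    modelling; this depthwise-conv bypass over the token grid is one common
    way to do that and is an assumption, not the authors' design.
    """

    def __init__(self, dim: int, grid_size: int):
        super().__init__()
        self.grid_size = grid_size  # tokens per side, e.g. 14 for 224/16
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, C) with N == grid_size ** 2 (no class token).
        b, n, c = tokens.shape
        h = w = self.grid_size
        x = tokens.transpose(1, 2).reshape(b, c, h, w)  # back to a 2-D grid
        x = self.dwconv(x).flatten(2).transpose(1, 2)   # per-channel local mixing
        return tokens + self.norm(x)                    # residual bypass


# Example: 196 patch tokens of width 768 (a ViT-Base 224x224 layout).
out = LICM(dim=768, grid_size=14)(torch.randn(2, 196, 768))
```

Second, a sketch of a dynamic weighted loss in the spirit described: BCE plus IoU, scaled per image by a weight that grows as the target object shrinks. The weighting formula `1 + alpha * (1 - fg_ratio)` is a hypothetical stand-in for whatever size-dependent scheme the paper actually uses:

```python
import torch
import torch.nn.functional as F

def dw_loss(logits: torch.Tensor, target: torch.Tensor, alpha: float = 1.0):
    """Hedged sketch of a dynamic weighted (DW) loss: per-image BCE + IoU,
    up-weighted for small foreground objects. The exact weighting here is
    an assumption, not the formula from the paper.

    logits: (B, 1, H, W) raw predictions; target: (B, 1, H, W) float masks.
    """
    # Fraction of foreground pixels per image; small objects -> small ratio.
    fg_ratio = target.flatten(1).mean(dim=1)                       # (B,)
    weight = 1.0 + alpha * (1.0 - fg_ratio)                        # (B,)

    # Per-image BCE.
    bce = F.binary_cross_entropy_with_logits(
        logits, target, reduction="none").flatten(1).mean(dim=1)   # (B,)

    # Per-image soft IoU loss.
    prob = torch.sigmoid(logits).flatten(1)
    tgt = target.flatten(1)
    inter = (prob * tgt).sum(dim=1)
    union = prob.sum(dim=1) + tgt.sum(dim=1) - inter
    iou = 1.0 - (inter + 1.0) / (union + 1.0)                      # (B,)

    return (weight * (bce + iou)).mean()
```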
