Large Model Based Referring Camouflaged Object Detection (2311.17122v1)

Published 28 Nov 2023 in cs.CV

Abstract: Referring camouflaged object detection (Ref-COD) is a recently proposed problem that aims to segment specified camouflaged objects matching a textual or visual reference. The task involves two major challenges: COD domain-specific perception and multimodal reference-image alignment. Our motivation is to make full use of the semantic intelligence and intrinsic knowledge of recent Multimodal LLMs (MLLMs) to decompose this complex task in a human-like way. Because language is highly condensed and inductive, linguistic expression is the main medium of human knowledge learning, and the transmission of knowledge follows a multi-level progression from simplicity to complexity. In this paper, we propose a large-model-based Multi-Level Knowledge-Guided multimodal method for Ref-COD, termed MLKG, in which multi-level knowledge descriptions from an MLLM are organized to guide the large vision segmentation model to perceive the camouflage targets and camouflage scene progressively, while deeply aligning the textual references with camouflaged photos. To our knowledge, our contributions mainly include: (1) This is the first study of MLLM knowledge for Ref-COD and COD. (2) We are the first to propose decomposing Ref-COD into two main perspectives, perceiving the target and perceiving the scene, by integrating MLLM knowledge, and we contribute a multi-level knowledge-guided method. (3) Our method achieves state-of-the-art performance on the Ref-COD benchmark, outperforming numerous strong competitors. Moreover, thanks to the injected rich knowledge, it demonstrates zero-shot generalization ability on uni-modal COD datasets. We will release our code soon.
