Large Model Based Referring Camouflaged Object Detection (2311.17122v1)
Abstract: Referring camouflaged object detection (Ref-COD) is a recently proposed problem that aims to segment specified camouflaged objects matching a textual or visual reference. The task involves two major challenges: COD domain-specific perception and multimodal reference-image alignment. Our motivation is to make full use of the semantic intelligence and intrinsic knowledge of recent multimodal large language models (MLLMs) to decompose this complex task in a human-like way. Because language is highly condensed and inductive, linguistic expression is the main medium of human knowledge learning, and the transmission of knowledge follows a multi-level progression from simple to complex. In this paper, we propose a large-model-based Multi-Level Knowledge-Guided multimodal method for Ref-COD, termed MLKG, in which multi-level knowledge descriptions from an MLLM are organized to guide a large vision segmentation model to perceive the camouflage targets and the camouflage scene progressively, while deeply aligning the textual references with the camouflaged photos. To our knowledge, our main contributions are: (1) This is the first study of MLLM knowledge for Ref-COD and COD. (2) We are the first to propose decomposing Ref-COD into two main perspectives, perceiving the target and perceiving the scene, by integrating MLLM knowledge, and we contribute a multi-level knowledge-guided method. (3) Our method achieves state-of-the-art performance on the Ref-COD benchmark, outperforming numerous strong competitors. Moreover, thanks to the injected rich knowledge, it demonstrates zero-shot generalization ability on uni-modal COD datasets. We will release our code soon.