Explicit Motion Handling and Interactive Prompting for Video Camouflaged Object Detection (2403.01968v2)
Abstract: Camouflage makes a static target hard to distinguish, whereas any movement of the target can break the disguise. Existing video camouflaged object detection (VCOD) approaches either take noisy motion estimates as input or model motion implicitly, which restricts detection performance in complex dynamic scenes. In this paper, we propose a novel Explicit Motion handling and Interactive Prompting framework for VCOD, dubbed EMIP, which handles motion cues explicitly using a frozen pre-trained optical flow fundamental model. EMIP is characterized by a two-stream architecture that simultaneously conducts camouflaged segmentation and optical flow estimation. Interactions across the two streams are realized through interactive prompting, inspired by emerging visual prompt learning. Two learnable modules, i.e., the camouflaged feeder and the motion collector, are designed to incorporate segmentation-to-motion and motion-to-segmentation prompts, respectively, and enhance the outputs of both streams. The prompt fed to the motion stream is learned by supervising optical flow in a self-supervised manner. Furthermore, we show that long-term historical information can also be incorporated as a prompt into EMIP, yielding more robust results with temporal consistency. Experimental results demonstrate that EMIP achieves new state-of-the-art records on popular VCOD benchmarks. Our code is publicly available at https://github.com/zhangxin06/EMIP.
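The two-stream interaction described in the abstract can be illustrated with a toy NumPy sketch. This is not the released EMIP implementation: the names `Layer`, `Prompter`, and `dual_stream_forward`, the feature width, the additive way prompts are injected, and the zero initialization are all assumptions made for illustration. It only shows the general pattern of a frozen motion stream and a segmentation stream exchanging prompts through two small learnable modules.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # hypothetical feature width


class Layer:
    """Toy stand-in for one stage of a stream (a real stage would be a network block)."""

    def __init__(self, dim, trainable):
        self.weight = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.trainable = trainable  # False marks the frozen optical-flow stream

    def __call__(self, x):
        return np.tanh(x @ self.weight)


class Prompter:
    """Toy learnable prompt module; zero-initialized (an assumption) so it starts as a no-op."""

    def __init__(self, dim):
        self.weight = np.zeros((dim, dim))

    def __call__(self, x):
        return x @ self.weight


def dual_stream_forward(seg_feat, flow_feat, seg_layers, flow_layers, feeders, collectors):
    """Run both streams stage by stage, exchanging prompts at every stage."""
    for seg_layer, flow_layer, feeder, collector in zip(
        seg_layers, flow_layers, feeders, collectors
    ):
        # Segmentation-to-motion prompt (role of the "camouflaged feeder").
        flow_feat = flow_layer(flow_feat + feeder(seg_feat))
        # Motion-to-segmentation prompt (role of the "motion collector").
        seg_feat = seg_layer(seg_feat + collector(flow_feat))
    return seg_feat, flow_feat


# Usage: two stages per stream, four feature vectors per stream.
seg_layers = [Layer(DIM, trainable=True) for _ in range(2)]
flow_layers = [Layer(DIM, trainable=False) for _ in range(2)]
feeders = [Prompter(DIM) for _ in range(2)]
collectors = [Prompter(DIM) for _ in range(2)]

seg_in = rng.standard_normal((4, DIM))
flow_in = rng.standard_normal((4, DIM))
seg_out, flow_out = dual_stream_forward(
    seg_in, flow_in, seg_layers, flow_layers, feeders, collectors
)
print(seg_out.shape, flow_out.shape)
```

With zero-initialized prompt modules, the motion stream in this sketch reproduces the output of the frozen layers alone, so any change in its behavior during training would come only from the learned prompts rather than from updating the pre-trained weights.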