
Video Object Segmentation with Dynamic Query Modulation (2403.11529v1)

Published 18 Mar 2024 in cs.CV

Abstract: Spatial-temporal memory-based methods store intermediate frame segmentations as memory for long-range context modeling and have recently shown impressive results in semi-supervised video object segmentation (SVOS). However, these methods face two key limitations: 1) they rely on non-local pixel-level matching to read memory, which yields noisy retrieved features for segmentation; 2) they segment each object independently, without interaction between objects. These shortcomings cause memory-based methods to struggle with similar-object and multi-object segmentation. To address these issues, we propose a query modulation method, termed QMVOS. This method summarizes object features into dynamic queries and then treats them as dynamic filters for mask prediction, thereby providing high-level descriptions and object-level perception for the model. Efficient and effective multi-object interactions are realized through inter-query attention. Extensive experiments demonstrate that our method brings significant improvements to the memory-based SVOS method and achieves competitive performance on standard SVOS benchmarks. The code is available at https://github.com/zht8506/QMVOS.
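
The three ideas named in the abstract (summarizing object features into queries, inter-query attention, and using queries as dynamic filters) can be illustrated with a small NumPy sketch. This is not the QMVOS implementation: the function names, the masked average pooling used for summarization, the projection-free residual attention, and the sigmoid output are all assumptions chosen for brevity, and the actual architecture in the paper and repository may differ substantially.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def summarize_queries(features, masks, eps=1e-6):
    # ASSUMPTION: one query per object via masked average pooling.
    # features: (C, H, W), masks: (N, H, W) soft masks -> queries: (N, C)
    weighted = np.einsum("nhw,chw->nc", masks, features)
    area = masks.sum(axis=(1, 2))[:, None]
    return weighted / (area + eps)

def inter_query_attention(queries):
    # ASSUMPTION: plain scaled dot-product self-attention across object
    # queries, with a residual connection and no learned projections.
    scale = np.sqrt(queries.shape[1])
    attn = softmax(queries @ queries.T / scale, axis=-1)  # (N, N)
    return queries + attn @ queries, attn

def predict_masks(queries, features):
    # Each query acts as a 1x1 dynamic filter over the feature map.
    logits = np.einsum("nc,chw->nhw", queries, features)
    return 1.0 / (1.0 + np.exp(-logits))  # per-object probabilities

# Toy inputs: 8 channels, a 4x5 feature map, 3 objects (all hypothetical).
rng = np.random.default_rng(0)
features = rng.standard_normal((8, 4, 5))
masks = rng.random((3, 4, 5))

queries = summarize_queries(features, masks)
queries, attn = inter_query_attention(queries)
pred = predict_masks(queries, features)
```

In this sketch each object is reduced to a single vector, so interaction between objects costs only an N-by-N attention matrix rather than a pixel-level comparison, which is one plausible reading of why the abstract calls the inter-query interaction efficient.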

Authors (3)
  1. Hantao Zhou
  2. Runze Hu
  3. Xiu Li
