Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
167 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
42 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

Guided Slot Attention for Unsupervised Video Object Segmentation (2303.08314v3)

Published 15 Mar 2023 in cs.CV

Abstract: Unsupervised video object segmentation aims to segment the most prominent object in a video sequence. However, the existence of complex backgrounds and multiple foreground objects make this task challenging. To address this issue, we propose a guided slot attention network to reinforce spatial structural information and obtain better foreground--background separation. The foreground and background slots, which are initialized with query guidance, are iteratively refined based on interactions with template information. Furthermore, to improve slot--template interaction and effectively fuse global and local features in the target and reference frames, K-nearest neighbors filtering and a feature aggregation transformer are introduced. The proposed model achieves state-of-the-art performance on two popular datasets. Additionally, we demonstrate the robustness of the proposed model in challenging scenes through various comparative experiments.

Definition Search Book Streamline Icon: https://streamlinehq.com
References (41)
  1. Depth-supported real-time video segmentation with the kinect. In 2012 IEEE workshop on the applications of computer vision (WACV), pages 457–464. IEEE, 2012.
  2. Segflow: Joint learning for video object segmentation and optical flow. In Proceedings of the IEEE international conference on computer vision, pages 686–695, 2017.
  3. Treating motion as option to reduce motion dependency in unsupervised video object segmentation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5140–5149, 2023.
  4. Algorithm as 136: A k-means clustering algorithm. Journal of the royal statistical society. series c (applied statistics), 28(1):100–108, 1979.
  5. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
  6. Randla-net: Efficient semantic segmentation of large-scale point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11108–11117, 2020.
  7. Full-duplex strategy for video object segmentation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 4922–4933, 2021.
  8. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  9. Unsupervised video object segmentation via prototype memory network. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 5924–5934, 2023.
  10. Iteratively selecting an easy reference frame makes unsupervised video object segmentation easier. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 1245–1253, 2022.
  11. Scouter: Slot attention-based classifier for explainable image recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1046–1055, 2021.
  12. Video object detection for autonomous driving: Motion-aid feature calibration. Neurocomputing, 409:1–11, 2020.
  13. F2net: Learning to focus on the foreground for unsupervised video object segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 2109–2117, 2021.
  14. Object-centric learning with slot attention. Advances in Neural Information Processing Systems, 33:11525–11538, 2020.
  15. Sgdr: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.
  16. See more, know more: Unsupervised video object segmentation with co-attention siamese networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3623–3632, 2019.
  17. 1 year, 1000 km: The oxford robotcar dataset. The International Journal of Robotics Research, 36(1):3–15, 2017.
  18. Segmentation of moving objects by long term video analysis. IEEE transactions on pattern analysis and machine intelligence, 36(6):1187–1200, 2013.
  19. Hierarchical feature alignment network for unsupervised video object segmentation. In European Conference on Computer Vision, pages 596–613. Springer, 2022.
  20. A benchmark dataset and evaluation methodology for video object segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 724–732, 2016.
  21. Suo Qiu. Global weighted average pooling bridges pixel-level localization and image-level classification. arXiv preprint arXiv:1809.08264, 2018.
  22. Reciprocal transformations for unsupervised video object segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 15455–15464, 2021.
  23. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
  24. Unsupervised video object segmentation with online adversarial self-tuning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 688–698, 2023.
  25. Raft: Recurrent all-pairs field transforms for optical flow. In European conference on computer vision, pages 402–419. Springer, 2020.
  26. Attention is all you need. Advances in neural information processing systems, 30, 2017.
  27. Reconstruction network for video captioning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 7622–7631, 2018.
  28. Learning to detect salient objects with image-level supervision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 136–145, 2017.
  29. F3net: fusion, feedback and focus for salient object detection. In Proceedings of the AAAI conference on artificial intelligence, pages 12321–12328, 2020.
  30. Segformer: Simple and efficient design for semantic segmentation with transformers. Advances in Neural Information Processing Systems, 34:12077–12090, 2021.
  31. Youtube-vos: A large-scale video object segmentation benchmark. arXiv preprint arXiv:1809.03327, 2018.
  32. Self-supervised video object segmentation by motion grouping. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7177–7188, 2021a.
  33. Learning motion-appearance co-attention for zero-shot video object segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1564–1573, 2021b.
  34. Object-contextual representations for semantic segmentation. In European conference on computer vision, pages 173–190. Springer, 2020.
  35. Deep transport network for unsupervised video object segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 8781–8790, 2021.
  36. Unsupervised video object segmentation with joint hotspot tracking. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIV 16, pages 490–506. Springer, 2020.
  37. Learning discriminative feature with crf for unsupervised video object segmentation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVII 16, pages 445–462. Springer, 2020.
  38. Motion-attentive transition for zero-shot video object segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 13066–13073, 2020.
  39. Slot-vps: Object-centric representation learning for video panoptic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3093–3103, 2022.
  40. I can find you! boundary-guided separated attention network for camouflaged object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3608–3616, 2022.
  41. Parts: Unsupervised segmentation with slots, attention and independence maximization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10439–10447, 2021.
Citations (5)

Summary

We haven't generated a summary for this paper yet.