
TSANet: Temporal and Scale Alignment for Unsupervised Video Object Segmentation (2303.04376v2)

Published 8 Mar 2023 in cs.CV

Abstract: Unsupervised Video Object Segmentation (UVOS) is the challenging task of segmenting the prominent object in a video without manual guidance. Recent UVOS methods fall into two groups, appearance-based and appearance-motion-based, and each has its own limitation. Appearance-based methods exploit correlation information between randomly paired frames and therefore do not consider the motion of the target object. Appearance-motion-based methods fuse appearance with motion and therefore depend heavily on optical flow. In this paper, we propose a novel UVOS framework that addresses both limitations in terms of time and scale. Temporal Alignment Fusion aligns the saliency information of adjacent frames with the target frame so that information from adjacent frames can be leveraged. Scale Alignment Decoder predicts the target object mask by aggregating multi-scale feature maps via continuous mapping with an implicit neural representation. Experimental results on the public benchmark datasets DAVIS 2016 and FBMS demonstrate the effectiveness of our method, which outperforms the state-of-the-art methods on DAVIS 2016.
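
The idea of aggregating multi-scale feature maps through a continuous mapping can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`bilinear_sample`, `mlp`, `decode_mask`), the tiny randomly initialised MLP, and the use of normalised coordinates with bilinear interpolation are all assumptions chosen for illustration. The sketch queries every feature map at the same continuous coordinate, concatenates the sampled features with the coordinate itself, and lets a small MLP map that vector to a foreground probability, so the output resolution is independent of any feature map's resolution.

```python
import math
import random


def bilinear_sample(fmap, y, x):
    """Bilinearly sample a C x H x W feature map (nested lists)
    at a normalised coordinate (y, x) with both values in [0, 1]."""
    h, w = len(fmap[0]), len(fmap[0][0])
    py, px = y * (h - 1), x * (w - 1)          # continuous pixel position
    y0, x0 = int(math.floor(py)), int(math.floor(px))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = py - y0, px - x0                  # interpolation weights
    out = []
    for ch in fmap:
        top = ch[y0][x0] * (1 - wx) + ch[y0][x1] * wx
        bot = ch[y1][x0] * (1 - wx) + ch[y1][x1] * wx
        out.append(top * (1 - wy) + bot * wy)
    return out


def mlp(vec, layers):
    """Apply fully connected layers with ReLU between them (none after the last)."""
    for i, (weights, biases) in enumerate(layers):
        vec = [sum(w * v for w, v in zip(row, vec)) + b
               for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            vec = [max(0.0, v) for v in vec]
    return vec


def decode_mask(feature_maps, layers, out_h, out_w):
    """Predict an out_h x out_w mask by querying every scale at each coordinate."""
    mask = []
    for i in range(out_h):
        y = i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x = j / (out_w - 1) if out_w > 1 else 0.0
            feats = [y, x]                      # the coordinate is part of the input
            for fmap in feature_maps:
                feats.extend(bilinear_sample(fmap, y, x))
            logit = mlp(feats, layers)[0]
            row.append(1.0 / (1.0 + math.exp(-logit)))  # sigmoid
        mask.append(row)
    return mask


def random_layer(n_in, n_out, rng):
    """Hypothetical helper: an untrained layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def random_fmap(c, h, w, rng):
    """Hypothetical helper: a random C x H x W feature map."""
    return [[[rng.random() for _ in range(w)] for _ in range(h)] for _ in range(c)]


rng = random.Random(0)
# Two scales with different resolutions and channel counts (toy sizes).
feature_maps = [random_fmap(2, 4, 4, rng), random_fmap(3, 2, 2, rng)]
# Input size = 2 coordinates + 2 channels + 3 channels = 7.
layers = [random_layer(7, 8, rng), random_layer(8, 1, rng)]
mask = decode_mask(feature_maps, layers, out_h=8, out_w=8)
print(len(mask), len(mask[0]))
```

Because the weights here are random, the resulting mask is meaningless; the point is only that a single coordinate query can draw on feature maps of different resolutions without first resizing them to a common grid.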

