Efficient Long-Short Temporal Attention Network for Unsupervised Video Object Segmentation (2309.11707v1)
Abstract: Unsupervised Video Object Segmentation (VOS) aims to identify the contours of primary foreground objects in videos without any prior knowledge. However, previous methods do not fully exploit spatial-temporal context and fail to tackle this challenging task in real time. This motivates us to develop an efficient Long-Short Temporal Attention network (termed LSTA) for the unsupervised VOS task from a holistic view. Specifically, LSTA consists of two dominant modules, i.e., Long Temporal Memory and Short Temporal Attention. The former captures the long-term global pixel relations between the past frames and the current frame, modeling constantly present objects by encoding appearance patterns. Meanwhile, the latter reveals the short-term local pixel relations between one nearby frame and the current frame, modeling moving objects by encoding motion patterns. To speed up inference, efficient projection and a locality-based sliding window are adopted to achieve nearly linear time complexity for the two lightweight modules, respectively. Extensive empirical studies on several benchmarks demonstrate the promising performance and high efficiency of the proposed method.
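The two complexity-reduction ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the form of the efficient projection, so the sketch assumes a kernelized linear attention with an `elu(x) + 1` feature map for the long-term branch, and a plain square window of radius `radius` in one nearby frame for the short-term branch. All function names, shapes, and parameters here are hypothetical.

```python
import numpy as np

def feature_map(x):
    # Positive feature map elu(x) + 1 (an assumed choice, not taken from the paper).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def long_memory_attention(q, k, v):
    """Global attention in time linear in the number of pixels.

    q: (N, d) current-frame queries, k: (M, d) memory keys, v: (M, c) memory values.
    Reordering the products as phi(q) @ (phi(k).T @ v) avoids the (N, M) matrix.
    """
    q_f, k_f = feature_map(q), feature_map(k)
    kv = k_f.T @ v                      # (d, c) summary of the memory
    norm = q_f @ k_f.sum(axis=0)        # (N,) per-query normalizer
    return (q_f @ kv) / norm[:, None]

def short_window_attention(q, k, v, radius=1):
    """Local attention: each pixel attends to a small window in one nearby frame.

    q, k: (H, W, d) features of the current and nearby frame, v: (H, W, c).
    Cost is O(H * W * (2 * radius + 1) ** 2) rather than O((H * W) ** 2).
    """
    H, W, d = q.shape
    out = np.zeros((H, W, v.shape[-1]))
    for i in range(H):
        for j in range(W):
            # Clip the window at the frame border.
            i0, i1 = max(0, i - radius), min(H, i + radius + 1)
            j0, j1 = max(0, j - radius), min(W, j + radius + 1)
            keys = k[i0:i1, j0:j1].reshape(-1, d)
            vals = v[i0:i1, j0:j1].reshape(-1, v.shape[-1])
            scores = keys @ q[i, j] / np.sqrt(d)
            weights = np.exp(scores - scores.max())  # stable softmax
            weights /= weights.sum()
            out[i, j] = weights @ vals
    return out

# Toy usage with random features standing in for encoder outputs.
rng = np.random.default_rng(0)
H, W, d, c = 4, 6, 8, 8
cur = rng.normal(size=(H, W, d))                 # current-frame features
mem_k = rng.normal(size=(3 * H * W, d))          # keys from three past frames
mem_v = rng.normal(size=(3 * H * W, c))
near_k = rng.normal(size=(H, W, d))              # one nearby frame
near_v = rng.normal(size=(H, W, c))

long_out = long_memory_attention(cur.reshape(-1, d), mem_k, mem_v)
short_out = short_window_attention(cur, near_k, near_v, radius=1)
print(long_out.shape, short_out.shape)  # (24, 8) (4, 6, 8)
```

In this sketch the long-term branch never materializes the pixel-by-pixel affinity matrix, and the short-term branch bounds the number of keys per query by the window size, which is where the near-linear cost in the number of pixels comes from under these assumptions.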