READMem: Robust Embedding Association for a Diverse Memory in Unconstrained Video Object Segmentation (2305.12823v2)
Abstract: We present READMem (Robust Embedding Association for a Diverse Memory), a modular framework for semi-automatic video object segmentation (sVOS) methods designed to handle unconstrained videos. Contemporary sVOS works typically aggregate video frames in an ever-expanding memory, which demands substantial hardware resources in long-term applications. To reduce memory requirements and prevent near-duplicate object representations (caused by storing information from adjacent frames), previous methods introduce a hyper-parameter that controls how frequently frames become eligible for storage. This parameter has to be tuned to the properties of each video (such as the speed of appearance changes and the video length) and does not generalize well. Instead, we integrate the embedding of a new frame into the memory only if it increases the diversity of the memory content. Furthermore, we propose a robust association of the embeddings stored in the memory with query embeddings during the update process. Our approach avoids accumulating redundant data, which in turn allows us to restrict the memory size and prevent extreme memory demands on long videos. We extend popular sVOS baselines, which previously showed limited performance on long videos, with READMem. Our approach achieves competitive results on the Long-time Video dataset (LV1) while not hindering performance on short sequences. Our code is publicly available.
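The diversity-gated update described in the abstract can be illustrated with a small sketch. The abstract does not say how diversity is measured, so the code below assumes one plausible choice: the determinant of the Gram matrix of the stored embeddings, which equals the squared volume they span. The names `gram_determinant` and `update_memory`, the flat list-of-vectors memory, and the rule of replacing whichever slot yields the largest gain are hypothetical choices made for illustration; they are not taken from the READMem code base.

```python
from typing import List, Sequence

Vector = Sequence[float]


def gram_determinant(vectors: List[Vector]) -> float:
    """Determinant of the Gram matrix G[i][j] = <v_i, v_j>.

    Assumed diversity score: larger means the vectors span more volume.
    """
    n = len(vectors)
    g = [[sum(a * b for a, b in zip(u, v)) for v in vectors] for u in vectors]
    det = 1.0
    for col in range(n):
        # Gaussian elimination with partial pivoting.
        pivot = max(range(col, n), key=lambda r: abs(g[r][col]))
        if abs(g[pivot][col]) < 1e-12:
            return 0.0  # (numerically) linearly dependent embeddings
        if pivot != col:
            g[col], g[pivot] = g[pivot], g[col]
            det = -det
        det *= g[col][col]
        for r in range(col + 1, n):
            factor = g[r][col] / g[col][col]
            for c in range(col, n):
                g[r][c] -= factor * g[col][c]
    return det


def update_memory(memory: List[Vector], candidate: Vector, capacity: int) -> bool:
    """Store `candidate` only if it increases the assumed diversity score.

    Returns True if the memory was modified.
    """
    if len(memory) < capacity:
        memory.append(candidate)  # memory not full yet: always store
        return True
    best_slot, best_div = None, gram_determinant(memory)
    for slot in range(len(memory)):
        # Tentatively swap the candidate into each slot and score the result.
        trial = memory[:slot] + [candidate] + memory[slot + 1:]
        div = gram_determinant(trial)
        if div > best_div:
            best_slot, best_div = slot, div
    if best_slot is None:
        return False  # candidate is redundant: memory stays unchanged
    memory[best_slot] = candidate
    return True


memory: List[Vector] = []
for frame_embedding in ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]):
    update_memory(memory, frame_embedding, capacity=2)
print(memory)
```

In this toy run the near-duplicate embedding `[0.9, 0.1, 0.0]` is displaced once a more distinct embedding arrives, while the memory size stays fixed at the chosen capacity. No per-video sampling frequency needs to be tuned, which is the property the abstract highlights.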
- Stéphane Vujasinović
- Sebastian Bullinger
- Stefan Becker
- Norbert Scherer-Negenborn
- Michael Arens
- Rainer Stiefelhagen