TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut
Abstract: In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect and segment salient objects in images and videos. With this approach, the patches that compose an image or video are organised into a fully connected graph, where the edge between each pair of patches is labeled with a similarity score computed from features learned by the transformer. Detection and segmentation of salient objects are then formulated as a graph-cut problem and solved using the classical Normalized Cut algorithm. Despite the simplicity of this approach, it achieves state-of-the-art results on several common image and video detection and segmentation tasks. For unsupervised object discovery, this approach outperforms the competing approaches by a margin of 6.1%, 5.7%, and 2.6% when tested with the VOC07, VOC12, and COCO20K datasets, respectively. For the unsupervised saliency detection task in images, this method improves the Intersection over Union (IoU) score by 4.4%, 5.6%, and 5.2% when tested with the ECSSD, DUTS, and DUT-OMRON datasets, respectively, compared to current state-of-the-art techniques. This method also achieves competitive results for unsupervised video object segmentation tasks with the DAVIS, SegTV2, and FBMS datasets.
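The pipeline summarised in the abstract (patches as graph nodes, feature similarity as edge weights, a Normalized Cut to split the graph) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the patch features are already available as an `(N, D)` array, uses cosine similarity as the similarity score, shifts similarities into positive weights, and splits the second eigenvector at its mean; all of these choices, and the function name `normalized_cut_bipartition`, are illustrative assumptions that the abstract does not specify.

```python
import numpy as np


def normalized_cut_bipartition(features, eps=1e-5):
    """Split N patches into two groups with a spectral Normalized Cut.

    features: (N, D) array of per-patch feature vectors (assumed given,
    e.g. taken from a self-supervised transformer).
    Returns a boolean array of length N marking one side of the cut.
    Which side is the salient object is NOT decided here.
    """
    # Cosine similarity between every pair of patches (fully connected graph).
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    similarity = unit @ unit.T

    # Assumption: map similarities from [-1, 1] to positive edge weights.
    weights = np.clip((similarity + 1.0) / 2.0, eps, None)

    # Normalized Cut relaxation: solve (D - W) y = lambda * D y.
    # Equivalent symmetric form: L_sym z = lambda z with y = D^{-1/2} z.
    degree = weights.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    l_sym = np.eye(len(degree)) - d_inv_sqrt[:, None] * weights * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(l_sym)
    second = d_inv_sqrt * eigvecs[:, 1]

    # Assumption: threshold the eigenvector at its mean to get two groups.
    return second > second.mean()
```

For an image divided into a grid of patches, the returned mask can be reshaped to the grid to obtain a coarse two-region segmentation. Because an eigenvector's sign is arbitrary, the `True` side may be either region; a separate rule is needed to decide which region is the object.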