TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut

Published 1 Sep 2022 in cs.CV and stat.ML | (2209.00383v3)

Abstract: In this paper, we describe a graph-based algorithm that uses the features obtained by a self-supervised transformer to detect and segment salient objects in images and videos. With this approach, the image patches that compose an image or video are organised into a fully connected graph, where the edge between each pair of patches is weighted by a similarity score computed from features learned by the transformer. Detection and segmentation of salient objects is then formulated as a graph-cut problem and solved using the classical Normalized Cut algorithm. Despite the simplicity of this approach, it achieves state-of-the-art results on several common image and video detection and segmentation tasks. For unsupervised object discovery, this approach outperforms the competing approaches by margins of 6.1%, 5.7%, and 2.6% on the VOC07, VOC12, and COCO20K datasets, respectively. For unsupervised saliency detection in images, this method improves Intersection over Union (IoU) by 4.4%, 5.6%, and 5.2% on the ECSSD, DUTS, and DUT-OMRON datasets, respectively, compared to current state-of-the-art techniques. This method also achieves competitive results for unsupervised video object segmentation tasks with the DAVIS, SegTV2, and FBMS datasets.
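
The pipeline described in the abstract (patch features, a fully connected similarity graph, then a Normalized Cut) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function name `normalized_cut_bipartition`, the use of cosine similarity, the clamping constant `eps`, and the split at the mean of the eigenvector are all assumptions made for illustration, and the input is assumed to be an N x D array of patch features that some transformer has already produced. The sketch only separates the patches into two groups; deciding which group is the salient object is left out.

```python
import numpy as np

def normalized_cut_bipartition(features, eps=1e-5):
    """Split N patch features (an N x D array) into two groups.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    # Edge weights of the fully connected graph: cosine similarity
    # between every pair of patch features (an assumed similarity).
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T

    # Assumption: clamp to a small positive value so every weight is valid.
    W = np.maximum(sim, eps)

    # Normalized Cut relaxes to the generalized eigenproblem
    # (D - W) y = lambda * D y, with D the diagonal degree matrix.
    # It is solved here through the symmetric normalized Laplacian
    # L_sym = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; column 1 is the
    # eigenvector of the second-smallest eigenvalue.
    _, eigvecs = np.linalg.eigh(L_sym)
    y = d_inv_sqrt * eigvecs[:, 1]  # map back to the generalized problem

    # Assumption: bipartition the patches at the mean of the eigenvector.
    return y > y.mean()
```

Calling the function on an array of patch features returns a boolean mask with one entry per patch. Reshaping that mask to the patch grid would give a coarse two-region segmentation.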
