Distilling Self-Supervised Vision Transformers for Weakly-Supervised Few-Shot Classification & Segmentation (2307.03407v1)

Published 7 Jul 2023 in cs.CV

Abstract: We address the task of weakly-supervised few-shot image classification and segmentation by leveraging a Vision Transformer (ViT) pretrained with self-supervision. Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions through separate task heads. Our model effectively learns to perform classification and segmentation in the absence of pixel-level labels during training, using only image-level labels. To do this, it uses attention maps, created from tokens generated by the self-supervised ViT backbone, as pixel-level pseudo-labels. We also explore a practical setup with "mixed" supervision, where a small number of training images contain ground-truth pixel-level labels and the remaining images have only image-level labels. For this mixed setup, we propose to improve the pseudo-labels using a pseudo-label enhancer trained on the available ground-truth pixel-level labels. Experiments on Pascal-5i and COCO-20i demonstrate significant performance gains in a variety of supervision settings, in particular when few or no pixel-level labels are available.
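
The idea of turning attention maps into pixel-level pseudo-labels can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `attention_pseudo_label`, the use of a single class token as the query, the min-max normalization, and the 0.5 threshold are all assumptions made for illustration, and random vectors stand in for real ViT tokens.

```python
import numpy as np

def attention_pseudo_label(cls_token, patch_tokens, grid_hw, threshold=0.5):
    """Hypothetical helper: build a binary pseudo-mask from token attention.

    cls_token:    (D,) vector summarizing the class of interest (assumption).
    patch_tokens: (N, D) patch tokens from a ViT backbone, with N = H * W.
    grid_hw:      (H, W) layout of the patch grid.
    threshold:    cut-off on the normalized map (illustrative value).
    """
    d = cls_token.shape[-1]
    # Scaled dot-product scores between the class token and every patch token.
    scores = patch_tokens @ cls_token / np.sqrt(d)
    # Numerically stable softmax over the patches.
    scores = scores - scores.max()
    attn = np.exp(scores) / np.exp(scores).sum()
    # Arrange the attention weights on the patch grid.
    attn_map = attn.reshape(grid_hw)
    # Min-max normalize to [0, 1] so a fixed threshold is meaningful.
    lo, hi = attn_map.min(), attn_map.max()
    norm = (attn_map - lo) / (hi - lo + 1e-8)
    # Patches above the threshold are treated as foreground pseudo-labels.
    mask = (norm >= threshold).astype(np.uint8)
    return norm, mask

# Random stand-ins for ViT outputs: a 14x14 patch grid with 384-dim tokens.
rng = np.random.default_rng(0)
h, w, d = 14, 14, 384
patch_tokens = rng.normal(size=(h * w, d))
cls_token = rng.normal(size=d)

attn_map, mask = attention_pseudo_label(cls_token, patch_tokens, (h, w))
print(attn_map.shape, mask.shape, mask.dtype)
```

In a real pipeline the mask would be upsampled to image resolution and used as a training target for a segmentation head in place of ground-truth pixel labels; how the paper constructs and refines its attention maps is described in the full text, not here.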
