DINO as a von Mises-Fisher mixture model (2405.10939v1)

Published 17 May 2024 in cs.LG, cs.AI, and cs.CV

Abstract: Self-distillation methods using Siamese networks are popular for self-supervised pre-training. DINO is one such method, based on a cross-entropy loss between $K$-dimensional probability vectors obtained by applying a softmax function to the dot product between representations and learned prototypes. Since the learned representations are $L_2$-normalized, we show that DINO and its derivatives, such as iBOT, can be interpreted as a mixture model of von Mises-Fisher components. Under this interpretation, DINO assumes equal precision for all components when the prototypes are also $L_2$-normalized. Using this insight, we propose DINO-vMF, which adds the appropriate normalization constants when computing the cluster assignment probabilities. Unlike DINO, DINO-vMF remains stable even for the larger ViT-Base model with unnormalized prototypes. We show that the added flexibility of the mixture model yields better image representations. The DINO-vMF pre-trained model consistently outperforms DINO on a range of downstream tasks. We obtain similar improvements for iBOT-vMF over iBOT, demonstrating that the proposed modification is also relevant for other methods derived from DINO.
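
To make the mixture-model reading concrete, here is a minimal PyTorch sketch of the idea described in the abstract: cluster-assignment logits are the dot products between $L_2$-normalized representations and unnormalized prototypes, and a per-component vMF log-normalization constant is added to each logit. The helper names (`log_vmf_norm_const`, `vmf_logits`), the way the temperature enters the concentration, and the use of the leading term of the uniform asymptotic expansion for the modified Bessel function are assumptions of this sketch, not the authors' exact implementation.

```python
import math
import torch

def log_vmf_norm_const(kappa, d):
    # Approximate log C_d(kappa) for a von Mises-Fisher distribution on the unit sphere in R^d:
    #   C_d(kappa) = kappa^(d/2 - 1) / ((2*pi)^(d/2) * I_{d/2-1}(kappa)).
    # log I_nu(kappa) is approximated by the leading term of the uniform asymptotic
    # expansion, which is reasonable for the large representation dimensions used here.
    nu = d / 2.0 - 1.0
    t = torch.sqrt(nu ** 2 + kappa ** 2)
    log_bessel = t + nu * torch.log(kappa / (nu + t)) - 0.5 * torch.log(2 * math.pi * t)
    return nu * torch.log(kappa) - 0.5 * d * math.log(2 * math.pi) - log_bessel

def vmf_logits(z, prototypes, temperature=0.1):
    # z:          (batch, d) L2-normalized representations
    # prototypes: (K, d)     unnormalized prototype vectors w_k
    # Each prototype defines a vMF component with mean direction w_k / ||w_k|| and
    # concentration kappa_k = ||w_k|| / temperature, so z @ w_k / temperature
    # equals kappa_k * mu_k^T z.
    d = z.shape[-1]
    kappa = prototypes.norm(dim=-1) / temperature   # (K,) component concentrations
    logits = z @ prototypes.T / temperature         # (batch, K) unnormalized log-densities
    return logits + log_vmf_norm_const(kappa, d)    # add log C_d(kappa_k) per component
```

Applying a softmax to these logits gives cluster-assignment probabilities that account for component-specific precision. When all prototypes share the same norm, the added constants are equal across components and cancel in the softmax, recovering the original DINO assignment.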
