MVEB: Self-Supervised Learning with Multi-View Entropy Bottleneck (2403.19078v1)
Abstract: Self-supervised learning aims to learn representations that generalize effectively to downstream tasks. Many self-supervised approaches treat two views of an image as both the input and the self-supervised signal, assuming that either view contains the same task-relevant information and that the shared information is (approximately) sufficient for downstream tasks. Recent studies show that discarding the superfluous information not shared between the views can improve generalization. The ideal representation is therefore one that is sufficient for downstream tasks while containing minimal superfluous information, termed the minimal sufficient representation. Such a representation can be learned by maximizing the mutual information between the representation and the supervised view while eliminating superfluous information; however, the computation of mutual information is notoriously intractable. In this work, we propose an objective termed the multi-view entropy bottleneck (MVEB) to learn the minimal sufficient representation effectively. MVEB reduces minimal sufficient learning to maximizing both the agreement between the embeddings of the two views and the differential entropy of the embedding distribution. Our experiments confirm that MVEB significantly improves performance: for example, it achieves 76.9% top-1 accuracy on ImageNet with a vanilla ResNet-50 backbone under linear evaluation, which, to the best of our knowledge, is a new state-of-the-art result for ResNet-50.
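To make the stated reduction concrete, the objective schematically takes the form "maximize view agreement plus embedding entropy," e.g. max E[z_1^T z_2] + λ H(z) for unit-normalized embeddings z_1, z_2 of the two views. The sketch below illustrates this structure in PyTorch. It is not the paper's method: the entropy term here is a simple k-nearest-neighbor (Kozachenko-Leonenko style) proxy chosen for self-containment, and the names mveb_style_loss, k, and lam are hypothetical; the paper derives its own entropy estimator, which this sketch does not reproduce.

```python
import torch
import torch.nn.functional as F

def mveb_style_loss(z1, z2, k=5, lam=1.0):
    """Illustrative MVEB-style objective (not the paper's estimator).

    Maximizes (i) agreement between the two views' embeddings and
    (ii) a proxy for the differential entropy of the embedding
    distribution, here a k-nearest-neighbor log-distance estimate.

    z1, z2: (batch, dim) embeddings of two augmented views of a batch.
    """
    # Place embeddings on the unit hypersphere, as is common in
    # view-agreement objectives.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)

    # Agreement term: mean cosine similarity between paired views.
    alignment = (z1 * z2).sum(dim=1).mean()

    # Entropy proxy: mean log-distance to the k-th nearest neighbor
    # within the batch. Larger k-NN distances indicate a more
    # spread-out, higher-entropy embedding distribution.
    dist = torch.cdist(z1, z1)                                       # (batch, batch)
    dist = dist + 1e9 * torch.eye(dist.size(0), device=dist.device)  # mask self-distances
    knn_dist, _ = dist.kthvalue(k, dim=1)                            # k-th smallest per row
    entropy_proxy = torch.log(knn_dist + 1e-8).mean()

    # Minimizing this loss maximizes alignment + lam * entropy proxy.
    return -(alignment + lam * entropy_proxy)


# Usage: z1, z2 would come from a backbone + projector applied to two
# augmentations of the same image batch.
z1, z2 = torch.randn(256, 128), torch.randn(256, 128)
loss = mveb_style_loss(z1, z2)
```

Note the design choice shared with alignment-and-uniformity formulations: the agreement term alone collapses all embeddings to a single point, and the entropy term is what prevents that collapse without requiring explicit negative pairs.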