Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains (2402.18614v1)

Published 28 Feb 2024 in cs.LG, cs.CV, and cs.NE

Abstract: The recently discovered Neural Collapse (NC) phenomenon states that the last-layer weights of Deep Neural Networks (DNNs) converge to the so-called Equiangular Tight Frame (ETF) simplex during the terminal phase of their training. This ETF geometry is equivalent to vanishing within-class variability of the last-layer activations. Inspired by NC properties, we explore in this paper the transferability of DNN models trained with their last-layer weights fixed according to the ETF. This enforces class separation by eliminating class covariance information, effectively providing implicit regularization. We show that DNN models trained with such a fixed classifier significantly improve transfer performance, particularly on out-of-domain datasets. On a broad range of fine-grained image classification datasets, our approach outperforms i) baseline methods that do not perform any covariance regularization (up to 22%), as well as ii) methods that explicitly whiten the covariance of activations throughout training (up to 19%). Our findings suggest that DNNs trained with fixed ETF classifiers offer a powerful mechanism for improving transfer learning across domains.
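
For illustration, below is a minimal sketch of how a classifier head can be frozen to a simplex ETF of the kind described in the abstract. It assumes PyTorch; the names `simplex_etf` and `FixedETFClassifier` are hypothetical and this is not the authors' released implementation.

```python
import torch
import torch.nn as nn


def simplex_etf(num_classes: int, feat_dim: int) -> torch.Tensor:
    """Build a (num_classes x feat_dim) simplex Equiangular Tight Frame.

    Rows are unit-norm class vectors whose pairwise cosine similarity is
    -1/(C-1), the geometry that Neural Collapse describes for last-layer
    weights.
    """
    assert feat_dim >= num_classes, "ETF construction needs feat_dim >= num_classes"
    C = num_classes
    # Random orthonormal basis U in R^{feat_dim x C} (reduced QR).
    U, _ = torch.linalg.qr(torch.randn(feat_dim, C))
    # Centered identity gives the simplex structure; scaling makes rows unit norm.
    M = torch.sqrt(torch.tensor(C / (C - 1.0))) * (torch.eye(C) - torch.ones(C, C) / C)
    return (U @ M).T  # shape: (C, feat_dim)


class FixedETFClassifier(nn.Module):
    """Linear classifier whose weights are frozen to a simplex ETF (no bias)."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        W = simplex_etf(num_classes, feat_dim)
        # Registered as a buffer, so it is saved with the model but never trained.
        self.register_buffer("weight", W)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, feat_dim) -> logits: (batch, num_classes)
        return features @ self.weight.T
```

With such a head, only the backbone receives gradient updates; the cross-entropy loss on the fixed logits pushes the penultimate-layer features toward the frozen ETF directions.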

Authors (5)
  1. Hafiz Tiomoko Ali (8 papers)
  2. Umberto Michieli (40 papers)
  3. Ji Joong Moon (3 papers)
  4. Daehyun Kim (16 papers)
  5. Mete Ozay (65 papers)
Citations (1)
