Exploring Weight Balancing on Long-Tailed Recognition Problem (2305.16573v7)
Abstract: Recognition problems on long-tailed data, in which the sample size per class is heavily skewed, have gained importance because, unless sample sizes are intentionally adjusted, the per-class sample distribution in a dataset is generally exponential. Various methods have been devised to address these problems. Recently, weight balancing, which combines well-known classical regularization techniques with two-stage training, has been proposed. Despite its simplicity, it achieves high performance compared with existing methods devised in various ways. However, it is not well understood why this method is effective for long-tailed data. In this study, we analyze weight balancing by focusing on neural collapse and the cone effect at each training stage, and find that its effectiveness can be decomposed into two factors: an increase in the Fisher's discriminant ratio of the feature extractor caused by weight decay and cross-entropy loss, and implicit logit adjustment caused by weight decay and class-balanced loss. Our analysis allows the training method to be further simplified by reducing the number of training stages to one while increasing accuracy. Code is available at https://github.com/HN410/Exploring-Weight-Balancing-on-Long-Tailed-Recognition-Problem.
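The Fisher's discriminant ratio mentioned in the abstract quantifies how separable the extracted features are across classes. As an illustration only, here is a minimal NumPy sketch of one common trace-ratio formulation (between-class scatter over within-class scatter); the paper's exact definition may differ:

```python
import numpy as np

def fisher_discriminant_ratio(features, labels):
    """Trace of between-class scatter divided by trace of
    within-class scatter; higher means more separable features."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    global_mean = features.mean(axis=0)
    between, within = 0.0, 0.0
    for c in np.unique(labels):
        class_feats = features[labels == c]
        class_mean = class_feats.mean(axis=0)
        # Weighted squared distance of the class mean from the global mean.
        between += len(class_feats) * np.sum((class_mean - global_mean) ** 2)
        # Squared distances of samples from their own class mean.
        within += np.sum((class_feats - class_mean) ** 2)
    return between / within

# Two tight, well-separated clusters yield a large ratio.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
print(fisher_discriminant_ratio(X, y))
```

In the paper's analysis, weight decay combined with cross-entropy loss in the first training stage is what drives this ratio up for the feature extractor.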
- Naoya Hasegawa
- Issei Sato