United We Stand: Using Epoch-wise Agreement of Ensembles to Combat Overfit (2310.11077v2)

Published 17 Oct 2023 in cs.LG and cs.CV

Abstract: Deep neural networks have become the method of choice for solving many classification tasks, largely because they can fit very complex functions defined over raw data. The downside of such powerful learners is the danger of overfit. In this paper, we introduce a novel ensemble classifier for deep networks that effectively overcomes overfitting by combining models generated at specific intermediate epochs during training. Our method allows for the incorporation of useful knowledge obtained by the models during the overfitting phase without deterioration of the general performance, which is usually missed when early stopping is used. To motivate this approach, we begin with the theoretical analysis of a regression model, whose prediction -- that the variance among classifiers increases when overfit occurs -- is demonstrated empirically in deep networks in common use. Guided by these results, we construct a new ensemble-based prediction method, where the prediction is determined by the class that attains the most consensual prediction throughout the training epochs. Using multiple image and text classification datasets, we show that when regular ensembles suffer from overfit, our method eliminates the harmful reduction in generalization due to overfit, and often even surpasses the performance obtained by early stopping. Our method is easy to implement and can be integrated with any training scheme and architecture, without additional prior knowledge beyond the training set. It is thus a practical and useful tool to overcome overfit. Code is available at https://github.com/uristern123/United-We-Stand-Using-Epoch-wise-Agreement-of-Ensembles-to-Combat-Overfit.
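
The prediction rule sketched in the abstract -- choose the class on which the ensemble members agree most consistently across training epochs -- can be illustrated with a short example. The snippet below is a minimal, hypothetical sketch, not the authors' implementation: it assumes per-epoch checkpoint outputs have been stacked into an array of shape (epochs, models, samples, classes), and the exact aggregation and tie-breaking details of the paper may differ (see the linked repository).

```python
# Minimal sketch of an epoch-wise agreement ensemble.
# Assumption: `logits` stacks the predictions of every ensemble member at
# every saved epoch; shapes and names here are illustrative only.
import numpy as np

def consensus_predict(logits):
    """logits: array of shape (n_epochs, n_models, n_samples, n_classes).

    Each (epoch, model) pair votes for its argmax class. For every epoch we
    measure how strongly the ensemble agrees on each class, accumulate that
    agreement over epochs, and return the class with the highest total."""
    n_epochs, n_models, n_samples, n_classes = logits.shape
    votes = logits.argmax(axis=-1)                 # (n_epochs, n_models, n_samples)
    agreement = np.zeros((n_samples, n_classes))
    for e in range(n_epochs):
        for c in range(n_classes):
            # fraction of ensemble members predicting class c at epoch e
            agreement[:, c] += (votes[e] == c).mean(axis=0)
    return agreement.argmax(axis=1)                # (n_samples,)

# Usage (hypothetical): stack per-epoch checkpoint outputs, then take the
# consensus label for each test sample.
# labels = consensus_predict(np.stack(per_epoch_logits))
```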

