Regularized Neural Ensemblers (2410.04520v2)
Abstract: Ensemble methods are known to enhance the accuracy and robustness of machine learning models by combining multiple base learners. However, standard approaches such as greedy or random ensembling often fall short because they assign each ensemble member a weight that is constant across samples, which can limit expressiveness and hinder performance when aggregating the ensemble predictions. In this study, we explore employing regularized neural networks as ensemble methods, emphasizing the significance of dynamic ensembling that adaptively leverages diverse model predictions. Motivated by the risk of learning low-diversity ensembles, we propose regularizing the ensembling model by randomly dropping base model predictions during training. We demonstrate that this approach provides a lower bound on the diversity within the ensemble, reducing overfitting and improving generalization. Our experiments show that the regularized neural ensemblers yield competitive results compared to strong baselines across several modalities, such as computer vision, natural language processing, and tabular data.
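To make the abstract's idea concrete, the sketch below is a minimal PyTorch illustration of a per-sample neural ensembler that is regularized by randomly dropping base model predictions during training. The class name `NeuralEnsembler`, the architecture, and the hyperparameter `base_dropout_p` are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch (assumptions, not the paper's exact method): a small network
# produces per-sample weights over base model predictions, and base models are
# randomly dropped per sample during training as a diversity regularizer.
import torch
import torch.nn as nn


class NeuralEnsembler(nn.Module):
    def __init__(self, num_models: int, num_classes: int,
                 hidden_dim: int = 64, base_dropout_p: float = 0.5):
        super().__init__()
        self.base_dropout_p = base_dropout_p
        # Maps the flattened base-model predictions to one logit per base model.
        self.weight_net = nn.Sequential(
            nn.Linear(num_models * num_classes, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_models),
        )

    def forward(self, base_preds: torch.Tensor) -> torch.Tensor:
        # base_preds: (batch, num_models, num_classes) class probabilities.
        batch, num_models, _ = base_preds.shape
        logits = self.weight_net(base_preds.flatten(start_dim=1))

        if self.training and self.base_dropout_p > 0:
            # Randomly drop base models per sample so the ensembler cannot
            # collapse onto a few members (the regularization in the abstract).
            keep = (torch.rand(batch, num_models, device=base_preds.device)
                    > self.base_dropout_p)
            # Ensure at least one base model survives for every sample.
            keep[keep.sum(dim=1) == 0, 0] = True
            logits = logits.masked_fill(~keep, float("-inf"))

        weights = torch.softmax(logits, dim=1)               # per-sample weights
        return (weights.unsqueeze(-1) * base_preds).sum(dim=1)  # (batch, num_classes)
```

In this sketch, training would simply minimize a standard loss (e.g., cross-entropy) between the aggregated prediction and the labels, typically on held-out validation predictions of the base models.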