Diversity-Aware Ensembling of Language Models Based on Topological Data Analysis (2402.14184v1)
Abstract: Ensembles are important tools for improving the performance of machine learning models. In natural language processing, ensembling is especially effective because many large pre-trained models are openly available. However, existing approaches mostly rely on simple averaging of predictions, assigning equal weight to every model and ignoring differences in model quality and in how similar the models are to one another. We propose to estimate weights for ensembles of NLP models using not only knowledge of their individual performance but also their similarity to each other. Adopting distance measures based on Topological Data Analysis (TDA) improves the ensemble in both text classification accuracy and the quality of the accompanying uncertainty estimates.
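The abstract does not specify the paper's exact weighting scheme, so the following is only a minimal sketch of the general idea: combine each model's individual validation accuracy with its average TDA-based distance to the other ensemble members, and use the resulting weights for prediction averaging. The function names (`diversity_aware_weights`, `weighted_ensemble_proba`), the accuracy-times-diversity heuristic, and the toy distance matrix are illustrative assumptions, not the authors' method.

```python
import numpy as np

def diversity_aware_weights(accuracies, distances):
    """Hypothetical weighting heuristic: favor models that are both
    individually accurate and far, in a TDA-based metric, from the rest."""
    accuracies = np.asarray(accuracies, dtype=float)
    distances = np.asarray(distances, dtype=float)
    # Mean distance from each model to all others acts as a diversity score
    # (the diagonal of the distance matrix is zero).
    diversity = distances.sum(axis=1) / (len(accuracies) - 1)
    raw = accuracies * diversity
    return raw / raw.sum()

def weighted_ensemble_proba(probas, weights):
    """Weighted average of per-model class-probability arrays.

    probas: shape (n_models, n_samples, n_classes)
    weights: shape (n_models,)
    """
    probas = np.asarray(probas, dtype=float)
    # Contract the model axis against the weight vector.
    return np.tensordot(weights, probas, axes=1)  # (n_samples, n_classes)

# Toy example: three models on a 4-sample, 2-class task.
accs = [0.91, 0.89, 0.93]                # validation accuracies (made up)
D = np.array([[0.0, 0.4, 0.7],           # pairwise TDA-style distances
              [0.4, 0.0, 0.5],           # between models (made up)
              [0.7, 0.5, 0.0]])
w = diversity_aware_weights(accs, D)
probas = np.random.default_rng(0).dirichlet([1, 1], size=(3, 4))
ensemble = weighted_ensemble_proba(probas, w)
predictions = ensemble.argmax(axis=1)
```

Under this heuristic, a model that is accurate but nearly redundant with its peers receives less weight than an equally accurate but more dissimilar one, which matches the intuition the abstract describes.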