Well-calibrated Confidence Measures for Multi-label Text Classification with a Large Number of Labels (2312.09304v1)
Abstract: We extend our previous work on Inductive Conformal Prediction (ICP) for multi-label text classification and present a novel approach for addressing the computational inefficiency of the Label Powerset (LP) ICP, arising when dealing with a high number of unique labels. We present experimental results using the original and the proposed efficient LP-ICP on two English and one Czech language data-sets. Specifically, we apply the LP-ICP on three deep Artificial Neural Network (ANN) classifiers of two types: one based on contextualised (BERT) and two on non-contextualised (word2vec) word-embeddings. In the LP-ICP setting we assign nonconformity scores to label-sets, from which the corresponding p-values and prediction-sets are determined. Our approach deals with the increased computational burden of LP by eliminating from consideration a significant number of label-sets that will surely have p-values below the specified significance level. This dramatically reduces the computational complexity of the approach while fully respecting the standard CP guarantees. Our experimental results show that the contextualised-based classifier surpasses the non-contextualised-based ones and obtains state-of-the-art performance for all data-sets examined. The good performance of the underlying classifiers is carried over to their ICP counterparts without any significant accuracy loss, but with the added benefits of ICP, i.e. the confidence information encapsulated in the prediction sets. We experimentally demonstrate that the resulting prediction sets can be tight enough to be practically useful even though the set of all possible label-sets contains more than $10^{16}$ combinations. Additionally, the empirical error rates of the obtained prediction-sets confirm that our outputs are well-calibrated.
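The core ICP mechanics described above (nonconformity scores for candidate label-sets, p-values computed against a calibration set, and a prediction set containing every candidate whose p-value exceeds the significance level) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function names and the toy scores are hypothetical, and the pruning of label-sets with provably low p-values is omitted for brevity.

```python
import numpy as np

def icp_p_value(cal_scores, test_score):
    """Standard (non-smoothed) ICP p-value: the fraction of calibration
    nonconformity scores at least as large as the test score, with the
    test example itself counted in the denominator."""
    n = len(cal_scores)
    return (np.sum(cal_scores >= test_score) + 1) / (n + 1)

def lp_prediction_set(cal_scores, candidate_scores, epsilon):
    """Indices of candidate label-sets kept in the prediction set,
    i.e. those whose p-value exceeds the significance level epsilon."""
    return [i for i, s in enumerate(candidate_scores)
            if icp_p_value(cal_scores, s) > epsilon]

# Toy example: 5 calibration scores, 3 candidate label-set scores.
cal = np.array([0.1, 0.2, 0.3, 0.4, 0.9])
cands = [0.15, 0.85, 2.0]
print(lp_prediction_set(cal, cands, epsilon=0.2))  # -> [0, 1]
```

In the full LP setting the candidate space is the powerset of labels, which is why the paper's contribution, eliminating candidates whose p-values cannot reach the significance level before scoring them, is essential for tractability.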