MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification (2308.13139v2)
Abstract: eXtreme Multi-label text Classification (XMC) refers to training a classifier that assigns a text sample the relevant labels from an extremely large-scale label set (e.g., millions of labels). We propose MatchXML, an efficient text-label matching framework for XMC. We observe that label embeddings generated from sparse Term Frequency-Inverse Document Frequency (TF-IDF) features have several limitations. We therefore propose label2vec to train semantically dense label embeddings with the Skip-gram model. The dense label embeddings are then used to build a Hierarchical Label Tree by clustering. When fine-tuning the pre-trained Transformer encoder, we formulate multi-label text classification as a text-label matching problem in a bipartite graph, and we extract dense text representations from the fine-tuned Transformer. In addition to these fine-tuned dense text embeddings, we also extract static dense sentence embeddings from a pre-trained Sentence Transformer. Finally, a linear ranker is trained on the sparse TF-IDF features, the fine-tuned dense text representations, and the static dense sentence embeddings. Experimental results demonstrate that MatchXML achieves state-of-the-art accuracy on five of the six datasets, and in terms of training speed it outperforms the competing methods on all six. Our source code is publicly available at https://github.com/huiyegit/MatchXML.
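To make the label2vec step concrete: since Skip-gram learns embeddings from co-occurrence within sequences, one natural reading is to treat each training document's set of positive labels as a short "sentence" and train Skip-gram over these label sequences. Below is a minimal sketch of that idea using gensim's Word2Vec; the toy label corpus and the hyperparameter values (`vector_size`, `window`, `epochs`) are illustrative assumptions, not the paper's actual settings.

```python
# Minimal sketch of the label2vec idea: train Skip-gram embeddings over
# label co-occurrence sequences. Toy data and hyperparameters are
# illustrative assumptions, not MatchXML's actual configuration.
from gensim.models import Word2Vec

# Each inner list is the set of labels assigned to one training document.
label_sequences = [
    ["machine_learning", "classification", "xmc"],
    ["classification", "text_mining", "xmc"],
    ["recommendation", "ranking", "xmc"],
]

model = Word2Vec(
    sentences=label_sequences,
    vector_size=128,  # dimensionality of the dense label embeddings
    window=5,         # context window over co-occurring labels
    sg=1,             # sg=1 selects the Skip-gram architecture
    min_count=1,      # keep every label, including rare tail labels
    epochs=20,
)

# Dense embedding for one label; in MatchXML, vectors like these are
# clustered to build the Hierarchical Label Tree.
vec = model.wv["xmc"]
print(vec.shape)  # (128,)
```

In the MatchXML pipeline, these dense label vectors replace the sparse TF-IDF-derived label representations as the input to the clustering step that constructs the Hierarchical Label Tree.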