MatchXML: An Efficient Text-label Matching Framework for Extreme Multi-label Text Classification (2308.13139v2)

Published 25 Aug 2023 in cs.CL and cs.LG

Abstract: eXtreme Multi-label text Classification (XMC) refers to training a classifier that assigns relevant labels to a text sample from an extremely large-scale label set (e.g., millions of labels). We propose MatchXML, an efficient text-label matching framework for XMC. We observe that label embeddings generated from sparse Term Frequency-Inverse Document Frequency (TF-IDF) features have several limitations. We therefore propose label2vec, which trains semantically dense label embeddings with the Skip-gram model. The dense label embeddings are then used to build a Hierarchical Label Tree by clustering. When fine-tuning the pre-trained Transformer encoder, we formulate multi-label text classification as a text-label matching problem in a bipartite graph. We then extract dense text representations from the fine-tuned Transformer. Besides the fine-tuned dense text embeddings, we also extract static dense sentence embeddings from a pre-trained Sentence Transformer. Finally, a linear ranker is trained on the sparse TF-IDF features, the fine-tuned dense text representations, and the static dense sentence features. Experimental results demonstrate that MatchXML achieves state-of-the-art accuracy on five of the six datasets and, in terms of speed, outperforms the competing methods on all six. Our source code is publicly available at https://github.com/huiyegit/MatchXML.
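To make the label2vec and clustering steps concrete, here is a minimal sketch, assuming gensim and scikit-learn are available. It treats each training sample's label set as a "sentence" of label IDs, trains a Skip-gram model over those sequences, and then builds a label tree by recursive clustering. The label IDs, hyperparameters, and the `build_label_tree` helper are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the label2vec idea: labels that co-occur on the same
# sample should end up with similar dense embeddings, so we feed
# each sample's label-ID sequence to a Skip-gram Word2Vec model.
# All data and hyperparameters below are illustrative.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

# Each inner list is the label-ID sequence of one training sample.
label_sequences = [
    ["L3", "L17", "L42"],
    ["L3", "L8"],
    ["L17", "L42", "L99"],
]

# sg=1 selects the Skip-gram objective (vs. CBOW).
l2v = Word2Vec(label_sequences, vector_size=64, window=5,
               sg=1, min_count=1, epochs=50)

labels = sorted({l for seq in label_sequences for l in seq})
emb = np.stack([l2v.wv[l] for l in labels])

def build_label_tree(indices, embeddings, branch=2, max_leaf=2):
    """Recursively cluster dense label embeddings into a
    hierarchical label tree (leaves hold label indices)."""
    if len(indices) <= max_leaf:
        return list(indices)
    km = KMeans(n_clusters=branch, n_init=10).fit(embeddings[indices])
    return [build_label_tree(indices[km.labels_ == c], embeddings,
                             branch, max_leaf)
            for c in range(branch)]

tree = build_label_tree(np.arange(len(labels)), emb)
```

In a real XMC setting the resulting tree narrows the label search space during Transformer fine-tuning and inference; the final linear ranker would then be trained on the concatenation of the sparse TF-IDF features and the two kinds of dense embeddings described above.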
