Learning label-label correlations in Extreme Multi-label Classification via Label Features (2405.04545v1)

Published 3 May 2024 in cs.LG and cs.IR

Abstract: Extreme Multi-label Text Classification (XMC) involves learning a classifier that can assign an input a subset of the most relevant labels from millions of label choices. Recent works in this domain have increasingly focused on a symmetric problem setting where both input instances and label features are short-text in nature. Short-text XMC with label features has found numerous applications in areas such as query-to-ad-phrase matching in search ads, title-based product recommendation, and prediction of related searches. In this paper, we propose Gandalf, a novel approach which makes use of a label co-occurrence graph to leverage label features as additional data points to supplement the training distribution. By exploiting the characteristics of the short-text XMC problem, it leverages the label features to construct valid training instances, and uses the label graph to generate the corresponding soft-label targets, thereby effectively capturing the label-label correlations. Surprisingly, models trained on these new training instances, despite their number being less than half the size of the original dataset, can outperform models trained on the original dataset, particularly on the PSP@k metric for tail labels. With this insight, we aim to train existing XMC algorithms on both the original and the new training instances, leading to an average 5% relative improvement for 6 state-of-the-art algorithms across 4 benchmark datasets consisting of up to 1.3M labels. Gandalf can be applied in a plug-and-play manner to various methods and thus advances the state of the art in the domain, without incurring any additional computational overhead.
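The core augmentation idea in the abstract (label features become new training instances, and a label co-occurrence graph supplies their soft-label targets) can be sketched as follows. This is a minimal illustration inferred from the abstract, not the paper's exact formulation: the function name `gandalf_augment` and the row-normalization scheme are assumptions, and a dense matrix is used for clarity where real XMC datasets would require sparse representations.

```python
import numpy as np

def gandalf_augment(Y, label_texts):
    """Sketch of Gandalf-style data augmentation (assumed, not the paper's
    exact method).

    Y           : (n_instances, n_labels) binary ground-truth label matrix.
    label_texts : list of n_labels short-text label features.

    Each label's feature text becomes a new training instance whose soft-label
    target is the corresponding row of the (row-normalized) label-label
    co-occurrence matrix, so labels that frequently co-occur with it receive
    positive soft targets.
    """
    # Label-label co-occurrence counts: C[i, j] = number of training
    # instances tagged with both label i and label j.
    C = (Y.T @ Y).astype(float)
    # Row-normalize so each new instance's soft targets sum to 1;
    # guard against labels that never occur.
    row_sums = C.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0
    soft_targets = C / row_sums
    return list(label_texts), soft_targets
```

The new (text, soft-target) pairs could then be concatenated with the original dataset, which matches the paper's plug-and-play claim: no model changes are needed, only extra training data.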

Authors (6)
  1. Siddhant Kharbanda
  2. Devaansh Gupta
  3. Erik Schultheis
  4. Atmadeep Banerjee
  5. Cho-Jui Hsieh
  6. Rohit Babbar
Citations (2)

