UniDEC: Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification (2405.03714v1)
Abstract: Extreme Multi-label Classification (XMC) involves predicting a subset of relevant labels from an extremely large label space, given an input query and labels with textual features. Models developed for this problem have conventionally used a modular approach with (i) a Dual Encoder (DE) to embed the queries and label texts, and (ii) a One-vs-All (OvA) classifier to rerank the shortlisted labels mined through meta-classifier training. While such methods have shown empirical success, we observe two key uncharted aspects: (i) DE training typically uses only a single positive relation, even for datasets which offer more, and (ii) existing approaches fixate on using only the OvA reduction of the multi-label problem. This work aims to explore these aspects by proposing UniDEC, a novel end-to-end trainable framework which trains the dual encoder and classifier together in a unified fashion using a multi-class loss. For the choice of multi-class loss, the work proposes a novel pick-some-label (PSL) reduction of the multi-label problem which leverages multiple (in some cases, all) positives. The proposed framework achieves state-of-the-art results on a single GPU, and on-par results with multi-GPU SOTA methods on various XMC benchmark datasets, all while using 4-16x less compute and remaining practically scalable even beyond million-label datasets.
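The pick-some-label idea described in the abstract can be illustrated with a small sketch. This is an illustrative reconstruction under stated assumptions, not the paper's exact formulation: instead of the usual single-positive softmax cross-entropy used in DE training, each query's loss averages the softmax cross-entropy over up to `k` of its positives (here simply the first `k`, where the paper would sample them) against all in-batch labels.

```python
import numpy as np

def pick_some_label_loss(sim, pos_mask, k=2):
    """Sketch of a pick-some-label (PSL) multi-class loss.

    sim      : (num_queries, num_labels) query-label similarity logits
    pos_mask : (num_queries, num_labels) boolean ground-truth positives
    k        : number of positives "picked" per query (hypothetical choice)
    """
    # Numerically stable log-softmax over the in-batch label axis.
    m = sim.max(axis=1, keepdims=True)
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))

    losses = []
    for q in range(sim.shape[0]):
        pos = np.flatnonzero(pos_mask[q])[:k]  # pick some (not just one) positives
        losses.append(-log_prob[q, pos].mean())
    return float(np.mean(losses))
```

Raising the scores of a query's positive labels lowers this loss, so minimizing it pulls query embeddings toward all picked positives at once, rather than toward a single positive as in standard DE contrastive training.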
- Siddhant Kharbanda
- Devaansh Gupta
- Gururaj K
- Pankaj Malhotra
- Cho-Jui Hsieh
- Rohit Babbar