Graph Regularized Encoder Training for Extreme Classification (2402.18434v2)
Abstract: Deep extreme classification (XC) aims to train an encoder and an accompanying classifier to tag a data point with the most relevant subset of labels from a very large label universe. XC applications in ranking, recommendation and tagging routinely encounter tail labels for which training data are exceedingly scarce. Graph convolutional networks (GCNs) offer a convenient but computationally expensive way to leverage task metadata and improve model accuracy in these settings. This paper formally establishes that in several use cases, the steep computational cost of GCNs can be avoided entirely by replacing them with non-GCN architectures; in these settings, it is far more effective to use graph data to regularize encoder training than to deploy a GCN. Based on these insights, the paper presents RAMEN, an alternative paradigm for utilizing graph metadata in XC that delivers significant performance gains with zero increase in inference cost. RAMEN scales to datasets with up to 1M labels and offers prediction accuracy up to 15% higher than state-of-the-art methods on benchmark datasets, including methods that use graph metadata to train GCNs. RAMEN also offers 10% higher accuracy than the best baseline on a proprietary recommendation dataset sourced from the click logs of a popular search engine. Code for RAMEN will be released publicly.
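The core idea of regularizing the encoder with graph metadata, rather than aggregating neighbors through GCN layers at inference time, can be illustrated with a Laplacian-style penalty that pulls the embeddings of graph-adjacent points together during training. This is only a minimal sketch under generic assumptions, not RAMEN's actual objective; the function name `graph_regularizer` and the weight `lam` are illustrative.

```python
import numpy as np

def graph_regularizer(embeddings, edges, lam=0.1):
    """Laplacian-style graph penalty: lam * sum over edges (i, j) of
    ||embeddings[i] - embeddings[j]||^2. Added to the task loss during
    encoder training; the encoder itself stays GCN-free, so inference
    cost is unchanged. A generic sketch, not the paper's exact loss."""
    penalty = 0.0
    for i, j in edges:
        diff = embeddings[i] - embeddings[j]
        penalty += float(diff @ diff)  # squared Euclidean distance
    return lam * penalty

# Toy usage: three 2-d embeddings linked by two metadata edges.
emb = np.array([[0.0, 0.0],
                [1.0, 0.0],
                [0.0, 3.0]])
edges = [(0, 1), (0, 2)]
loss = graph_regularizer(emb, edges, lam=0.5)
# per-edge terms: 1.0 and 9.0, so loss = 0.5 * 10.0 = 5.0
```

In a full training loop this penalty would simply be added to the classification loss, so the graph influences the learned embeddings but never appears in the deployed model.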