Adaptive Neural Ranking Framework: Toward Maximized Business Goal for Cascade Ranking Systems (2310.10462v2)
Abstract: Cascade ranking is widely used for large-scale top-k selection problems in online advertising and recommendation systems, and learning-to-rank is an important way to optimize the models in cascade ranking. Previous works on learning-to-rank usually focus on letting the model learn the complete order or top-k order, and adopt the corresponding rank metrics (e.g., OPA and NDCG@k) as optimization targets. However, these targets cannot adapt to various cascade ranking scenarios with differing data complexities and model capabilities, and existing metric-driven methods such as the Lambda framework can only optimize a rough upper bound of a limited set of metrics, potentially resulting in sub-optimal results and performance misalignment. To address these issues, we propose a novel perspective on optimizing cascade ranking systems that highlights the adaptability of optimization targets to data complexities and model capabilities. Concretely, we employ multi-task learning to adaptively combine the optimization of relaxed and full targets, which correspond to the metrics Recall@m@k and OPA, respectively. We also introduce permutation matrices to represent the rank metrics and employ differentiable sorting techniques to relax the hard permutation matrix with a controllable approximation error bound. This enables us to optimize both the relaxed and full targets directly and more appropriately. We name this method the Adaptive Neural Ranking Framework (abbreviated as ARF). Furthermore, we give a concrete instantiation of ARF: we use NeuralSort to obtain the relaxed permutation matrix and draw on a variant of the uncertainty-weighting method from multi-task learning to optimize the proposed losses jointly. Experiments on four public and industrial benchmarks show the effectiveness and generalization of our method, and an online experiment shows that it has significant application value.
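To make the abstract's key mechanisms concrete, below is a minimal PyTorch sketch (the choice of PyTorch is ours; the paper does not mandate a framework). `neural_sort` implements the published NeuralSort relaxation, whose rows approach a hard permutation matrix as the temperature `tau` goes to 0; `soft_recall_at_m_k` is an illustrative differentiable surrogate for the relaxed target Recall@m@k, not necessarily the paper's exact loss; and `UncertaintyWeightedLoss` is the standard uncertainty weighting of Kendall et al., whereas ARF uses a variant of it. All function and parameter names here are hypothetical.

```python
import torch
import torch.nn.functional as F

def neural_sort(s: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Relaxed permutation matrix via NeuralSort (Grover et al., 2019).

    s: (n,) score vector. Returns a row-stochastic (n, n) matrix whose
    rows approach the hard permutation sorting s in descending order
    as tau -> 0 (smaller tau: tighter approximation, noisier gradients).
    """
    n = s.size(0)
    A = (s.unsqueeze(1) - s.unsqueeze(0)).abs()   # A[j, k] = |s_j - s_k|
    B = A.sum(dim=1)                              # B[j] = sum_k |s_j - s_k|
    scaling = n + 1 - 2 * torch.arange(1, n + 1, device=s.device, dtype=s.dtype)
    logits = scaling.unsqueeze(1) * s.unsqueeze(0) - B.unsqueeze(0)
    return F.softmax(logits / tau, dim=-1)

def soft_recall_at_m_k(scores, labels, m, k, tau=1.0):
    """Illustrative differentiable surrogate for Recall@m@k: how much of
    the ground-truth top-k set lands in the model's (soft) top-m set."""
    P_hat = neural_sort(scores, tau)      # soft permutation from model scores
    in_top_m = P_hat[:m].sum(dim=0)       # soft membership of each item in top-m
    top_k = torch.zeros_like(labels)
    top_k[labels.topk(k).indices] = 1.0   # hard ground-truth top-k indicator
    return (in_top_m * top_k).sum() / k   # in [0, 1]; maximize it

class UncertaintyWeightedLoss(torch.nn.Module):
    """Standard uncertainty weighting (Kendall et al., 2018), in the common
    simplified form: each task loss L_i is scaled by exp(-s_i) with a learned
    s_i = log(sigma_i^2), plus a regularizer s_i that prevents the learned
    weights from collapsing to zero. ARF uses a variant of this scheme."""
    def __init__(self, num_tasks: int):
        super().__init__()
        self.log_vars = torch.nn.Parameter(torch.zeros(num_tasks))

    def forward(self, *losses: torch.Tensor) -> torch.Tensor:
        stacked = torch.stack(losses)
        return (torch.exp(-self.log_vars) * stacked + self.log_vars).sum()
```

For instance, with scores from one cascade stage and graded labels, one might jointly minimize `UncertaintyWeightedLoss(2)(1 - soft_recall_at_m_k(scores, labels, m, k), opa_loss)`, letting the learned weights adaptively trade off the relaxed target against the full-order target.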
- Fast Differentiable Sorting and Ranking. In ICML. 950–959.
- Revisiting approximate metric optimization in the age of deep neural networks. In SIGIR. 1241–1244.
- Christopher JC Burges. 2010. From RankNet to LambdaRank to LambdaMART: An Overview. Learning (2010), 81.
- Learning to rank using gradient descent. In ICML. 89–96.
- Adapting ranking SVM to document retrieval. In SIGIR. 186–193.
- xCoT: Cross-lingual Instruction Tuning for Cross-lingual Chain-of-Thought Reasoning. CoRR abs/2401.07037 (2024).
- Efficient Cost-Aware Cascade Ranking in Multi-Stage Retrieval. In SIGIR. 445–454.
- GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. In ICML. 793–802.
- Differentiable Ranking and Sorting using Optimal Transport. In NeurIPS. 6858–6868.
- Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees. ACM Trans. Inf. Syst. (2016), 15:1–15:31.
- Joint Optimization of Cascade Ranking Models. In WSDM. 15–23.
- Deep Sparse Rectifier Neural Networks. In AISTATS. 315–323.
- Stochastic Optimization of Sorting Networks via Continuous Relaxations. In ICLR.
- Calibrated Conversion Rate Prediction via Knowledge Distillation under Delayed Feedback in Online Advertising. In CIKM. 3983–3987.
- Rank and rate: multi-task learning for recommender systems. In RecSys. 451–454.
- MetaBalance: Improving Multi-Task Recommendations via Adapting Gradient Magnitudes of Auxiliary Tasks. In WWW. 2205–2215.
- MBCT: Tree-Based Feature-Aware Binning for Individual Uncertainty Calibration. In WWW. 2236–2246.
- On Optimizing Top-K Metrics for Neural Ranking Models. In SIGIR. 2303–2307.
- DCAF: A Dynamic Computation Allocation Framework for Online Serving System. CoRR abs/2006.09684 (2020).
- Thorsten Joachims. 2002. Optimizing search engines using clickthrough data. In SIGKDD. 133–142.
- Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft. CoRR abs/2106.14876 (2021).
- Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics. In CVPR. 7482–7491.
- Self-Normalizing Neural Networks. In NeurIPS. 971–980.
- McRank: Learning to Rank Using Multiple Classification and Gradient Boosting. In NeurIPS. 897–904.
- FAA: Fine-grained Attention Alignment for Cascade Document Ranking. In ACL. 1688–1700.
- mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning. CoRR abs/2308.09073 (2023).
- TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank. In KDD. 2970–2978.
- Differentiable Sorting Networks for Scalable Sorting and Ranking Supervision. In ICML. 8546–8555.
- Monotonic Differentiable Sorting Networks. In ICLR.
- RankFlow: Joint Optimization of Multi-Stage Cascade Ranking Systems as Flows. In SIGIR. 814–824.
- Tao Qin and Tie-Yan Liu. 2013. Introducing LETOR 4.0 Datasets. CoRR abs/1306.2597 (2013).
- A general approximation framework for direct optimization of information retrieval measures. Inf. Retr. 13, 4 (2010), 375–397.
- Are Neural Rankers still Outperformed by Gradient Boosted Decision Trees? In ICLR.
- PiRank: Scalable Learning To Rank via Differentiable Sorting. In NeurIPS. 21644–21654.
- Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations. In RecSys. 269–278.
- SoftRank: optimizing non-smooth rank metrics. In WSDM. 77–86.
- A cascade ranking model for efficient ranked retrieval. In SIGIR. 105–114.
- Towards the Better Ranking Consistency: A Multi-task Learning Framework for Early Stage Ads Ranking. CoRR abs/2307.11096 (2023).
- The LambdaLoss Framework for Ranking Metric Optimization. In CIKM. 1313–1322.
- A Theoretical Analysis of NDCG Type Ranking Measures. In COLT. 25–54.
- Formality Style Transfer with Shared Latent Space. In COLING. 2236–2249.
- Smoothing DCG for learning to rank: a novel approach using smoothed hinge functions. In CIKM. 1923–1926.
- Jun Xu and Hang Li. 2007. AdaRank: a boosting algorithm for information retrieval. In SIGIR. 391–398.
- GanLM: Encoder-Decoder Pre-training with an Auxiliary Discriminator. In ACL. 9394–9412.
- Multilingual Machine Translation Systems from Microsoft for WMT21 Shared Task. In WMT. 446–455.
- Alternating language modeling for cross-lingual pre-training. In AAAI. 9386–9393.
- High-resource Language-specific Training for Multilingual Neural Machine Translation. In IJCAI. 4461–4467.
- GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation. TASLP 31 (2023), 1489–1498.
- Computation Resource Allocation Solution in Recommender Systems. CoRR abs/2103.02259 (2021).
- Gradient Surgery for Multi-Task Learning. In NeurIPS.
- Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective. In CVPR. 14071–14081.
- A General Boosting Method and its Application to Learning Ranking Functions for Web Search. In NeurIPS. 1697–1704.