Listwise Generative Retrieval Models via a Sequential Learning Process (2403.12499v1)
Abstract: Recently, a novel generative retrieval (GR) paradigm has been proposed, where a single sequence-to-sequence model is learned to directly generate a list of relevant document identifiers (docids) given a query. Existing GR models commonly employ maximum likelihood estimation (MLE) for optimization: this involves maximizing the likelihood of a single relevant docid given an input query, with the assumption that the likelihood for each docid is independent of the other docids in the list. We refer to these models as the pointwise approach in this paper. While the pointwise approach has been shown to be effective in the context of GR, it is considered sub-optimal due to its disregard for the fundamental principle that ranking involves making predictions about lists. In this paper, we address this limitation by introducing an alternative listwise approach, which empowers the GR model to optimize the relevance at the docid list level. Specifically, we view the generation of a ranked docid list as a sequence learning process: at each step we learn a subset of parameters that maximizes the corresponding generation likelihood of the $i$-th docid given the (preceding) top $i-1$ docids. To formalize the sequence learning process, we design a positional conditional probability for GR. To alleviate the potential impact of beam search on the generation quality during inference, we perform relevance calibration on the generation likelihood of model-generated docids according to relevance grades. We conduct extensive experiments on representative binary and multi-graded relevance datasets. Our empirical results demonstrate that our method outperforms state-of-the-art GR baselines in terms of retrieval performance.
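To make the distinction between the two training objectives concrete, the sketch below contrasts a pointwise MLE loss with a listwise, sequentially factorized loss of the kind described above. The notation ($\theta$ for model parameters, $\mathcal{D}^{+}$ for the set of docids relevant to a query $q$, and $d_1, \dots, d_n$ for the generated ranked list) is illustrative and not taken verbatim from the paper:

$$
\mathcal{L}_{\mathrm{pointwise}}(\theta) = -\sum_{d \in \mathcal{D}^{+}} \log P_{\theta}(d \mid q),
\qquad
\mathcal{L}_{\mathrm{listwise}}(\theta) = -\sum_{i=1}^{n} \log P_{\theta}\bigl(d_i \mid q,\, d_1, \dots, d_{i-1}\bigr).
$$

The pointwise term scores each relevant docid independently of the rest of the list, whereas the listwise term conditions the generation likelihood of the $i$-th docid on the preceding top $i-1$ docids, which is what the positional conditional probability described in the abstract formalizes. The paper's exact parameterization, including the relevance-calibration step applied after beam search, may differ from this sketch.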
Authors: Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Wei Chen, Xueqi Cheng