
Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders (2404.06912v3)

Published 10 Apr 2024 in cs.IR

Abstract: Existing cross-encoder re-rankers can be categorized as pointwise, pairwise, or listwise models. Pair- and listwise models allow passage interactions, which usually makes them more effective than pointwise models but also less efficient and less robust to input order permutations. To enable efficient permutation-invariant passage interactions during re-ranking, we propose a new cross-encoder architecture with inter-passage attention: the Set-Encoder. In Cranfield-style experiments on TREC Deep Learning and TIREx, the Set-Encoder is as effective as state-of-the-art listwise models while improving efficiency and robustness to input permutations. Interestingly, a pointwise model is similarly effective, but when additionally requiring the models to consider novelty, the Set-Encoder is more effective than its pointwise counterpart and retains its advantageous properties compared to other listwise models. Our code and models are publicly available at https://github.com/webis-de/set-encoder.

Authors (9)
  1. Ferdinand Schlatt
  2. Maik Fröbe
  3. Harrisen Scells
  4. Shengyao Zhuang
  5. Bevan Koopman
  6. Guido Zuccon
  7. Benno Stein
  8. Martin Potthast
  9. Matthias Hagen

Summary

Set-Encoder: Enhancing Cross-Encoder Performance with Permutation-Invariant Inter-Passage Attention

Introduction

The Set-Encoder introduces a novel cross-encoder architecture that addresses two weaknesses of existing cross-encoders in passage re-ranking: sensitivity to the order of the input passages and high memory usage. By encoding passages in parallel and letting them interact through permutation-invariant inter-passage attention, the Set-Encoder matches the effectiveness of state-of-the-art listwise models at a similar parameter count while being more efficient and more robust. Its architecture also accommodates more passages per forward pass, which broadens its practical applicability and advances cross-encoder methodology.
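
To make the architecture concrete, here is a minimal PyTorch sketch of the idea. It is illustrative only: the class and `pair_encoder` are hypothetical names, and the paper integrates inter-passage attention inside the encoder layers rather than as a single post-hoc attention layer as done here.

```python
import torch
import torch.nn as nn


class SetEncoderSketch(nn.Module):
    """Sketch (not the authors' code): each (query, passage) pair is
    encoded independently and in parallel; one pooled embedding per
    passage then exchanges information with all other passages through
    attention with no cross-passage positional encoding, so the scores
    do not depend on the order in which passages arrive."""

    def __init__(self, pair_encoder: nn.Module, dim: int = 768, heads: int = 8):
        super().__init__()
        self.pair_encoder = pair_encoder  # assumed: (n, seq_len) ids -> (n, dim)
        self.inter_passage = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.score = nn.Linear(dim, 1)

    def forward(self, pair_token_ids: torch.Tensor) -> torch.Tensor:
        # pair_token_ids: (num_passages, seq_len) token ids of query + passage
        embs = self.pair_encoder(pair_token_ids)         # (num_passages, dim)
        embs = embs.unsqueeze(0)                         # (1, num_passages, dim)
        mixed, _ = self.inter_passage(embs, embs, embs)  # set-wise interaction
        return self.score(mixed).squeeze(-1).squeeze(0)  # (num_passages,) scores


class DummyPairEncoder(nn.Module):
    """Stand-in for a real Transformer pair encoder: embed and mean-pool."""

    def __init__(self, vocab: int = 30522, dim: int = 768):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return self.emb(ids).mean(dim=1)  # (n, seq_len) -> (n, dim)


model = SetEncoderSketch(DummyPairEncoder())
scores = model(torch.randint(0, 30522, (10, 64)))  # 10 passages -> 10 scores
```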

Permutation Invariance and Passage Interactions

Central to the Set-Encoder's design is its treatment of permutation invariance and inter-passage interactions. Traditional listwise cross-encoders are sensitive to the order of their input passages, which often necessitates re-ranking multiple input permutations to stabilize the output. The Set-Encoder instead applies inter-passage attention while processing passages in parallel, avoiding the input concatenation that makes other listwise models order-dependent. This design is not only computationally more efficient but also makes the model's output independent of passage ordering, thereby upholding a desirable property of learning-to-rank models: permutation invariance.
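
This property is easy to verify empirically: self-attention without cross-passage positional encodings is permutation-equivariant, so reordering the inputs reorders the outputs in lockstep and each passage's score is unchanged. A small sanity check with a plain PyTorch attention layer:

```python
import torch
import torch.nn as nn

attn = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
attn.eval()

x = torch.randn(1, 10, 64)  # pooled embeddings of 10 passages
perm = torch.randperm(10)   # an arbitrary reordering of the passages

with torch.no_grad():
    out, _ = attn(x, x, x)
    out_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])

# Permuting the passages permutes the outputs identically, so the
# per-passage relevance scores are independent of input order.
assert torch.allclose(out[:, perm], out_perm, atol=1e-5)
```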

Implementation and Fine-tuning Approach

The Set-Encoder leverages fused-attention kernels to overcome the memory inefficiency of standard attention, which enables fine-tuning with a substantially larger number of passages per query. Noting the influence of training-data quality on cross-encoder effectiveness, the paper also revisits the fine-tuning strategy and proposes a two-stage process: the model is first fine-tuned on a large but potentially noisy dataset, and then refined on higher-quality distillation data. This two-stage process yields notable improvements in effectiveness, underscoring the importance of quality training data and a deliberate fine-tuning schedule.
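
As a hedged sketch of what such a two-stage schedule can look like (the loss functions and helper names below are illustrative assumptions, not the paper's exact recipe):

```python
import torch
import torch.nn.functional as F


def listwise_ce_loss(scores: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    # Stage 1: cross-entropy over the list, taking the (possibly noisy)
    # labeled positive passage as the target class.
    return F.cross_entropy(scores.unsqueeze(0), labels.argmax().unsqueeze(0))


def distill_loss(scores: torch.Tensor, teacher_scores: torch.Tensor) -> torch.Tensor:
    # Stage 2: match the student's score distribution over the passage
    # list to that of a stronger teacher ranker.
    return F.kl_div(
        F.log_softmax(scores, dim=-1),
        F.softmax(teacher_scores, dim=-1),
        reduction="batchmean",
    )


def two_stage_fine_tune(model, stage1_batches, stage2_batches, lr=1e-5):
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for pair_ids, labels in stage1_batches:  # large, noisier training set
        loss = listwise_ce_loss(model(pair_ids), labels)
        opt.zero_grad(); loss.backward(); opt.step()
    for pair_ids, teacher_scores in stage2_batches:  # smaller, higher quality
        loss = distill_loss(model(pair_ids), teacher_scores)
        opt.zero_grad(); loss.backward(); opt.step()
```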

Evaluation and Findings

In extensive evaluations on the TREC Deep Learning tracks and the TIREx platform, the Set-Encoder matches the effectiveness of state-of-the-art listwise models. Its ability to handle passage permutations invariantly without sacrificing the interactions between passages marks a significant advance in cross-encoder design. Moreover, the model is more efficient than larger, more complex models while matching or exceeding their effectiveness, corroborating its architecture and the potential of inter-passage attention to improve learning and prediction.

Implications and Future Directions

The Set-Encoder is a step forward in optimizing cross-encoder architectures for passage re-ranking, addressing the persistent challenges of permutation sensitivity and computational inefficiency. Beyond its immediate benefits for training and inference, its architecture invites further exploration into scaling the encoder size, integrating sparse models for additional efficiency gains, and developing more sophisticated loss functions that leverage LLM distillation. Moreover, because the model does not rely on the positional information of the first-stage ranking, it should be resilient to changes in first-stage retrieval quality, an avenue for future studies on the robustness of ranking models.

Conclusion

In summary, the Set-Encoder advances the state of cross-encoder architectures for passage re-ranking through its innovative handling of permutation invariance and passage interactions. Its design not only demonstrates improved performance and efficiency but also sets the stage for future explorations into more sophisticated and efficient ranking models.