Enhancing Documents with Multidimensional Relevance Statements in Cross-encoder Re-ranking (2306.10979v1)
Abstract: In this paper, we propose a novel approach to consider multiple dimensions of relevance beyond topicality in cross-encoder re-ranking. On the one hand, current multidimensional retrieval models often use naïve solutions at the re-ranking stage to aggregate multiple relevance scores into an overall one. On the other hand, cross-encoder re-rankers are effective in considering topicality but are not designed to straightforwardly account for other relevance dimensions. To overcome these issues, we envisage enhancing the candidate documents, which are retrieved by a first-stage lexical retrieval model, with "relevance statements" related to additional dimensions of relevance and then performing a re-ranking on them with cross-encoders. In particular, here we consider an additional relevance dimension beyond topicality, which is credibility. We test the effectiveness of our solution in the context of the Consumer Health Search task, considering publicly available datasets. Our results show that the proposed approach statistically outperforms both aggregation-based and cross-encoder re-rankers.
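The idea of enhancing candidate documents with a relevance statement before cross-encoder re-ranking can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the statement template ("Credibility score: ..."), the function names `add_relevance_statement`, `rerank`, and `toy_scorer`, and the example documents are all hypothetical. The scorer is a simple term-overlap stand-in so the example runs without any model; in practice a fine-tuned cross-encoder that scores (query, text) pairs would take its place.

```python
from typing import Callable, List, Tuple


def add_relevance_statement(doc_text: str, credibility: float) -> str:
    """Prepend a credibility statement to the document text.

    The template is a hypothetical choice; the abstract does not specify one.
    """
    return f"Credibility score: {credibility:.2f}. {doc_text}"


def rerank(
    query: str,
    candidates: List[Tuple[str, str, float]],
    score_pair: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Re-rank first-stage candidates given as (doc_id, text, credibility).

    Each document is enhanced with its statement, scored against the query
    by `score_pair`, and returned in descending score order.
    """
    scored = []
    for doc_id, text, credibility in candidates:
        enhanced = add_relevance_statement(text, credibility)
        scored.append((doc_id, score_pair(query, enhanced)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def toy_scorer(query: str, text: str) -> float:
    """Stand-in for a cross-encoder: counts query terms present in the text."""
    tokens = set(text.lower().split())
    return float(sum(1 for term in query.lower().split() if term in tokens))


if __name__ == "__main__":
    # Hypothetical candidates, as if returned by a first-stage lexical retriever.
    candidates = [
        ("d1", "Garlic cures everything.", 0.2),
        ("d2", "Vitamin C and the common cold.", 0.9),
    ]
    for doc_id, score in rerank("vitamin c cold", candidates, toy_scorer):
        print(doc_id, score)
```

The toy scorer ignores the injected statement. The point of the sketch is only the data flow: the additional relevance dimension reaches the re-ranker as text inside the document, rather than as a separate score aggregated afterwards.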
- Rishabh Upadhyay
- Arian Askari
- Gabriella Pasi
- Marco Viviani