Information Retrieval with Entity Linking (2404.08678v1)

Published 7 Apr 2024 in cs.IR

Abstract: Traditional sparse retrievers work well in low-resource settings, but they depend on exact matching between high-dimensional bag-of-words (BoW) representations of the queries and the collection. As a result, retrieval performance is restricted by semantic discrepancies and vocabulary gaps. Transformer-based dense retrievers, on the other hand, bring significant improvements in information retrieval tasks by exploiting low-dimensional contextualized representations of the corpus. Although dense retrievers are known for their relative effectiveness, they are less efficient and generalize less well than sparse retrievers. For a lightweight retrieval task, their high computational cost and time consumption are major barriers that discourage the use of dense models despite the potential gains. In this work, I propose boosting the performance of sparse retrievers by expanding both the queries and the documents with linked entities, with the entity names in two formats: 1) explicit and 2) hashed. A zero-shot end-to-end dense entity linking system is employed for entity recognition and disambiguation to augment the corpus. By leveraging advanced entity linking methods, I believe that the effectiveness gap between sparse and dense retrievers can be narrowed. Experiments are conducted on the MS MARCO passage dataset using the original qrel set, the re-ranked qrels favoured by MonoT5, and the latter set further re-ranked by DuoT5. Since I am concerned with early-stage retrieval in the cascaded ranking architectures of large information retrieval systems, the results are evaluated using recall@1000. The suggested approach is also capable of retrieving documents for query subsets judged to be particularly difficult in prior work.
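
To make the two expansion formats concrete, here is a minimal sketch of how linked entity names might be appended to a query or passage before sparse indexing. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how entity names are hashed, so the truncated MD5 digest, the `ent` prefix, and the helper names `explicit_tokens`, `hashed_token` and `expand` are all hypothetical. The entity linker itself is not shown; its output is assumed to be a list of entity titles.

```python
import hashlib


def explicit_tokens(entity_titles):
    """Explicit format: add the words of each entity name as ordinary terms."""
    return [
        word.lower()
        for title in entity_titles
        for word in title.replace("_", " ").split()
    ]


def hashed_token(title):
    """Hashed format: one opaque term per entity.

    Assumption: a 12-character MD5 prefix; the abstract does not specify
    the hashing scheme.
    """
    digest = hashlib.md5(title.encode("utf-8")).hexdigest()
    return "ent" + digest[:12]


def expand(text, entity_titles, mode="explicit"):
    """Append entity terms to a query or passage (hypothetical helper)."""
    if mode == "explicit":
        extra = explicit_tokens(entity_titles)
    elif mode == "hashed":
        extra = [hashed_token(title) for title in entity_titles]
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not extra:
        return text
    return text + " " + " ".join(extra)


# Entity titles are assumed to come from an entity linker (not shown here).
query = expand("who founded apple", ["Apple_Inc."], mode="hashed")
passage = expand("Steve Jobs co-founded the company in 1976.", ["Apple_Inc."], mode="hashed")
```

In this sketch the hashed form gives the query and the passage one shared, unambiguous term for the same entity, so an exact-match retriever can match them even when the surface wording differs. The explicit form instead adds the name's own words, which can also match ordinary vocabulary.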

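The evaluation measure named in the abstract, recall@1000, can be sketched as follows. This is a generic formulation of recall at a cutoff, not code from the paper. The function names and the dictionary layout for the run and the qrels are assumptions made for illustration.

```python
def recall_at_k(ranked_doc_ids, relevant_doc_ids, k=1000):
    """Fraction of the relevant documents that appear in the top-k results."""
    relevant = set(relevant_doc_ids)
    if not relevant:
        return 0.0
    retrieved = set(ranked_doc_ids[:k])
    return len(retrieved & relevant) / len(relevant)


def mean_recall_at_k(run, qrels, k=1000):
    """Average recall@k over all queries that have at least one relevant document.

    run:   {query_id: ranked list of doc ids}   (assumed layout)
    qrels: {query_id: set of relevant doc ids}  (assumed layout)
    """
    scores = [
        recall_at_k(run.get(qid, []), rel, k)
        for qid, rel in qrels.items()
        if rel
    ]
    return sum(scores) / len(scores) if scores else 0.0


run = {"q1": ["d1", "d2"], "q2": ["d5"]}
qrels = {"q1": {"d2"}, "q2": {"d5", "d7"}}
score = mean_recall_at_k(run, qrels, k=1000)  # q1 -> 1.0, q2 -> 0.5, mean 0.75
```

Recall at a deep cutoff suits first-stage retrieval because later re-rankers in a cascade can reorder the candidates, but cannot recover relevant documents that the first stage failed to return.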
  194. John Ross Quinlan. Induction of decision trees. Mach. Learn., 1(1):81–106, mar 1986.
  195. Improving language understanding by generative pre-training. 2018.
  196. Exploring the limits of transfer learning with a unified text-to-text transformer, 2019.
  197. Juan Enrique Ramos. Using tf-idf to determine word relevance in document queries. 2003.
  198. Document retrieval using entity-based language models. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’16, page 65–74, New York, NY, USA, 2016. Association for Computing Machinery.
  199. Knowledge Graphs: An Information Retrieval Perspective. 2020.
  200. Kashif Riaz. Rule-based named entity recognition in urdu. In Proceedings of the 2010 Named Entities Workshop, NEWS ’10, page 126–135, USA, 2010. Association for Computational Linguistics.
  201. Stefan Riezler and Yi Liu. Query rewriting using monolingual statistical machine translation. Computational Linguistics, 36:569–582, 2010.
  202. Named entity recognition in tweets: An experimental study. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, page 1524–1534, USA, 2011. Association for Computational Linguistics.
  203. The probabilistic relevance framework: Bm25 and beyond. Found. Trends Inf. Retr., 3(4):333–389, apr 2009.
  204. Simple, proven approaches to text retrieval. 1994.
  205. Relevance weighting of search terms. J. Am. Soc. Inf. Sci., 27:129–146, 1976.
  206. J. J. Rocchio. Relevance Feedback in Information Retrieval. Prentice Hall, Englewood, Cliffs, New Jersey, 1971.
  207. WBI-NER: The impact of domain-specific features on the performance of identifying and classifying mentions of drugs. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), pages 356–363, Atlanta, Georgia, USA, June 2013. Association for Computational Linguistics.
  208. Fine-grained entity linking. Journal of Web Semantics, 65:100600, 2020.
  209. Review: Information retrieval techniques and applications. 2015.
  210. Arya Roy. Recent trends in named entity recognition (NER). CoRR, abs/2101.11420, 2021.
  211. Ian Ruthven. Interactive information retrieval. Annual Review of Information Science and Technology, 42:43–92, November 2008.
  212. Drug-drug interaction extraction from biomedical text using long short term memory network, 2017.
  213. Arabic rule-based named entity recognition systems progress and challenges. International Journal on Advanced Science, Engineering and Information Technology, 7:815–821, 2017.
  214. Semantic hashing. International Journal of Approximate Reasoning, 50(7):969–978, 2009. Special Section on Graphical Models and Information Retrieval.
  215. A vector space model for automatic indexing. Commun. ACM, 18:613–620, 1975.
  216. Simple entity-centric questions challenge dense retrievers. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 6138–6148, Online and Punta Cana, Dominican Republic, November 2021. Association for Computational Linguistics.
  217. Phrase-indexed question answering: A new challenge for scalable document comprehension. ArXiv, abs/1804.07726, 2018.
  218. Real-time open-domain question answering with dense-sparse phrase index, 2019.
  219. Burr Settles. Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and Its Applications, JNLPBA ’04, page 104–107, USA, 2004. Association for Computational Linguistics.
  220. Neural entity linking: A survey of models based on deep learning. CoRR, abs/2006.00575, 2020.
  221. Named entity recognition in natural language processing: A systematic review. In Deepak Gupta, Ashish Khanna, Vineet Kansal, Giancarlo Fortino, and Aboul Ella Hassanien, editors, Proceedings of Second Doctoral Symposium on Computational Intelligence, pages 817–828, Singapore, 2022. Springer Singapore.
  222. A survey on information retrieval models, techniques and applications. 2013.
  223. Named entity recognition using neural language model and crf for hindi language. Computer Speech & Language, 74:101356, 2022.
  224. Rahul Sharnagat. Named entity recognition: A literature survey. Center For Indian Language Technology, pages 1–27, 2014.
  225. Early stage retrieval with entity linking. In Proceedings of the 31st ACM International Conference on Information and knowledge Management, CIKM’22, 2022.
  226. Entity linking meets deep learning: Techniques and solutions. CoRR, abs/2109.12520, 2021.
  227. Entity linking with a knowledge base: Issues, techniques, and solutions. IEEE Transactions on Knowledge and Data Engineering, 27:443–460, 2015.
  228. Document expansion using external collections. SIGIR ’17, page 1045–1048, New York, NY, USA, 2017. Association for Computing Machinery.
  229. Using various term dependencies according to their utilities. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management, CIKM ’10, page 1493–1496, New York, NY, USA, 2010. Association for Computing Machinery.
  230. Named entity discovery using comparable news articles. In COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics, pages 848–853, Geneva, Switzerland, aug 23–aug 27 2004. COLING.
  231. Neural cross-lingual entity linking. In AAAI, 2018.
  232. A general language model for information retrieval. In Proceedings of the Eighth International Conference on Information and Knowledge Management, CIKM ’99, page 316–321, New York, NY, USA, 1999. Association for Computing Machinery.
  233. Exploiting wikipedia for cross-lingual and multilingual information retrieval. Data & Knowledge Engineering, 74:26–45, 2012. Applications of Natural Language to Information Systems.
  234. Incorporating query term dependencies in language models for document retrieval. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, 2003.
  235. Fast and accurate entity recognition with iterated dilated convolutions. In EMNLP, 2017.
  236. Yago: A core of semantic knowledge unifying wordnet and wikipedia. 2007.
  237. A multilingual named entity recognition system using boosting and c4.5 decision tree learning algorithms. In Discovery Science, 2006.
  238. Distilling knowledge for fast retrieval-based chat-bots. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020.
  239. Use of support vector machines in extended named entity recognition. In Proceedings of the 6th Conference on Natural Language Learning - Volume 20, COLING-02, page 1–7, USA, 2002. Association for Computational Linguistics.
  240. Offline versus online representation learning of documents using external knowledge. ACM Trans. Inf. Syst., 37(4), sep 2019.
  241. Neural document expansion for ad-hoc information retrieval. CoRR, abs/2012.14005, 2020.
  242. Language model information retrieval with document expansion. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, HLT-NAACL ’06, page 407–414, USA, 2006. Association for Computational Linguistics.
  243. Semi-supervised bootstrapping approach for named entity recognition. ArXiv, abs/1511.06833, 2015.
  244. Named entity recognition: Exploring features. In KONVENS, 2012.
  245. Named entity recognition with stack residual LSTM and trainable bias decoding. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 566–575, Taipei, Taiwan, November 2017. Asian Federation of Natural Language Processing.
  246. Cross-lingual wikification using multilingual embeddings. In NAACL, 2016.
  247. Joint multilingual supervision for cross-lingual entity linking. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2486–2495, Brussels, Belgium, October-November 2018. Association for Computational Linguistics.
  248. 2010 i2b2/va challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association : JAMIA, 18:552–6, 06 2011.
  249. Ellen M. Voorhees. Query expansion using lexical-semantic relations. In SIGIR ’94, 1994.
  250. Wikidata: A free collaborative knowledgebase. Commun. ACM, 57(10):78–85, sep 2014.
  251. Comparison of named entity recognition tools applied to news articles. 2019 Ivannikov Ispras Open Conference (ISPRAS), pages 72–77, 2019.
  252. Named entity recognition with gated convolutional neural networks. pages 110–121, 10 2017.
  253. Recognizing unregistered names for mandarin word identification. In Proceedings of the 14th Conference on Computational Linguistics - Volume 4, COLING ’92, page 1239–1243, USA, 1992. Association for Computational Linguistics.
  254. A cascade ranking model for efficient ranked retrieval. Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, 2011.
  255. Regularized latent semantic indexing. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’11, page 685–694, New York, NY, USA, 2011. Association for Computing Machinery.
  256. Cold: Towards the next generation of pre-ranking system, 2020.
  257. Managing word mismatch problems in information retrieval: A topic-based query expansion approach. Journal of Management Information Systems, 24(3):269–295, 2007.
  258. Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks. Database: The Journal of Biological Databases and Curation, 2016, 2016.
  259. Lda-based document models for ad-hoc retrieval. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’06, page 178–185, New York, NY, USA, 2006. Association for Computing Machinery.
  260. Zero-shot entity linking with dense entity retrieval. CoRR, abs/1911.03814, 2019.
  261. Probase: a probabilistic taxonomy for text understanding. Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, 2012.
  262. Esdrank: Connecting query and documents through external semi-structured data. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, CIKM ’15, page 951–960, New York, NY, USA, 2015. Association for Computing Machinery.
  263. Query expansion with freebase. In Proceedings of the 2015 International Conference on The Theory of Information Retrieval, ICTIR ’15, page 111–120, New York, NY, USA, 2015. Association for Computing Machinery.
  264. Bag-of-entities representation for ranking. In Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, ICTIR ’16, page 181–184, New York, NY, USA, 2016. Association for Computing Machinery.
  265. Word-entity duet representations for document ranking. SIGIR ’17, page 763–772, New York, NY, USA, 2017. Association for Computing Machinery.
  266. Approximate nearest neighbor negative contrastive learning for dense text retrieval. CoRR, abs/2007.00808, 2020.
  267. Query expansion using local and global document analysis. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’96, page 4–11, New York, NY, USA, 1996. Association for Computing Machinery.
  268. Relevance ranking using kernels. In Pu-Jen Cheng, Min-Yen Kan, Wai Lam, and Preslav Nakov, editors, Information Retrieval Technology - 6th Asia Information Retrieval Societies Conference, AIRS 2010, Taipei, Taiwan, December 1-3, 2010. Proceedings, volume 6458 of Lecture Notes in Computer Science, pages 1–12. Springer, 2010.
  269. Query dependent pseudo-relevance feedback based on wikipedia. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’09, page 59–66, New York, NY, USA, 2009. Association for Computing Machinery.
  270. A survey on recent advances in named entity recognition from deep learning models. CoRR, abs/1910.11470, 2019.
  271. A unified pretraining framework for passage ranking and expansion. In AAAI, 2021.
  272. Neural reranking for named entity recognition, 2017.
  273. Anserini: Enabling the use of lucene for information retrieval research. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017.
  274. Pretrained transformers for text ranking: Bert and beyond. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’21, page 2666–2668, New York, NY, USA, 2021. Association for Computing Machinery.
  275. A comparative study of utilizing topic models for information retrieval. In Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, ECIR ’09, page 29–41, Berlin, Heidelberg, 2009. Springer-Verlag.
  276. Few-shot conversational dense retrieval. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021.
  277. Wajdi Zaghouani. Renar: A rule-based arabic named entity recognition system. ACM Transactions on Asian Language Information Processing, 11(1), mar 2012.
  278. From neural re-ranking to neural ranking: Learning a sparse representation for inverted indexing. Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018.
  279. Bm25 pseudo relevance feedback using anserini at waseda university. In OSIRRC@SIGIR, 2019.
  280. K. Zeroual and P.-N. Robillard. Kbms: a knowledge-based system for modeling software system specifications. IEEE Transactions on Knowledge and Data Engineering, 4(3):238–252, 1992.
  281. Model-based feedback in the language modeling approach to information retrieval. In Proceedings of the Tenth International Conference on Information and Knowledge Management, CIKM ’01, page 403–410, New York, NY, USA, 2001. Association for Computing Machinery.
  282. Neural models for sequence chunking, 2017.
  283. Optimizing dense retrieval model training with hard negatives. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021.
  284. Learning discrete representations via constrained clustering for effective and efficient dense retrieval. CoRR, abs/2110.05789, 2021.
  285. Optimizing dense retrieval model training with hard negatives. CoRR, abs/2104.08051, 2021.
  286. Repbert: Contextualized text embeddings for first-stage retrieval. ArXiv, abs/2006.15498, 2020.
  287. Medical named entity recognition based on dilated convolutional neural network. Cognitive Robotics, 2:13–20, 2022.
  288. Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts. Journal of Biomedical Informatics, 46(6):1088–98, 2013.
  289. Neural information retrieval: A literature review. ArXiv, abs/1611.06792, 2016.
  290. Learning to rank in the age of muppets: Effectiveness–efficiency tradeoffs in multi-stage ranking. In Proceedings of the Second Workshop on Simple and Efficient Natural Language Processing, pages 64–73, Virtual, November 2021. Association for Computational Linguistics.
  291. Dc-bert: Decoupling question and document for efficient contextual encoding. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020.
  292. Bigcnn: Bidirectional gated convolutional neural network for chinese named entity recognition. In DASFAA (1), pages 502–518, 2020.
  293. Learning to reweight terms with distributed representations. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015.
  294. Joint extraction of entities and relations based on a novel tagging scheme, 2017.
  295. Named entity recognition using an hmm-based chunk tagger. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL ’02, page 473–480, USA, 2002. Association for Computational Linguistics.
  296. Joint extraction of multiple relations and entities by using a hybrid neural network. In CCL, 2017.
Authors (1)
  1. Dahlia Shehata (4 papers)