
Transformer Based Language Models for Similar Text Retrieval and Ranking (2005.04588v2)

Published 10 May 2020 in cs.IR and cs.CL

Abstract: Most approaches for similar text retrieval and ranking with long natural language queries rely at some level on queries and responses having words in common with each other. Recent applications of transformer-based neural language models to text retrieval and ranking problems have been very promising, but still involve a two-step process in which result candidates are first obtained through bag-of-words-based approaches, and then reranked by a neural transformer. In this paper, we introduce novel approaches for effectively applying neural transformer models to similar text retrieval and ranking without an initial bag-of-words-based step. By eliminating the bag-of-words-based step, our approach is able to accurately retrieve and rank results even when they have no non-stopwords in common with the query. We accomplish this by using bidirectional encoder representations from transformers (BERT) to create vectorized representations of sentence-length texts, along with a vector nearest neighbor search index. We demonstrate both supervised and unsupervised means of using BERT to accomplish this task.
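The abstract describes encoding sentence-length texts into vectors with BERT and retrieving similar texts through a vector nearest neighbor index, with no bag-of-words candidate step. The sketch below illustrates that flow in its unsupervised form only; it assumes mean-pooled token embeddings from the Hugging Face `bert-base-uncased` checkpoint and uses scikit-learn's exact `NearestNeighbors` as a stand-in for the paper's search index. The corpus and query strings are invented for illustration, and none of these choices should be read as the authors' exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.neighbors import NearestNeighbors

# Assumed checkpoint; the paper does not prescribe a specific BERT variant here.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(texts):
    """Mean-pool BERT token embeddings into one fixed-size vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state            # (batch, tokens, 768)
    mask = batch["attention_mask"].unsqueeze(-1).float()     # ignore padding tokens
    summed = (hidden * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return (summed / counts).numpy()

# Hypothetical corpus: note the query below shares no non-stopwords with either text.
corpus = [
    "The contract may be terminated with thirty days notice.",
    "Payment is due within fifteen days of the invoice date.",
]
corpus_vecs = embed(corpus)

# Exact nearest-neighbor index over the corpus vectors (cosine distance),
# standing in for the vector search index the abstract refers to.
index = NearestNeighbors(n_neighbors=1, metric="cosine").fit(corpus_vecs)

distances, indices = index.kneighbors(embed(["How do I cancel this agreement early?"]))
print(corpus[indices[0][0]], distances[0][0])
```

In practice an approximate nearest neighbor library would replace the exact index for large corpora, and the paper's supervised variant would fine-tune the encoder before indexing; this sketch omits both.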

Authors (7)
  1. Javed Qadrud-Din (2 papers)
  2. Ashraf Bah Rabiou (1 paper)
  3. Ryan Walker (3 papers)
  4. Ravi Soni (7 papers)
  5. Martin Gajek (5 papers)
  6. Gabriel Pack (1 paper)
  7. Akhil Rangaraj (1 paper)
Citations (4)
