A Survey of Model Architectures in Information Retrieval (2502.14822v1)

Published 20 Feb 2025 in cs.IR

Abstract: This survey examines the evolution of model architectures in information retrieval (IR), focusing on two key aspects: backbone models for feature extraction and end-to-end system architectures for relevance estimation. The review intentionally separates architectural considerations from training methodologies to provide a focused analysis of structural innovations in IR systems. We trace the development from traditional term-based methods to modern neural approaches, particularly highlighting the impact of transformer-based models and subsequent LLMs. We conclude by discussing emerging challenges and future directions, including architectural optimizations for performance and scalability, handling of multimodal, multilingual data, and adaptation to novel application domains beyond traditional search paradigms.

Summary

  • The paper surveys information retrieval model architectures, tracing their evolution from traditional methods to modern neural approaches, focusing on architectural design rather than training methodologies.
  • It details various model types, including Boolean, vector space, probabilistic, and statistical language models, LTR, neural ranking models (DSSM, DRMM, MatchPyramid), transformer-based models such as BERT and ColBERT, and applications of LLMs.
  • The survey identifies future directions and challenges in IR model architectures, including performance optimization, scalability, handling multimodal and multilingual data, and adaptation for autonomous search agents.

This paper presents a survey of model architectures in Information Retrieval (IR), emphasizing backbone models for feature extraction and end-to-end system architectures for relevance estimation. The survey focuses on architectural considerations, intentionally separating them from training methodologies.

The paper traces the evolution of IR systems from traditional term-based methods, such as Boolean and vector space models, to neural approaches, focusing on transformer-based models and LLMs. It concludes with a discussion of challenges and future directions, including architectural optimizations for performance and scalability, handling of multimodal and multilingual data, and adaptation to new application domains.

The paper begins by defining the ad hoc retrieval task: given a query $\mathcal{Q}$, the objective is to find a ranked list of $k$ documents, denoted $\{\mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_k\}$, that exhibit the highest relevance to $\mathcal{Q}$. Performance is measured using standard IR metrics such as Mean Reciprocal Rank (MRR), Recall, and normalized Discounted Cumulative Gain (nDCG).
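
To make the evaluation setting concrete, below is a minimal sketch of two of these metrics in Python (binary labels assumed for MRR, graded labels for nDCG; the function names are illustrative):

```python
import math

def mean_reciprocal_rank(ranked_labels_per_query):
    """MRR: average over queries of 1/rank of the first relevant document.
    Input: list of per-query binary relevance lists, ordered by rank."""
    total = 0.0
    for labels in ranked_labels_per_query:
        for rank, rel in enumerate(labels, start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(ranked_labels_per_query)

def ndcg_at_k(graded_labels, k=10):
    """nDCG@k for one query: DCG of the system's ranking divided by the
    DCG of the ideal (label-sorted) ranking."""
    def dcg(labels):
        return sum((2 ** rel - 1) / math.log2(rank + 1)
                   for rank, rel in enumerate(labels, start=1))
    ideal = dcg(sorted(graded_labels, reverse=True)[:k])
    return dcg(graded_labels[:k]) / ideal if ideal > 0 else 0.0
```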

Traditional IR models are examined, including:

  • Boolean Model: Documents $\mathcal{D}$ are represented as a set of terms $\{t_1, t_2, \dots, t_n\}$, and relevance is determined by a logical implication $\mathcal{D} \rightarrow \mathcal{Q}$.
  • Vector Space Model: Queries and documents are represented as vectors, e.g., $\mathcal{Q} = \langle q_1, q_2, \dots, q_n \rangle$ and $\mathcal{D} = \langle d_1, d_2, \dots, d_n \rangle$. The relevance score is estimated by a similarity function between the query $\mathcal{Q}$ and the document $\mathcal{D}$.
  • Probabilistic Model: The relevance score depends on a set of events $\{x_i\}_{1}^{n}$ representing the occurrence of term $t_i$ in the document. The simplest instance is the binary independence retrieval model, where $$\text{Score}(\mathcal{Q},\mathcal{D}) \propto \sum_{(x_i=1)\in\mathcal{D}} \log \frac{r_i(T-n_i-R+r_i)}{(R-r_i)(n_i-r_i)},$$ where $T$ is the total number of sampled judged documents, $R$ the number of relevant samples, $n_i$ the number of samples containing $t_i$, and $r_i$ the number of relevant samples containing $t_i$.
  • Statistical Language Model: The relevance score is estimated via $\mathcal{P}(\mathcal{D}|\mathcal{Q})$, which by Bayes' rule is directly proportional to $\mathcal{P}(\mathcal{Q}|\mathcal{D})\mathcal{P}(\mathcal{D})$. The main focus is on modeling $\mathcal{P}(\mathcal{Q}|\mathcal{D})$ as a ranking function by treating the query as a set of independent terms, $\mathcal{Q}=\{t_i\}_{i=1}^n$, so that $\mathcal{P}(\mathcal{Q}|\mathcal{D})=\prod_{t_i \in \mathcal{Q}}\mathcal{P}(t_i|\mathcal{D})$. The probability $\mathcal{P}(t_i|\mathcal{D})$ is determined using a statistical language model $\theta_{D}$ representing the document, and relevance is estimated by the log-likelihood $$\text{Score}(\mathcal{Q},\mathcal{D}) = \log\mathcal{P}(\mathcal{Q}|\theta_{D}) = \sum_{t_i \in \mathcal{Q}}\log\mathcal{P}(t_i|\theta_{D}).$$ A minimal sketch of this query-likelihood scoring appears after this list.
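
As a concrete instance of the statistical language modeling approach, here is a minimal sketch of query-likelihood scoring; the Jelinek-Mercer smoothing and its parameter value are illustrative assumptions, not choices prescribed by the survey:

```python
import math
from collections import Counter

def query_likelihood_score(query_terms, doc_terms, collection_counts,
                           collection_len, lam=0.5):
    """Score a document by log P(Q | theta_D) under a unigram model theta_D.
    Jelinek-Mercer smoothing mixes the document distribution with the
    collection distribution so unseen query terms do not zero out the score."""
    doc_counts = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for t in query_terms:
        p_doc = doc_counts[t] / doc_len if doc_len else 0.0
        p_coll = collection_counts.get(t, 0) / collection_len
        p = lam * p_doc + (1 - lam) * p_coll
        score += math.log(p) if p > 0 else math.log(1e-12)  # guard for OOV terms
    return score
```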

Learning-to-Rank (LTR) models apply supervised machine learning to numerical features. For each $(\mathcal{Q}_i, \mathcal{D}_i)$ pair, a $k$-dimensional feature vector $\mathbf{x}_i \in \mathbb{R}^{k}$ and a relevance label $\mathbf{y}_i$ are provided to the ranking model $f$. The model is trained to minimize the empirical loss on a labeled training set $\Psi$: $$\mathcal{L} = \frac{1}{|\Psi|} \sum_{(\mathbf{x}_i, \mathbf{y}_i) \in \Psi} l(f_{\theta}(\mathbf{x}_i), \mathbf{y}_i).$$ LTR models include ML-based models such as RankSVM and LambdaMART (based on Gradient Boosted Decision Trees (GBDT)), as well as neural LTR models like RankNet and LambdaRank.
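
To illustrate the pairwise idea behind RankNet, a minimal PyTorch sketch (the network shape and feature dimensionality are illustrative assumptions):

```python
import torch
import torch.nn as nn

class PairwiseRanker(nn.Module):
    """Tiny RankNet-style scorer: maps a feature vector x to a scalar score f(x)."""
    def __init__(self, num_features=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def ranknet_loss(score_pos, score_neg):
    """Pairwise logistic loss log(1 + exp(s_neg - s_pos)): pushes the model
    to score the more-relevant document of each pair higher."""
    return nn.functional.softplus(score_neg - score_pos).mean()

# Usage with random stand-ins for LTR feature vectors:
model = PairwiseRanker()
x_pos, x_neg = torch.randn(16, 10), torch.randn(16, 10)
loss = ranknet_loss(model(x_pos), model(x_neg))
loss.backward()
```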

Neural ranking models use deep neural networks to learn feature representations directly from raw text. Depending on how queries interact with documents, these models are divided into representation-based models and interaction-based models. Representation-based models, such as the Deep Structured Semantic Model (DSSM), independently encode queries and documents into a latent vector space. Interaction-based models process queries and documents jointly through neural networks. MatchPyramid employs CNNs over the interaction matrix between query and document terms, while the Deep Relevance Matching Model (DRMM) constructs matching histograms for each query term.
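
A minimal sketch of the two interaction patterns, assuming some encoder has already produced the vectors (single vectors for the representation-based case, per-term vectors for the interaction-based case):

```python
import numpy as np

def representation_score(q_vec, d_vec):
    """Representation-based (e.g., DSSM): query and document are encoded
    independently; relevance is a similarity such as cosine."""
    return float(q_vec @ d_vec /
                 (np.linalg.norm(q_vec) * np.linalg.norm(d_vec)))

def interaction_matrix(q_term_vecs, d_term_vecs):
    """Interaction-based (e.g., MatchPyramid): build a term-by-term similarity
    matrix first; a downstream network (a CNN in MatchPyramid) consumes it."""
    return q_term_vecs @ d_term_vecs.T  # shape (|Q|, |D|)
```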

The paper discusses IR architectures based on pre-trained transformers, with a focus on BERT-type encoder models. BERT's success is attributed to multi-head attention and large-scale pre-training. The paper covers text reranking, learned dense retrieval, learned sparse retrieval (LSR), and multi-vector representations. For text reranking, models like monoBERT concatenate $(\mathcal{Q}, \mathcal{D})$ as input and output a relevance score. Learned dense retrieval uses bi-encoders to encode queries and documents separately, computing relevance with similarity functions. Learned sparse retrieval also uses a bi-encoder architecture but transforms documents into sparse vectors for faster retrieval. Multi-vector representations, exemplified by ColBERT, represent each token in the query and document as a contextualized vector, enabling finer-grained interaction.
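
A minimal sketch of ColBERT-style late interaction (MaxSim) over token vectors; random arrays stand in for the contextualized embeddings an encoder would produce:

```python
import numpy as np

def maxsim_score(q_tok_vecs, d_tok_vecs):
    """For each query token, take its maximum similarity over all document
    tokens, then sum across query tokens. ColBERT L2-normalizes token vectors
    so the dot product below acts as cosine similarity."""
    sims = q_tok_vecs @ d_tok_vecs.T        # (|Q|, |D|) token-level similarities
    return float(sims.max(axis=1).sum())    # MaxSim per query token, then sum

# Usage with random stand-ins for contextualized token embeddings:
q = np.random.randn(8, 128)    # 8 query tokens, 128-dim vectors
d = np.random.randn(120, 128)  # 120 document tokens
print(maxsim_score(q, d))
```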

The paper also discusses the use of LLMs for IR tasks. LLMs have exhibited proficiency in language understanding and generation and can be used for feature extraction and relevance estimation. Adopting an LLM as the backbone for a bi-encoder retrieval model has improved performance compared to smaller models like BERT. LLMs can also be fine-tuned as cross-encoder rerankers or used as unsupervised rerankers through prompting techniques. Generative retrieval, which bypasses the indexing step by using autoregressive LLMs to directly generate document identifiers (DocIDs), is also discussed.
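
As an illustration of prompting-based unsupervised reranking, a minimal pointwise sketch; `call_llm` is a hypothetical stand-in for whatever LLM client is available, and the prompt wording is only illustrative:

```python
def call_llm(prompt: str) -> str:
    """Hypothetical LLM client; replace with a real API or local model call."""
    raise NotImplementedError

def pointwise_rerank(query, documents):
    """Ask the LLM for a graded relevance judgment per document, then sort.
    Listwise variants instead pass several candidates in one prompt and ask
    the model to output an ordering."""
    scored = []
    for doc in documents:
        prompt = (f"Query: {query}\nDocument: {doc}\n"
                  "On a scale of 0 to 3, how relevant is the document "
                  "to the query? Answer with a single number.")
        reply = call_llm(prompt).strip()
        score = int(reply[0]) if reply[:1].isdigit() else 0
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```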

Finally, the survey identifies emerging directions and challenges in IR, including the need for better feature-extraction models and more flexible relevance estimators, along with open questions about the end "user" of retrieval and autonomous search agents. Key areas for model improvement include parallelizable and low-precision training, inference optimization, data efficiency, multimodality and multilinguality, and transformer alternatives such as linear RNNs and state space models.
