Lexical & Neural Relevance Signals in IR

Updated 25 October 2025
  • Lexical and neural relevance signals are complementary; lexical features provide explicit term matching while neural models capture deeper semantic relationships.
  • Hybrid models like CLEAR and REGENT integrate these signals to enhance retrieval performance by balancing precise term matches and contextual understanding.
  • Research shows that fusing these signals improves robustness, scalability, and interpretability in document ranking, while also addressing inherent biases.

Lexical and neural relevance signals are fundamental concepts in information retrieval (IR) and NLP. Lexical relevance signals refer to statistical or structural indicators of relevance derived from the surface forms of words (e.g., term overlap, frequency, and explicit query–document matches), while neural relevance signals denote more abstract, distributed representations and their learned interactions, as captured by neural network models. Recent research has shown that combining these signals, or designing models in which neural representations are guided by or fused with lexical indicators, yields superior performance for a range of tasks from document retrieval and query expansion to multimodal segmentation and semantic modeling.

1. Foundations and Definitions

Lexical relevance signals originate from direct textual features that indicate relevance. Classical IR systems, such as those based on BM25, assign scores using explicit token overlap and frequency-based models, with weights such as the Robertson–Sparck Jones (RSJ) weight quantifying term importance. Mathematically, lexical relevance for a query $q$ and document $d$ can be modeled as:

$$S_{\text{lex}}(q, d) = \sum_{t \in q \cap d} r_t \cdot \frac{tf_{t,d}}{tf_{t,d} + k_1\left[(1-b) + b \cdot (|d|/\ell)\right]}$$

where $r_t$ is the RSJ weight for term $t$, $tf_{t,d}$ is its frequency in $d$, $|d|$ is the document length, $\ell$ is the average document length in the collection, and $k_1$ and $b$ are the usual BM25 free parameters (Kong et al., 2023, Zamani et al., 2017). These signals are explicit, interpretable, and provide strong precision, especially in scenarios reliant on surface-form matches.
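
As a concrete illustration, here is a minimal Python sketch of this scoring scheme; the toy corpus and the use of an IDF estimate in place of the relevance-based RSJ weight $r_t$ are simplifying assumptions for illustration, not details from the cited papers:

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freqs, num_docs, avg_len, k1=0.9, b=0.4):
    """Score one document against a query with the BM25 form given above.

    doc_freqs: term -> number of documents containing the term.
    Without relevance judgments, r_t is approximated here by the usual
    IDF term (a standard simplification, assumed for this sketch).
    """
    tf = Counter(doc_terms)
    score = 0.0
    for t in set(query_terms) & set(tf):
        r_t = math.log((num_docs - doc_freqs[t] + 0.5) / (doc_freqs[t] + 0.5) + 1)
        norm = tf[t] + k1 * ((1 - b) + b * len(doc_terms) / avg_len)
        score += r_t * tf[t] / norm
    return score

# Tiny toy corpus, purely illustrative.
docs = [["neural", "retrieval", "models"], ["lexical", "matching", "with", "bm25"]]
df = Counter(t for d in docs for t in set(d))
avg = sum(map(len, docs)) / len(docs)
print(bm25_score(["bm25", "retrieval"], docs[1], df, len(docs), avg))
```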

Neural relevance signals arise from learned, distributed representations (e.g., embeddings, neural similarity metrics, attention mechanisms) parameterized through deep learning. Models such as BERT-based Siamese encoders, residual embedding learners, and attention-modulated transformers represent documents and queries in high-dimensional vector spaces and estimate relevance as a function (often a similarity or alignment) of these representations:

$$S_{\text{neural}}(q, d) = \text{sim}(f_{\theta}(q), f_{\theta}(d))$$

where $f_{\theta}$ is a neural encoder and $\text{sim}$ is often a dot product or cosine similarity (Gao et al., 2020, Kong et al., 2023).
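
A minimal sketch of this dense scoring pattern follows; the toy encoder (a random projection over hashed tokens) is purely an illustrative stand-in for a trained model such as a Siamese BERT:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, VOCAB = 64, 10_000
projection = rng.normal(size=(VOCAB, DIM))  # stand-in for learned parameters theta

def encode(text: str) -> np.ndarray:
    """Toy f_theta: average random projections of hashed tokens.
    A real system would use a trained neural encoder here."""
    ids = [hash(tok) % VOCAB for tok in text.lower().split()]
    vec = projection[ids].mean(axis=0)
    return vec / np.linalg.norm(vec)

def s_neural(query: str, doc: str) -> float:
    # Cosine similarity of unit vectors reduces to a dot product.
    return float(encode(query) @ encode(doc))

print(s_neural("neural ranking models", "deep models for ranking documents"))
```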

Hybrid models, such as CLEAR and REGENT, are built on the premise that these signals are complementary: lexical overlap captures precise term correspondence, while neural models infer deeper semantic relationships, synonymy, and contextually modulated relevance (Gao et al., 2020, Chatterjee, 13 Oct 2025).

2. Model Architectures and Training Objectives

Research divides models by how and where lexical and neural signals interact:

  • Lexical-focused models: Classic approaches, e.g., BM25 and query likelihood, only use surface-form features for ranking.
  • Neural-only models: These learn embeddings via proximity objectives (e.g., word2vec) or semantic similarity objectives (e.g., SBM, BERT); they excel at capturing latent associations but can lose sensitivity to explicit term matches, especially in out-of-domain scenarios (Formal et al., 2021).
  • Hybrid and relevance-guided models:
    • CLEAR (Gao et al., 2020) employs a residual learning framework in which a neural embedding model is explicitly trained to encode the residual: the semantic relevance not captured by the lexical baseline. The loss is conditioned on lexical performance so that the neural model intervenes only where the lexical ranker falls short.
    • REGENT (Chatterjee, 13 Oct 2025) integrates BM25-based token-level scores directly into the self-attention mechanism of a transformer, while augmenting the architecture with a parallel semantic entity pathway. An adaptive fusion then combines these pathways for final scoring; a rough sketch of this style of lexical attention bias follows this list.
    • LexBoost (Kulkarni et al., 25 Aug 2024) constructs a corpus graph offline using dense embeddings to identify nearest neighbors. At retrieval, each document’s score is adjusted by aggregating the scores of its semantic neighbors, thereby infusing neural proximity into a lexical ranking.
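
To make the REGENT-style injection concrete, the sketch below treats token-level BM25 scores as an additive bias on the attention logits; this additive form, and every name in it, is an assumption about the general shape of such a mechanism, not the paper's exact formulation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lexically_biased_attention(Q, K, V, lexical_scores, alpha=1.0):
    """Self-attention whose logits are shifted by token-level lexical scores.

    lexical_scores: (seq_len,) BM25-style weight per key token, broadcast
    across query positions so lexically salient tokens receive extra
    attention mass. The additive bias is an illustrative assumption.
    """
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)             # standard scaled dot product
    logits = logits + alpha * lexical_scores  # broadcast bias over query rows
    return softmax(logits, axis=-1) @ V

seq, d = 5, 16
rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(seq, d)) for _ in range(3))
lex = np.array([0.0, 2.1, 0.0, 0.7, 0.0])    # e.g., BM25 weights of matched tokens
print(lexically_biased_attention(Q, K, V, lex).shape)  # (5, 16)
```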

Several works formulate the fusion of signals as convex combinations, e.g.,

$$\text{LexBoost}(q, d) = \lambda \cdot \text{score}(q, d) + \frac{1-\lambda}{n} \sum_{\text{neighbors } d'} \text{score}(q, d')$$

(Kulkarni et al., 25 Aug 2024).
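
A minimal sketch of this convex combination, assuming the nearest-neighbor graph has already been built offline from dense embeddings (graph construction is elided):

```python
def lexboost_score(q_scores: dict, doc_id: str, neighbors: dict, lam: float = 0.8) -> float:
    """Mix a document's lexical score with its dense neighbors' scores.

    q_scores:  doc_id -> lexical (e.g., BM25) score for the current query
    neighbors: doc_id -> its n nearest neighbors in embedding space,
               computed offline (assumed given here)
    """
    own = q_scores.get(doc_id, 0.0)
    nbrs = neighbors.get(doc_id, [])
    if not nbrs:
        return own
    neighbor_avg = sum(q_scores.get(n, 0.0) for n in nbrs) / len(nbrs)
    return lam * own + (1 - lam) * neighbor_avg

scores = {"d1": 12.3, "d2": 9.8, "d3": 0.0}
graph = {"d3": ["d1", "d2"]}
print(lexboost_score(scores, "d3", graph))  # boosted by semantically similar d1, d2
```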

3. Relevance Signal Integration in Applications

The integration of lexical and neural signals has been systematically explored across diverse tasks:

  • Information Retrieval and Ranking: CLEAR and similar fusions increase recall and precision in document retrieval, with notable improvements in mean reciprocal rank (MRR), normalized discounted cumulative gain (NDCG), and pipeline efficiency (Gao et al., 2020, Cheng et al., 2022).
  • Query Expansion and Classification: Relevance-based embeddings (RLM/RPE) directly model word probabilities conditioned on query-relevance distributions, outperforming proximity-driven embeddings for expansion and classification on TREC and KDD datasets (Zamani et al., 2017); a rough sketch of such relevance-conditioned distributions follows this list.
  • Code and Multimodal Retrieval: CSRS fuses n-gram-level lexical interactions (via CNNs) with semantic co-attention mechanisms for code search, while multimodal speaker diarization systems combine lexical cues (word embeddings) with acoustic cues (MFCCs) in sequence-to-sequence neural architectures (Cheng et al., 2022, Park et al., 2018).
  • Personalization: Neural contextual ranking frameworks integrate both BM25 (lexical) and SentenceBERT (semantic) signals, plus explicit user context attributes, within a deep cross network for personalized search (Kong et al., 2023).
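
As a rough illustration of the relevance-conditioned word distributions referenced in the query-expansion item above, the sketch below uses the classic RM1 pseudo-relevance-feedback estimate as a stand-in; the actual training objective in (Zamani et al., 2017) differs:

```python
from collections import Counter

def relevance_model(top_docs, query_likelihoods):
    """Estimate p(w | R) from pseudo-relevant documents (RM1-style).

    top_docs:          list of token lists retrieved for the query
    query_likelihoods: p(q | d) per document, e.g., from a lexical ranker
    """
    p_w_R = Counter()
    z = sum(query_likelihoods)
    for doc, p_q_d in zip(top_docs, query_likelihoods):
        tf = Counter(doc)
        for w, c in tf.items():
            # Accumulate p(w|d) * p(d|q) over the feedback documents.
            p_w_R[w] += (c / len(doc)) * (p_q_d / z)
    return p_w_R

docs = [["neural", "retrieval", "neural"], ["lexical", "retrieval"]]
print(relevance_model(docs, [0.7, 0.3]).most_common(3))
```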

Ablation studies consistently confirm that removing either the lexical or neural component leads to marked drops in effectiveness, validating the complementary nature of these signals (Cheng et al., 2022, Kulkarni et al., 25 Aug 2024).
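
Such ablations are easy to reproduce with a generic interpolation baseline: min-max-normalize each signal, mix with a weight $\lambda$, and set $\lambda$ to 0 or 1 to ablate one side (a schematic sketch, not any specific paper's method):

```python
import numpy as np

def hybrid_rank(lexical, neural, lam=0.5):
    """Rank documents by a convex mix of normalized lexical and neural scores.
    lam=1.0 ablates the neural signal; lam=0.0 ablates the lexical one."""
    def norm(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)
    fused = lam * norm(lexical) + (1 - lam) * norm(neural)
    return np.argsort(-fused)  # best documents first

lex = [12.1, 3.4, 9.9]
neu = [0.62, 0.71, 0.35]
print(hybrid_rank(lex, neu, lam=0.5))  # hybrid ordering
print(hybrid_rank(lex, neu, lam=1.0))  # lexical-only ablation
```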

4. Interpretability and Diagnostic Frameworks

Recent work investigates how well neural and hybrid models retain lexical matching, especially for rare or out-of-training terms:

  • Lexical Matching Discrepancy: Using RSJ weights to compare user-ideal term importance against the term weights a model induces, dense neural models (e.g., TAS-B, bi-encoders) have been shown to underestimate the importance of crucial terms, especially out of domain, whereas lexical models (BM25) and interaction-based models (ColBERT, SPLADE) are more robust (Formal et al., 2021); a helper for computing the RSJ weight appears after this list.
  • Global Explanations: Construction of a “relevance thesaurus” allows for global, query-independent explanations of neural re-rankers. This thesaurus pairs query and document terms with their neural-relevance scores (from models such as PaRM), which can then be used to augment lexical retrieval (BM25T). This provides both interpretability and performance gains and reveals systematic neural model biases (e.g., over-emphasis on particular brand names) (Kim et al., 4 Oct 2024).
  • Interpretability in Speech: Layerwise Relevance Propagation (LRP) is used to expose which spectro-temporal features inform CNN stress classification, revealing a primary focus on the stressed syllable’s first and second formants, as well as the distributed nature of stress cues across the word (Allouche et al., 10 Aug 2025).
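
For reference, the RSJ weight underlying the discrepancy diagnostic in the first item above can be computed from relevance judgments with the standard smoothed contingency formula (a textbook definition, written here as a small helper):

```python
import math

def rsj_weight(n_rel_with_term, n_rel, n_with_term, n_docs):
    """Smoothed Robertson-Sparck Jones weight for one term.

    n_rel_with_term: relevant documents containing the term (r)
    n_rel:           relevant documents total (R)
    n_with_term:     documents containing the term (n)
    n_docs:          documents total (N)
    """
    r, R, n, N = n_rel_with_term, n_rel, n_with_term, n_docs
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# A term present in most relevant documents but few others gets a high weight.
print(rsj_weight(8, 10, 20, 1000))
```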

5. Extensions: Multimodal and Cognitive Signals

Lexical and neural relevance signals extend to multimodal and cognitive domains:

  • Speech and Diarization: Sequence-to-sequence models jointly process word embeddings (the lexical signal) and MFCC acoustic features (the input to the neural pathway) to segment and diarize speakers, with custom loss functions handling speaker-label permutation (Park et al., 2018).
  • EEG-Based Feedback: Real-time brain signals (from EEG) provide neural relevance scores that, fused with lexical and click-based signals, substantially improve iterative and retrospective document re-ranking, particularly in “non-click” or ambiguous user feedback scenarios (Ye et al., 2023).
  • Cognitive Modeling: Dynamic neural field frameworks represent lexical meaning as metastable neural states, with lexical signals (stable long-term coupling patterns) and neural relevance signals (contextual, real-time neural activations) jointly shaping polysemous word interpretation and capturing individual differences in context sensitivity (Stern et al., 19 Jul 2024).

6. Implications, Challenges, and Future Directions

Empirical evidence demonstrates that incorporating both lexical and neural signals delivers superior IR and NLP performance:

  • Robustness: Hybrid systems mitigate the weaknesses of each component (e.g., lexical matching’s inability to handle synonymy; neural models’ struggles with term rarity and precision) and exhibit robustness to domain shift (Formal et al., 2021, Gao et al., 2020, Kulkarni et al., 25 Aug 2024).
  • Bias and Fairness: Global analyses (e.g., via the relevance thesaurus) expose biases in neural ranking (e.g., over-weighting certain brands or temporal periods), providing a foundation for diagnostic and mitigation strategies (Kim et al., 4 Oct 2024).
  • Scalability: Methods such as LexBoost demonstrate that offline computation of neural proximity can be leveraged for efficient online lexical re-ranking, making hybrid relevance feasible at web scale (Kulkarni et al., 25 Aug 2024).
  • Interpretability: Attention-guided mechanisms and interpretable hybrid models (e.g., REGENT, BM25T) offer greater transparency while preserving or improving retrieval accuracy (Chatterjee, 13 Oct 2025, Kim et al., 4 Oct 2024).
  • Cognitive Validity: The congruence between information-restricted neural models’ predictions and brain imaging data highlights the cognitive plausibility of certain architectures for capturing both lexical and semantic processing in the brain (Pasquiou et al., 2023).

Promising research directions include unified metrics for balancing signals, adaptive fusion strategies, further integration across modalities (e.g., acoustic, visual), exploitation of cognitive signals in interactive retrieval systems, and methods for bias identification and control.

7. Summary Table: Representative Model Families

| Model/Framework | Lexical Signal Source | Neural Signal Source | Fusion/Integration Strategy | Key Application |
|---|---|---|---|---|
| BM25, QL, RSJ | Token overlap, statistics | None | None | Classical IR, baseline retrieval |
| CLEAR (Gao et al., 2020) | BM25 scoring | Siamese BERT; semantic embedding | Residual learning, weighted score fusion | First-stage retrieval, re-ranking |
| REGENT (Chatterjee, 13 Oct 2025) | BM25 token-level scores | Entity attention, contextual BERT | Dual-path attention, adaptive fusion | Re-ranking, entity-aware IR |
| LexBoost (Kulkarni et al., 25 Aug 2024) | BM25 | Neighbor graph from dense vectors | Offline neighbor graph, online score mix | Efficient high-recall retrieval |
| Relevance Thesaurus (Kim et al., 4 Oct 2024) | BM25 | BERT cross-encoder via PaRM | Thesaurus expansion (BM25T), global explanations | Model explanation, bias detection |
| RLM/RPE (Zamani et al., 2017) | Pseudo-relevance feedback | Dense embedding via softmax/sigmoid | Task-driven learning objective | Embeddings for expansion, classification |

These developments collectively support a paradigm shift in IR, where precision lexical signals and flexible neural representations are designed to work in concert.
