Contextual Relevance & Adaptive Algorithms
- Contextual relevance is defined as a dynamic, context-dependent measure that evaluates how well a document or fact meets specific needs based on surrounding factors.
- Adaptive algorithms like TS-SetRank balance exploration and exploitation to efficiently estimate target relevance under limited inference resources.
- Empirical results highlight that modeling candidate composition and order significantly improves ranking accuracy and retrieval performance.
Contextual relevance refers to a dynamic, context-dependent measure of how well an information item—such as a document, fact, or candidate entity—addresses a specific information need or task given the configuration of surrounding items, user background, or environmental variables. This concept subsumes and extends traditional, context-free notions of relevance by recognizing that observed relevance judgments are often stochastic functions of the broader context: i.e., the batch, set, or order in which candidates are presented, the composition of accompanying distractors, and other extrinsic factors such as user profile, system/device, spatiotemporal state, or pragmatic intent. Rigorous modeling of contextual relevance is increasingly crucial across domains such as document retrieval, recommendation, knowledge graph reasoning, human-computer interaction, and cognitive modeling.
1. Probabilistic Formalization and Inference
Contextual relevance in reranking systems is formalized as the marginal probability that a candidate is judged relevant, averaged over the ensemble of possible presentation contexts. For a query $q$ and candidate $d$ drawn from a candidate pool $\mathcal{D}$, contextual relevance is defined as

$$\rho(d) = \mathbb{E}_{B \sim \mathcal{B}_k(d)}\big[\Pr(\mathrm{rel}(d) = 1 \mid q, B)\big],$$

where $B$ is a batch (ordered subset) of size $k$, sampled from the set $\mathcal{B}_k(d)$ of all possible batches in which $d$ could appear. This expectation can be further generalized as

$$\rho(d) = \mathbb{E}_{c \sim \mathcal{C}}\big[\Pr(\mathrm{rel}(d) = 1 \mid q, c)\big],$$

with the context $c$ ranging over both co-occurring candidate subsets and their orderings. This framework contrasts with earlier IR approaches, which treat document relevance as a fixed attribute of the query-document pair, and reflects the observation that relevance labels from LLM-based or human annotators are not stable with respect to context composition and order.
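The expectation above can be approximated by Monte Carlo: sample random batches containing the candidate, query the annotator, and average the binary labels. A minimal sketch, where `judge(batch, candidate)` is a stand-in for the LLM or human annotator (the function names and defaults here are illustrative assumptions, not the paper's implementation):

```python
import random

def contextual_relevance(candidate, pool, judge, k=4, n_samples=200, rng=None):
    """Monte Carlo estimate of rho(d): average judged relevance of
    `candidate` over randomly sampled, randomly ordered batches of size k.
    `judge(batch, candidate)` returns a 0/1 label and may depend on the
    batch's composition and order."""
    rng = rng or random.Random(0)
    others = [d for d in pool if d != candidate]
    total = 0
    for _ in range(n_samples):
        batch = rng.sample(others, k - 1) + [candidate]
        rng.shuffle(batch)  # order is part of the context
        total += judge(batch, candidate)
    return total / n_samples
```

With a context-insensitive judge the estimate reduces to the fixed relevance label; the interesting cases are judges whose output shifts with distractors and position, as quantified in the next section.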
To efficiently approximate contextual relevance under resource constraints, the TS-SetRank algorithm models reranking as a combinatorial Bayesian semi-bandit problem. Each candidate $d_i$ is assigned a Beta prior $\mathrm{Beta}(\alpha_i, \beta_i)$, and an adaptive sampling schedule alternates between uniform exploration and Thompson sampling–driven exploitation to focus queries where uncertainty about $\rho(d_i)$ remains high.
2. Contextual Factors: Composition, Order, and Higher-Order Interactions
Empirical analyses reveal that both the composition (set membership) and order in which candidates are presented have significant effects on LLM-derived relevance judgments. Variance decompositions in setwise reranking show that:
- Positional effects (order within a batch) account for 16–36% of between-batch variance in judged relevance across the batch sizes tested.
- Compositional effects (which distractors accompany a candidate) explain an additional ~9% of relevance variance.
Thus, accurate contextual relevance estimation requires marginalization over these sources of uncertainty; ignoring either can yield unstable or biased rankings. This is particularly acute in reasoning-intensive retrieval benchmarks, where subtle distractor choices may either highlight or obfuscate the target’s relevance signal.
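The position/composition decomposition can be illustrated with a toy additive judgment model on a full factorial grid of contexts. The model and its coefficients below are invented for illustration and do not reproduce the paper's variance shares:

```python
from itertools import combinations
from statistics import mean, pvariance

def judged_prob(pos, distractors):
    # Toy model (assumption): the probability of a "relevant" label falls
    # with later batch position and with each hard distractor present.
    return 0.8 - 0.06 * pos - 0.08 * distractors.count("hard")

positions = range(4)
pools = list(combinations(["easy", "easy", "hard", "hard"], 3))  # distractor sets
probs = {(p, ds): judged_prob(p, ds) for p in positions for ds in pools}

total = pvariance(list(probs.values()))
# ANOVA-style decomposition: variance of per-factor means.
between_pos = pvariance([mean(probs[(p, ds)] for ds in pools) for p in positions])
between_comp = pvariance([mean(probs[(p, ds)] for p in positions) for ds in pools])
print(f"position share: {between_pos / total:.0%}, "
      f"composition share: {between_comp / total:.0%}")
```

Because the toy model is additive over a full grid, the two shares sum to the total variance exactly; real LLM judgments also exhibit interaction terms, which is why marginalization over sampled contexts, rather than a closed-form decomposition, is used in practice.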
This insight complements findings from contextual bandit literature, where relevance may be a function of a low-dimensional (but unknown) subset of context features (Tekin et al., 2015), and expands upon context-aware object detection models that explicitly marginalize over a detection's "few relevant neighbors" to produce context-calibrated probabilities (Barnea et al., 2017).
3. Adaptive Algorithms: TS-SetRank and Bandit Framing
TS-SetRank provides an efficient sampling-based framework for estimating contextual relevance under fixed LLM-inference budgets. Briefly:
- Initialization: Each candidate $d_i$ is associated with a prior $\mathrm{Beta}(\alpha_i, \beta_i)$.
- Exploration Phase (early portion of the LLM-call budget): Uniformly sample batches $B_t$ from the candidate pool.
- Exploitation Phase (remainder of the budget): For each candidate, draw $\tilde{\theta}_i \sim \mathrm{Beta}(\alpha_i, \beta_i)$; select the top-$k$ candidates by $\tilde{\theta}_i$ for reranking.
- Feedback and Posterior Update: For batch $B_t$, obtain LLM judgments and update the Beta parameters according to the observed relevance labels (increment $\alpha_i$ on a relevant label, $\beta_i$ otherwise).
- Output: Rank candidates by posterior mean $\alpha_i / (\alpha_i + \beta_i)$.
The two-phase policy leverages Thompson sampling to balance exploration and exploitation, achieving surrogate regret that grows sublinearly in the number of LLM calls.
This ensures that, over time, the estimator's ranking converges to that minimizing expected information loss under the true contextual distribution. Variants with different exploration/exploitation splits (e.g., TS-25/75) can be tuned to inference budget constraints.
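The steps above can be sketched as a single loop. This is a simplified reading of TS-SetRank: the batch construction, budget split, and update rule below are assumptions for illustration, not the authors' exact implementation:

```python
import random

def ts_setrank(pool, judge, k=4, budget=200, explore_frac=0.25, seed=0):
    """Two-phase Beta-Bernoulli sketch of TS-SetRank.
    `judge(batch, candidate)` stands in for the LLM's 0/1 relevance call."""
    rng = random.Random(seed)
    alpha = {d: 1.0 for d in pool}  # Beta(1, 1) priors
    beta = {d: 1.0 for d in pool}
    n_explore = int(explore_frac * budget)
    for t in range(budget):
        if t < n_explore:  # exploration: uniform random batches
            batch = rng.sample(pool, k)
        else:              # exploitation: Thompson sampling
            theta = {d: rng.betavariate(alpha[d], beta[d]) for d in pool}
            batch = sorted(pool, key=theta.get, reverse=True)[:k]
        rng.shuffle(batch)  # order is part of the judged context
        for d in batch:     # conjugate posterior update from 0/1 labels
            if judge(batch, d):
                alpha[d] += 1
            else:
                beta[d] += 1
    post_mean = {d: alpha[d] / (alpha[d] + beta[d]) for d in pool}
    return sorted(pool, key=post_mean.get, reverse=True)
```

The `explore_frac` parameter corresponds to the exploration/exploitation splits mentioned above (e.g., 0.25 for a TS-25/75 variant) and can be tuned to the inference budget.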
4. Empirical Validation and Quantification of Context Effects
Experimental results on BRIGHT (reasoning-intensive) and BEIR (heterogeneous zero-shot) benchmarks using Qwen2.5-7B as the setwise reranker show:
- BRIGHT: TS-SetRank achieves nDCG@10 of 0.294 vs. 0.235 (BM25) and 0.256 (Heapify), a 15–25% relative improvement.
- BEIR: TS-SetRank attains 0.429 (nDCG@10) vs. 0.357 (BM25) and 0.408 (Heapify), a 6–21% gain.
- Budget sensitivity: Under halved LLM budgets, adaptive TS-SetRank variants outperform uniform reranking by up to 2.4 points in nDCG@10.
- Variance ablation: Positional context accounts for up to ∼36% of total judgment variance, composition for ∼9%.
These results underscore that setwise and adaptive modeling of context leads to both higher accuracy and greater stability, particularly where direct LLM evaluation is expensive and context effects are pronounced.
5. Theoretical Guarantees and Necessity of Context Marginalization
TS-SetRank's bandit-based design inherits theoretical properties from stochastic combinatorial semi-bandit literature:
- Surrogate regret (measured with respect to the number of relevant documents found) grows sublinearly in the number of LLM calls.
- The posterior mean asymptotically converges to the true contextual relevance by the Law of Large Numbers, given sufficient sampling.
- Uniform sampling alone is insufficient for closing the regret gap; adaptive exploitation based on accumulated feedback is required.
Notably, the necessity of modeling context as a stochastic latent variable (as opposed to a fixed covariate) is supported by both variance analyses and observed bias in static (context-free) reranking protocols.
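The Law-of-Large-Numbers convergence claim can be checked numerically with a stochastic toy judge; the true relevance value below is arbitrary, chosen only to make the simulation concrete:

```python
import random

rng = random.Random(1)
true_rho = 0.7          # assumed ground-truth contextual relevance
alpha, beta = 1.0, 1.0  # Beta(1, 1) prior
estimates = []
for n in range(1, 5001):
    label = 1 if rng.random() < true_rho else 0  # stochastic 0/1 judgment
    alpha += label
    beta += 1 - label
    estimates.append(alpha / (alpha + beta))     # posterior mean so far
print(f"posterior mean after 5000 judgments: {estimates[-1]:.3f}")
```

The posterior mean settles near the true value; what the simulation does not show, and what the regret analysis addresses, is how to allocate a *finite* judgment budget across many candidates at once.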
6. Cross-Domain Extensions and Comparative Frameworks
The general principle that relevance is context-dependent—whether in document reranking, recommendation, or entity retrieval—emerges in multiple domains:
- In contextual bandits, "relevance" is the dependence of an action's reward on low-dimensional context subspaces, and algorithms such as RELEAF explicitly learn which coordinates matter per action (Tekin et al., 2015).
- For visual perception, contextual semantic relevance metrics (integrating vision- and language-model features) most robustly predict human fixations when contextual relationships are explicitly modeled (Sun et al., 13 Oct 2024).
- In named-entity retrieval, user-driven term feedback is required to surface the contextual facets that define entity similarity (Sarwar et al., 2018).
- In e-commerce, k-order contextual relevance over bipartite graphs outperforms isolated pairwise modeling (Liu et al., 2022).
This suggests that context-marginalized relevance forms a unifying abstraction applicable in both supervised learning systems and bandit/reinforcement settings.
7. Open Questions and Future Directions
- Scalability: Efficiently sampling or approximating the context space is computationally challenging as context set/batch size increases. Approximate methods (e.g., importance weighting, context subsampling) and structure learning may be required for large-scale deployments.
- Context selection and design: Determining which dimensions or aspects of context most affect relevance remains an open research direction, especially under resource constraints or in partially observed environments.
- Multi-criteria extensions: Pure topical/contextual relevance may be insufficient for practical systems (e.g., RAG pipelines), where multiple quality dimensions (depth, diversity, authority) must also be optimized. Explicit multi-criteria reranking extends the contextual framework (LeVine et al., 14 Mar 2025).
- User adaptation and personalization: Incorporating user feedback and preference modeling can further adapt relevance estimation in context-rich settings.
- Theoretical characterization: The generalization of the sublinear regret framework to arbitrary context-dependent feedback and more expressive priors remains an active theoretical challenge.
Overall, contextual relevance reframes information retrieval, selection, and decision-making tasks as context-marginalized, adaptive estimation and optimization problems—a shift with broad algorithmic and empirical implications for LLM-backed systems and beyond.