Dynamic Positional Bias in Sequential Models
- Dynamic Positional Bias (DPB) is a context-dependent variation in how models utilize positional information in sequential tasks.
- It adjusts positional weights based on input content, query intent, and user state, enabling adaptive modulation of attention and ranking.
- Key strategies include learned position embedding modulation, doubly-robust estimators, and scaling of hidden states to improve prediction accuracy.
Dynamic Positional Bias (DPB) refers to the context-dependent, often task- or instance-specific variation in how models utilize positional information embedded in data sequences. Unlike static positional bias, where a model applies a fixed, often monotonic weighting to certain positions (such as a preference for early or late elements), DPB manifests when position-related effects vary with input content, query intent, user state, or other dynamic histories. DPB is critical in domains involving sequential data (search and retrieval, click modeling, ranking, large language models, vision transformers, and robotics) because it modulates model attention and prediction quality through positional cues that evolve with context, input distribution, or user intent.
1. Foundational Models of Positional Bias and Extension to DPB
Traditional models such as the Examination Hypothesis (EH) posit that the probability of a click factorizes into a fixed position bias and document relevance: P(click = 1 | d, k) = θ_k · r_d, where θ_k is the examination probability at position k and r_d the relevance of document d. These models capture a static notion of position bias, assuming uniform positional effects across queries.
The introduction of query-specific position bias, formalized by letting the bias term depend on the query, θ_k(q), constitutes a core operationalization of DPB (Gollapudi et al., 2010). Here, θ_k(q) varies with both query class and user intent, directly reflecting a dynamic, context-aware bias. This extension allows search systems to "unmix" relevance from positional effects and produces position-independent relevance scores. The estimation procedure linearizes the multiplicative model via a logarithmic transformation and solves it with matrix least squares, operating on a document–position bipartite graph per query and applying regularization to handle disconnected components.
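The log-linear "unmixing" step above can be sketched as follows. This is a minimal illustration of the general recipe (take logs of the multiplicative model, solve the resulting linear indicator system by least squares), not Gollapudi et al.'s exact procedure; all function and variable names are illustrative:

```python
import numpy as np

def unmix_position_bias(clicks):
    """Solve log c(d, k) = log theta_k + log r_d by least squares.

    clicks: dict mapping (doc, position) -> observed CTR (> 0).
    Returns (theta, r), with theta normalized so the first position
    has bias 1; the scale indeterminacy is folded into relevance.
    """
    obs = [(d, k, np.log(c)) for (d, k), c in clicks.items()]
    docs = sorted({d for d, _, _ in obs})
    poss = sorted({k for _, k, _ in obs})
    A = np.zeros((len(obs), len(poss) + len(docs)))
    b = np.zeros(len(obs))
    for i, (d, k, logc) in enumerate(obs):
        A[i, poss.index(k)] = 1.0              # indicator for position k
        A[i, len(poss) + docs.index(d)] = 1.0  # indicator for document d
        b[i] = logc
    # lstsq returns the minimum-norm solution, which resolves the rank
    # deficiency (a constant can shift between log theta and log r).
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    theta = np.exp(x[:len(poss)])
    r = np.exp(x[len(poss):]) * theta[0]
    theta = theta / theta[0]
    return dict(zip(poss, theta)), dict(zip(docs, r))
```

When the observed click rates follow the multiplicative model exactly, the true bias and relevance values are recovered up to the chosen normalization.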
The adoption of DPB in learning-to-rank further benefits from doubly-robust estimation frameworks, which combine direct regression with inverse-propensity scoring and adapt robustly to examination probabilities and trust effects that may themselves shift over time or context (Oosterhuis, 2022). By modeling the expected treatment per rank, such methods afford dynamic adaptation to evolving user attention and ranking presentation.
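The doubly-robust recipe can be illustrated in a stripped-down, bandit-style setting: a direct regression estimate plus a propensity-weighted residual correction. This is a sketch of the general pattern, not Oosterhuis's full rank-level estimator; the simulation setup (two documents, one slot, known logging propensity) is assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated logs: the logging policy shows document A or B at the top slot
# with equal probability; the target policy always shows A.
TRUE_CTR = {"A": 0.3, "B": 0.1}
LOG_PROB = 0.5                 # propensity of showing A under logging
n = 50_000
shown_A = rng.random(n) < LOG_PROB
ctr = np.where(shown_A, TRUE_CTR["A"], TRUE_CTR["B"])
clicks = (rng.random(n) < ctr).astype(float)

def ips_estimate(shown_A, clicks):
    # pure inverse-propensity scoring for the "always show A" policy
    return np.mean(shown_A * clicks / LOG_PROB)

def dr_estimate(shown_A, clicks, m_A):
    # direct regression m_A plus propensity-weighted residual correction;
    # unbiased given correct propensities, low-variance when m_A is close
    # to the true CTR of A
    return m_A + np.mean(shown_A * (clicks - m_A) / LOG_PROB)
```

With a decent regression estimate (here m_A = 0.28 against a true CTR of 0.3), the per-sample DR terms have visibly lower variance than the pure IPS terms while both estimators remain centered on the truth.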
2. DPB in Deep Architectures: Interaction, Attention, and Modulation
Deep models increasingly capture DPB not through explicit propensity parameters but via learned, non-linear position-aware interactions. The Deep Position-wise Interaction Network (DPIN) (Huang et al., 2021) exemplifies this treatment, modeling click-through rate (CTR) as a non-linear function of joint item, user, context, and position representations, schematically CTR(j, k) = f(v_j, u, c, p_k), where f is a learned deep interaction network evaluated for item j at position k.
Such architectures support rich, dynamic interplay among position-dependent and contextual factors, moving beyond simple multiplicative bias; they combine information at each candidate-position pair. Evaluation uses PAUC (Position-wise AUC), which assesses ranking fidelity per display position, isolating DPB in model predictions.
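The PAUC metric mentioned above can be computed by evaluating a pairwise AUC within each display position and averaging with impression weights. This follows one common convention; the exact weighting in the DPIN paper may differ:

```python
from collections import defaultdict

def pauc(records):
    """records: iterable of (position, score, clicked) tuples.
    Returns the impression-weighted average of per-position AUCs."""
    by_pos = defaultdict(list)
    for k, s, y in records:
        by_pos[k].append((s, y))
    total, weight = 0.0, 0
    for items in by_pos.values():
        pos = [s for s, y in items if y == 1]
        neg = [s for s, y in items if y == 0]
        if not pos or not neg:
            continue  # AUC is undefined at this position; skip it
        # fraction of (clicked, unclicked) pairs the scores order correctly
        correct = sum(1.0 if p > q else 0.5 if p == q else 0.0
                      for p in pos for q in neg)
        total += len(items) * correct / (len(pos) * len(neg))
        weight += len(items)
    return total / weight
```

Because each AUC is computed only among impressions at the same display position, a model cannot score well simply by exploiting a global position prior; it must rank correctly within every slot.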
In vision, DPB is formally measured via Position-SHAP (Bruintjes et al., 19 May 2025), which extends feature attribution to position embeddings; the Position-SHAP score quantifies the fraction of a classifier's decision attributable to positional tokens. For further adaptation, Auto-PE introduces a learnable scalar norm α that modulates position-embedding strength per dataset, allowing end-to-end optimization for translation invariance or maximal capture bias as needed: x_i = t_i + α · p_i, where t_i is the token (patch) embedding and p_i the position embedding at index i.
When α ≈ 0, position information is de-emphasized; when α is large, position dominates inference.
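The scalar modulation idea can be sketched in a few lines. This is a minimal illustration of scaling position-embedding strength with a single learnable coefficient, assuming a simple additive embedding scheme; the class and field names are illustrative, not the Auto-PE implementation (where α would be optimized end-to-end with the rest of the network):

```python
import numpy as np

class ScaledPositionEmbedding:
    """Adds position embeddings scaled by a single coefficient alpha."""

    def __init__(self, num_positions, dim, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.pe = rng.normal(scale=0.02, size=(num_positions, dim))
        self.alpha = alpha  # learnable in practice; a plain float here

    def __call__(self, tokens):
        # tokens: (num_positions, dim) array of token/patch embeddings
        return tokens + self.alpha * self.pe
```

Setting alpha to zero removes the positional signal entirely (the output equals the input tokens, i.e. full translation invariance), while larger values let position dominate the representation.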
3. Empirical Manifestations of DPB in Sequence Models and Embeddings
DPB is observed in both micro-level attention distributions and macro-level prediction outcomes. Transformer models for text retrieval, question-answering, and long-context tasks display fluctuating performance with the position of relevant tokens (Yu et al., 4 Jun 2024, Goel et al., 13 Dec 2024). Attention weights reveal DPB: shallow layers show “vertical line” attention on boundary tokens; deeper layers bias retrieval toward ends of the context (the “lost in the middle” phenomenon).
Position bias in embeddings is quantified via ablation studies. Inserting or removing irrelevant text at the beginning of a document typically perturbs the output embedding more than the same edit at the end (up to 12.3% greater reduction in cosine similarity for beginning ablations) (Goel et al., 13 Dec 2024). Regression analyses demonstrate a monotonic decay of sentence importance with position, even under content-agnostic conditions. The effect is further exacerbated by training pre-processing, where truncation routines favor gradient updates for initial tokens.
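The ablation protocol itself is simple to sketch. Here `toy_embed` is a deliberately front-weighted stand-in (a hypothetical toy, not any of the embedding models studied in the paper) so that the measurement has a bias to detect:

```python
import numpy as np
import zlib

def toy_embed(tokens):
    # Toy stand-in for a real embedding model: a position-decayed bag of
    # hashed token vectors. The 0.9**i decay deliberately front-weights
    # early tokens, mimicking the bias measured in real models.
    v = np.zeros(32)
    for i, tok in enumerate(tokens):
        rng = np.random.default_rng(zlib.crc32(tok.encode()))
        v += (0.9 ** i) * rng.normal(size=32)
    return v / np.linalg.norm(v)

def ablation_drop(doc, filler):
    """Cosine-similarity drop from inserting filler at the front vs. back."""
    base = toy_embed(doc)
    front = toy_embed(filler + doc)
    back = toy_embed(doc + filler)
    return 1 - base @ front, 1 - base @ back
```

Front insertion both displaces every document token to a lower weight and places the filler tokens at the highest-weight slots, so for a front-weighted embedder the beginning-ablation drop exceeds the end-ablation drop, exactly the asymmetry the study measures.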
4. Mitigation and Adaptation Strategies
Several methods have been developed to address and modulate DPB:
- Scaling Positional Hidden States: By identifying and dampening the hidden-state dimensions that encode absolute positional information, attention focus can be rebalanced. This is implemented by scaling the selected dimensions, h'_d = s · h_d, with a scale factor s < 1 applied only to the identified positional dimensions d. Applying this scaling attains up to 15.2% performance improvement on position-sensitive benchmarks (Yu et al., 4 Jun 2024).
- Data Augmentation and Adapters: PAPEFT (Zhang et al., 1 Apr 2024) combines permutation of candidate orderings (to encourage position-invariance) with a parameter-efficient location encoding adapter integrated as soft prompt tokens, improving both accuracy and uniformity of attention in long-context LLMs.
- Prompt Engineering: Explicit prompting techniques (“Focus on the middle documents”) can partially redirect attention to neglected regions, though more structural changes (hierarchical merging, incremental updates) may not improve or may shift DPB elsewhere (Wan et al., 31 Oct 2024). Counterintuitively, explicit positional guidance sometimes reduces accuracy, indicating that models’ internal positional preferences are not always aligned with external instructions (Mikhail et al., 22 May 2025).
- Learnable Position Embedding Modulation: Auto-PE (Bruintjes et al., 19 May 2025) allows the model to learn dataset-specific reliance on positional cues, thus supporting adaptation to variable position bias dynamics.
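The hidden-state scaling strategy from the list above can be sketched as follows. This is a toy illustration of the pattern (find the dimensions most correlated with absolute position, damp them); Yu et al. locate the relevant dimensions inside a real LLM via a more involved search, and the function name here is illustrative:

```python
import numpy as np

def scale_positional_dims(hidden, factor=0.5, top_k=1):
    """hidden: (seq_len, dim) array of hidden states.
    Returns a copy with the top_k dimensions most linearly correlated
    with absolute position scaled by `factor`."""
    seq_len, dim = hidden.shape
    positions = np.arange(seq_len)
    # absolute Pearson correlation of each dimension with position
    corr = np.array([abs(np.corrcoef(positions, hidden[:, j])[0, 1])
                     for j in range(dim)])
    out = hidden.copy()
    for j in np.argsort(corr)[-top_k:]:
        out[:, j] *= factor   # damp the positional dimensions
    return out
```

Damping (rather than zeroing) keeps some positional signal available while reducing its dominance over attention.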
5. DPB Across Modalities and Languages
DPB arises in retrieval, ranking, summarization, vision transformers, and robotics. In search and IR, static or query-specific position bias models have been superseded by architectures like DPIN and doubly-robust estimators, which allow more flexible adjustment to context and user behaviors (Gollapudi et al., 2010, Huang et al., 2021, Oosterhuis, 2022).
In vision, the discriminative role of position depends not only on the camera/view but also scene semantics (e.g. “sky is up”), capture bias, and spatial layout. Position-adaptive embedding strategies are shown to be optimal in context-specific settings (Bruintjes et al., 19 May 2025).
For language, cross-linguistic experiments reveal that positional bias is largely model-driven, modulated by architecture and training data. Models can favor early (lead) or late (recency) positions depending on their internal bias, with positional guidance and predictive entropy showing complex relationships to accuracy and uncertainty (Mikhail et al., 22 May 2025). In free-word-order languages, DPB interacts only weakly with syntactic preferences.
In robotics, DPB is mirrored by parametric bias vectors in DPMPB, where online updating of a low-dimensional bias allows rapid adaptation to physical and environmental changes (objects grasped, floor friction, body aging), without re-tuning global model weights (Kawaharazuka et al., 24 Apr 2024).
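The parametric-bias mechanism can be sketched with a frozen linear map standing in for the trained network. Only the low-dimensional bias vector b is updated online from prediction error, which is the essence of the adaptation scheme described above; the matrices and learning rate are illustrative assumptions, not values from the DPMPB paper:

```python
import numpy as np

W_x = np.array([[0.5, -0.2, 0.1],
                [0.3, 0.8, -0.4],
                [-0.1, 0.2, 0.6]])   # frozen "network" weights on state
W_b = np.array([[1.0, 0.0],
                [0.0, 1.0],
                [0.5, -0.5]])        # frozen weights on the 2-D bias

def predict(x, b):
    # prediction depends on state x and the low-dimensional bias b
    return W_x @ x + W_b @ b

def adapt_bias(b, x, y_true, lr=0.1, steps=200):
    # online gradient descent on squared error with respect to b only;
    # the global weights W_x, W_b are never touched
    for _ in range(steps):
        err = predict(x, b) - y_true
        b = b - lr * (W_b.T @ err)
    return b
```

Because the update touches only two parameters, adaptation is cheap enough to run continuously as the environment drifts, which is the property the robotics setting exploits.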
6. Theoretical Guarantees and Empirical Impact
The theoretical foundations of DPB-aware estimators (doubly-robust methods) guarantee unbiasedness when either the propensity model or the regression model is correct, with reduced variance compared to pure IPS approaches (Oosterhuis, 2022). These properties are vital for data-efficient, reliable adaptation in dynamic environments.
Experimentally, dynamic modulation of positional signals improves model accuracy on variable-bias datasets, enhances segmentation boundaries in vision, reduces performance variance in long-context LLM tasks, and supports dynamic adaptation in robotics and control. Benchmarks consistently demonstrate the need for architectures and evaluation metrics (such as PAUC) reflecting positional diversity and adaptation.
7. Implications and Future Research Directions
DPB is a critical phenomenon for both practical deployment and core architectural design in models processing sequential input. The optimal strategy is not static; it requires dynamic, data- and task-sensitive adaptation of positional representations. Progress in DPB modeling includes:
- Metric development for quantifying and diagnosing DPB in both pretraining and fine-tuning
- Learnable or context-conditioned position embedding schemes (Auto-PE, scaling hidden states)
- Robust estimation in ranking and feedback models with guaranteed bias and variance properties
- Cross-domain transfer and adaptation—particularly in multilingual and cross-modal setups
- Applied prompt and manipulation strategies that respect (rather than fight) DPB, with nuanced handling of entropy and uncertainty signals
A plausible implication is that future architectures in search, language, vision, and robotics will increasingly converge on adaptive mechanisms—either end-to-end differentiable modules or hybrid meta-learning strategies—that dynamically regulate positional information flow as a function of context, input, and downstream objectives.