
Minimal Two-Tower Models in LTR

Updated 5 February 2026
  • Minimal two-tower models are additive LTR architectures that use dual towers to separate latent document relevance from systematic presentation bias.
  • They address parameter identifiability by leveraging document swaps and feature overlap, ensuring that model parameters can be uniquely recovered under sufficient conditions.
  • They mitigate logging-policy confounding through sample weighting and inverse-propensity correction to achieve unbiased ranking in click-based systems.

Minimal two-tower models are a class of additive learning-to-rank (LTR) architectures that explicitly address examination bias in user feedback, particularly click data, by decomposing the observed click probability into a relevance component and a bias component. These models are prevalent in industrial LTR systems due to their conceptual simplicity and theoretical ability to separate latent document relevance from systematic presentation bias (e.g., position bias) using a dual-tower design. Despite their popularity, minimal two-tower models are susceptible to challenges involving parameter identifiability and logging-policy confounding, which can undermine their unbiasedness unless certain conditions are satisfied (Hager et al., 29 Aug 2025).

1. Formal Structure of Additive Two-Tower Models

The minimal additive two-tower model is designed to explain a user's click $C\in\{0,1\}$ on document $d$ for query $q$ at position $k$. The model uses two neural "towers":

  • Relevance tower $r(x_{q,d};\theta_r)$: Maps the query-document feature vector $x_{q,d}\in\mathbb{R}^m$ to a real-valued logit encoding latent relevance.
  • Bias tower $b(k;\theta_b)$: Maps contextual information (most commonly the rank $k$) to a real-valued logit capturing positional examination or presentation bias.

The probability of a click event is given by the sigmoid of the sum of the two towers' logits:

$$P(C=1\mid q,d,k) = \sigma\bigl(b(k;\theta_b) + r(x_{q,d};\theta_r)\bigr).$$

In the basic "position-bias" instantiation, $b(k;\theta_b)=\theta_k$ and $r(x_{q,d};\theta_r)=\gamma_{q,d}$, giving

$$P(C=1\mid q,d,k) = \sigma(\theta_k + \gamma_{q,d}).$$

During inference, ranking is determined solely by the relevance tower.
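As an illustration, the position-bias instantiation above can be sketched in a few lines of Python (the parameter values below are made up for demonstration, not taken from the paper):

```python
import math

def click_probability(theta_k: float, gamma_qd: float) -> float:
    """P(C=1 | q, d, k) = sigmoid(theta_k + gamma_qd) in the minimal
    position-bias instantiation of the additive two-tower model."""
    return 1.0 / (1.0 + math.exp(-(theta_k + gamma_qd)))

# Hypothetical bias-tower logits per rank and one relevance-tower logit.
theta = {1: 0.0, 2: -0.7, 3: -1.5}   # b(k) = theta_k: examination drops with rank
gamma = 1.2                          # r(x_{q,d}) = gamma_{q,d} for one (q, d) pair

# The same document is clicked less often at lower ranks, even though
# its latent relevance gamma is unchanged.
probs = {k: click_probability(theta[k], gamma) for k in theta}
```

At inference time only `gamma` (the relevance tower) would be used for ordering documents; `theta` exists solely to debias training.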

2. Parameter Identifiability in Two-Tower Models

Identifiability—unique recovery of model parameters from observed data—is not automatically guaranteed in minimal two-tower models.

Definition: A parameterization $(\theta_b, \theta_r)$ is identifiable if no distinct $(\theta'_b, \theta'_r)$ yields the same distribution over all observables $(q,d,k,C)$.

Unidentifiability arises when each $(q,d)$ pair appears only at a single rank $k$: for any shift $\Delta_k$, the reparameterization

$$\theta'_k = \theta_k + \Delta_k,\qquad \gamma'_{q,d} = \gamma_{q,d} - \Delta_k$$

leaves $P(C=1\mid q,d,k)$ unchanged, yielding infinitely many equivalent solutions.
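This shift invariance is easy to verify numerically; the logit values below are hypothetical:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

theta_k, gamma_qd = -0.7, 1.2   # hypothetical bias and relevance logits
delta = 0.5                     # arbitrary per-rank shift

p_original = sigmoid(theta_k + gamma_qd)
p_shifted = sigmoid((theta_k + delta) + (gamma_qd - delta))
# The shift cancels in the sum, so both parameterizations predict the
# same click probability: the click data cannot distinguish them.
```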

Restoring identifiability requires either:

  • Document swaps: If the same $(q,d)$ pair is observed at multiple positions $k, k'$, the intersection constraint $\Delta_k = \Delta_{k'}$ arises. Modeling document-rank observations as a graph $G=(V,E)$, with ranks as vertices and edges between ranks that share documents, connectivity of $G$ (or of each connected component under partial swaps) ensures that the offset ambiguity is resolved up to a single constant; setting $\theta_1=0$ standardizes the solution.
  • Feature overlap: When $r(x_{q,d})$ is parametrized such that the support of $x$ overlaps across different ranks $k$, continuity enforces $\Delta_k\approx\Delta_{k'}$ on the shared domain. Chaining overlaps across $G$ similarly pins the offsets to a single value, up to negligible error.

Theorem 1 (Identifiability through feature overlap):

Let $G=(V,E)$ be the graph on ranks, with $(k,k')\in E \iff \operatorname{supp} P(x\mid k) \cap \operatorname{supp} P(x\mid k') \neq \emptyset$.

If $G$ is connected and $r(\cdot)$ is continuous, the minimal two-tower model is identifiable up to a single additive constant.
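Checking connectivity of a rank graph reduces to a standard graph traversal. The sketch below builds the swap-based variant of the graph (edges between ranks that displayed the same document, per the document-swaps criterion above); the observation format is hypothetical:

```python
from collections import defaultdict, deque

def rank_graph_connected(observations) -> bool:
    """observations: iterable of ((q, d), k) pairs from the click log.
    Builds the graph whose vertices are ranks, with an edge between two
    ranks whenever the same (q, d) pair was displayed at both, and
    checks connectivity -- a sufficient condition for identifiability
    up to a single additive constant."""
    ranks_of = defaultdict(set)
    for qd, k in observations:
        ranks_of[qd].add(k)

    adj = defaultdict(set)
    all_ranks = set()
    for ks in ranks_of.values():
        all_ranks.update(ks)
        for a in ks:                 # connect every pair of ranks that
            for b in ks:             # showed this same (q, d) pair
                if a != b:
                    adj[a].add(b)

    if not all_ranks:
        return True                  # vacuously connected
    start = next(iter(all_ranks))
    seen, queue = {start}, deque([start])
    while queue:                     # breadth-first search
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == all_ranks
```

If the function returns `False`, the bias offsets $\Delta_k$ are pinned only within each connected component, and ranks in separate components remain mutually unidentifiable.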

3. Logging-Policy Confounding Effects

Minimal two-tower models are typically trained via the negative log-likelihood, with observed data $(q,d,k,c)$ generated under a logging/display policy $\pi(d,k\mid q)$:

$$\mathcal{L}(\theta_b, \theta_r) = -\sum_q P(q) \sum_{d,k} \pi(d,k\mid q)\,\bigl[ c \ln\sigma(\theta_k+\gamma_{q,d}) + (1-c)\ln\bigl(1-\sigma(\theta_k+\gamma_{q,d})\bigr) \bigr].$$

Setting the derivatives with respect to $\gamma_{q,d}$ and $\theta_k$ to zero yields the stationarity conditions:

$$\sum_{k}\pi(d,k\mid q)\,\bigl[P(C=1\mid q,d,k) - \sigma(\theta_k+\gamma_{q,d})\bigr] = 0,$$

$$\sum_{q,d} \pi(d,k\mid q)\,\bigl[P(C=1\mid q,d,k) - \sigma(\theta_k+\gamma_{q,d})\bigr] = 0.$$

  • Lemma 1 (No policy impact on well-specified models): If the model class exactly matches the true click probabilities and is identifiable, then

$$\sigma(\theta_k+\gamma_{q,d}) = P(C=1\mid q,d,k)$$

for all supported triplets, and the influence of $\pi$ cancels: the minimizer does not depend on the policy.

  • Lemma 2 (Policy impact under misspecification): When the residuals $\epsilon(q,d,k)=P(C=1\mid q,d,k) - \sigma(\theta_k+\gamma_{q,d})$ are nonzero, the conditions

$$\sum_k \pi(d,k\mid q)\,\epsilon(q,d,k)=0, \qquad \sum_{q,d} \pi(d,k\mid q)\,\epsilon(q,d,k)=0$$

cannot in general both be satisfied unless parameters are forced to compensate, inducing bias that depends on correlations between model error and the logging policy.

A plausible implication is that high-performing production systems, whose logging policy is highly non-uniform, can induce significant confounding when minimal two-tower models are trained on resulting click logs and the relevance tower is misspecified.

4. Sample Weighting and Propensity Correction

To address exposure-induced confounding, a standard correction is inverse-propensity weighting of the loss, which directly mitigates the impact of a non-uniform policy:

$$\hat{\mathcal{L}}_{\mathrm{IPS}}(\theta_b,\theta_r) = -\frac{1}{N} \sum_{(q,d,k,c)\in\mathcal{D}} \frac{1}{\pi(d,k\mid q)} \bigl[ c \ln\sigma(\theta_k+\gamma_{q,d}) + (1-c)\ln\bigl(1-\sigma(\theta_k+\gamma_{q,d})\bigr) \bigr].$$

This adjustment "replays" the learning process as if all $(q,d)$ pairs appeared uniformly at all possible ranks, thereby eliminating policy-induced error correlations. Propensity scores $\pi(d,k\mid q)$ are estimated empirically from observed data frequencies.
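A minimal sketch of the IPS-weighted objective, assuming logged tuples already carry their estimated propensities (the data layout and parameter containers here are illustrative, not from the paper):

```python
import math

def ips_weighted_nll(data, theta, gamma) -> float:
    """data:  list of (q, d, k, c, pi) tuples, where c is the observed click
              and pi is the propensity pi(d, k | q) under the logging policy.
    theta: dict mapping rank k -> bias logit theta_k.
    gamma: dict mapping (q, d) -> relevance logit gamma_{q,d}.
    Returns the inverse-propensity-weighted negative log-likelihood."""
    total = 0.0
    for q, d, k, c, pi in data:
        p = 1.0 / (1.0 + math.exp(-(theta[k] + gamma[(q, d)])))
        log_lik = c * math.log(p) + (1 - c) * math.log(1 - p)
        total += log_lik / pi        # rare exposures get up-weighted
    return -total / len(data)
```

With uniform propensities this reduces to the plain negative log-likelihood; a small propensity inflates its example's weight, mimicking a uniform display policy.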

5. Summary of Key Results

The following table consolidates major statements and conditions established for minimal two-tower models:

| Result | Condition | Significance |
|---|---|---|
| Unidentifiability | No swaps, no feature overlap | Infinitely many parameterizations yield the same outputs |
| Identifiability theorem | Sufficient document swaps or feature overlap; $r(\cdot)$ continuous | Model identifiable up to an additive constant |
| Lemma 1 (no policy impact) | Perfect model specification; identifiability holds | Minimized loss is policy-independent |
| Lemma 2 (logging bias) | Model misspecification; error-policy correlation | Fitted parameters depend on logging policy |
| Sample-weighting correction | Access to propensities $\pi(d,k\mid q)$ | Recovers target estimand under arbitrary policy |

These results collectively show that unbiased LTR using minimal two-tower models critically depends on identifiability and careful treatment of logged user policy effects.

6. Best Practices for Practical Implementation

  • Check identifiability: Introduce randomization or document swaps across positions when feasible. If operational constraints prevent randomization, promote feature overlap across ranks via dimensionality reduction, shallow model architecture, or other means to ensure shared support.
  • Monitor residuals: After estimation, compute $\epsilon(q,d,k)=c-\sigma(\hat{\theta}_k+\hat{\gamma}_{q,d})$. Examine correlations between residuals and rank or bias features to detect unmodeled confounding.
  • Avoid deterministic expert-label simulations: When generating synthetic experiments, do not sort records strictly by ground-truth labels, as this introduces confounding not addressed by the model.
  • Mitigate misspecification: Employ maximally expressive relevance towers and, when possible, incorporate features used in the logging/display policy to reduce omitted variable bias.
  • Regularize for continuity: When relying on overlapping features to enable identifiability, regularization helps preserve the necessary continuity in $r(\cdot)$.
  • Implement propensity reweighting: When model misspecification cannot be eliminated, utilize inverse-propensity sample weights as detailed above.
  • Offline validation: Compare estimated position-bias parameters from the model to those produced by randomized "intervention-harvesting" methods. Divergence between the two suggests remnant bias from model or policy design choices.
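The residual-monitoring step above can be sketched as a Pearson correlation between residuals and ranks; the record layout is hypothetical:

```python
import math

def residual_rank_correlation(records) -> float:
    """records: list of (k, c, p_hat) -- rank, observed click, and the
    model's predicted click probability. Returns the Pearson correlation
    between the residual epsilon = c - p_hat and the rank k; a large
    magnitude suggests unmodeled position-dependent confounding."""
    eps = [c - p for _, c, p in records]
    ks = [float(k) for k, _, _ in records]
    n = len(records)
    mean_e, mean_k = sum(eps) / n, sum(ks) / n
    cov = sum((e - mean_e) * (k - mean_k) for e, k in zip(eps, ks))
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in eps))
    sd_k = math.sqrt(sum((k - mean_k) ** 2 for k in ks))
    if sd_e == 0 or sd_k == 0:
        return 0.0                   # no variation: nothing to correlate
    return cov / (sd_e * sd_k)
```

A correlation near zero is consistent with the bias tower having absorbed the position effect; a strongly nonzero value warrants the randomized-comparison check described in the offline-validation point.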

The above guidelines support use of minimal two-tower models for unbiased learning-to-rank under realistic industrial feedback regimes.

7. Context and Implications

Additive two-tower models are, in principle, sufficient to fully correct for position bias in click-based LTR. However, their efficacy in practice hinges on whether identifiability conditions are realized in the data collection process, and whether the model family is sufficiently expressive to capture user behavior. Logging-policy confounding, particularly under highly non-uniform, productionized data collection, introduces vulnerabilities when these conditions are violated. The proper implementation of randomization, feature overlap, residual monitoring, and inverse-propensity correction is essential to the faithful realization of unbiased learning in deployed feedback-driven LTR pipelines (Hager et al., 29 Aug 2025).
