Minimal Two-Tower Models in LTR
- Minimal two-tower models are additive LTR architectures that use dual towers to separate latent document relevance from systematic presentation bias.
- They address parameter identifiability by leveraging document swaps and feature overlap, ensuring that model parameters can be uniquely recovered under sufficient conditions.
- They mitigate logging-policy confounding through sample weighting and inverse-propensity correction to achieve unbiased ranking in click-based systems.
Minimal two-tower models are a class of additive learning-to-rank (LTR) architectures that explicitly address examination bias in user feedback, particularly click data, through the decomposition of observed click probability into a relevance component and a bias component. These models are prevalent in industrial LTR systems due to their conceptual simplicity and theoretical ability to separate latent document relevance from systematic presentation bias (e.g., position bias) using a dual-tower design. Despite their popularity, minimal two-tower models are susceptible to challenges involving parameter identifiability and logging-policy confounding, which can undermine their unbiasedness unless certain conditions are satisfied (Hager et al., 29 Aug 2025).
1. Formal Structure of Additive Two-Tower Models
The minimal additive two-tower model is designed to explain a user's click on document $d$ for query $q$ at position $k$. This model utilizes two neural "towers":
- Relevance tower $f_\theta$: Maps the query-document feature vector $x_{d,q}$ to a real-valued logit encoding latent relevance.
- Bias tower $g_\psi$: Maps contextual information, most commonly the rank $k$, to a real-valued logit capturing positional examination or presentation bias.
The probability of a click event is given by the sigmoid of the sum of logits from both towers:

$$P(C = 1 \mid x_{d,q}, k) = \sigma\big(f_\theta(x_{d,q}) + g_\psi(k)\big).$$

In the basic "position-bias" instantiation, $g_\psi$ depends only on the rank, with $g_\psi(k) = b_k$ a free scalar per rank, giving

$$P(C = 1 \mid x, k) = \sigma\big(f_\theta(x) + b_k\big).$$
During inference, ranking is determined solely by the relevance tower.
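The additive structure above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the linear relevance tower and the per-rank bias logits are hypothetical toy values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def click_probability(x, k, relevance_tower, position_bias):
    """Additive two-tower click model: sigmoid of the summed logits.

    relevance_tower: maps feature vector x to a relevance logit f(x).
    position_bias:   per-rank bias logits b_k (minimal bias tower g(k) = b_k).
    """
    return sigmoid(relevance_tower(x) + position_bias[k])

# Toy instantiation: a linear relevance tower and decaying rank logits.
w = np.array([0.8, -0.3])
relevance = lambda x: float(np.dot(w, x))
bias = np.array([0.0, -0.7, -1.4])  # examination drops with rank

x = np.array([1.0, 0.5])
p_top = click_probability(x, 0, relevance, bias)  # shown at the top rank
p_low = click_probability(x, 2, relevance, bias)  # same document, rank 3
```

At inference time only `relevance(x)` would be used to order documents; the bias logits exist solely to explain the click data.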
2. Parameter Identifiability in Two-Tower Models
Identifiability—unique recovery of model parameters from observed data—is not automatically guaranteed in minimal two-tower models.
Definition: A parameterization $(f, g)$ is identifiable if no distinct parameterization $(f', g')$ yields the same click distribution for all observable pairs $(x, k)$.
Unidentifiability arises when each feature vector $x$ appears only at a single rank $k$: for any rank-dependent shift $c_k$, the reparameterization

$$f'(x) = f(x) + c_{k(x)}, \qquad g'(k) = g(k) - c_k$$

leaves the sum $f(x) + g(k)$ unchanged, leading to infinitely many equivalent solutions.
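A small numerical check makes this ambiguity concrete. Under the assumption that each document is logged at exactly one rank (toy logits below are hypothetical), shifting relevance logits per rank and subtracting the same shift from the bias logits reproduces the click distribution exactly:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Each document observed at exactly one rank.
f = {"d1": 0.4, "d2": -0.2}      # relevance logits
g = {0: 0.0, 1: -0.9}            # per-rank bias logits
observed = [("d1", 0), ("d2", 1)]

# Arbitrary per-rank shifts c_k: add to f, subtract from g.
c = {0: 1.3, 1: -0.5}
f_shift = {d: f[d] + c[k] for d, k in observed}
g_shift = {k: g[k] - c[k] for k in g}

# Every observed (document, rank) pair yields an identical click probability.
for d, k in observed:
    assert abs(sigmoid(f[d] + g[k]) - sigmoid(f_shift[d] + g_shift[k])) < 1e-12
```

Because the shifted and unshifted parameterizations are observationally equivalent, no amount of such data can distinguish them.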
Restoring identifiability requires either:
- Document swaps: If the same document $x$ is observed at multiple positions ($k \neq k'$), intersection constraints arise. Modeling document-rank observations as a graph $G$—vertices as ranks, edges between ranks sharing documents—complete connectivity (or connected components under swaps) ensures that the offset ambiguity is resolved up to a single constant (setting, e.g., $g(1) = 0$ standardizes the solution).
- Feature overlap: When $f$ is parametrized such that the supports of $x$ under different ranks overlap, continuity enforces $c_k = c_{k'}$ across this shared domain. Chaining overlaps across ranks similarly pins the offsets to a single value, up to negligible error.
Theorem 1 (Identifiability through feature overlap):
If $G$ is connected and $f$ is continuous, the minimal two-tower model is identifiable up to a single additive constant.
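The graph-connectivity condition from the document-swaps criterion can be checked directly on a click log. Below is a sketch using union-find over observed (document, rank) pairs; the function name and log format are illustrative, not from the paper:

```python
def ranks_identifiable(observations):
    """Check whether the document-rank graph is connected.

    observations: iterable of (doc_id, rank) pairs from the click log.
    Ranks that share a document are merged into one component; the model
    is identifiable up to one additive constant iff all observed ranks
    end up in a single connected component.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    first_rank = {}
    ranks = set()
    for doc, rank in observations:
        ranks.add(rank)
        if doc in first_rank:
            union(first_rank[doc], rank)  # swap: doc seen at a second rank
        else:
            first_rank[doc] = rank
            find(rank)  # register the rank as a vertex
    return len({find(r) for r in ranks}) == 1

# Swaps chain ranks 0-1-2 together; without swaps the graph is disconnected.
connected = ranks_identifiable([("a", 0), ("a", 1), ("b", 1), ("b", 2)])
disconnected = ranks_identifiable([("a", 0), ("b", 1)])
```

Running such a check on logged data before training indicates whether the bias tower's offsets are pinned down or whether randomized swaps must be added.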
3. Logging-Policy Confounding Effects
Minimal two-tower models are typically trained via the negative log-likelihood, with observed data generated under a logging/display policy $\pi$:

$$\mathcal{L}(\theta, \psi) = -\,\mathbb{E}_{(x,k,c)\sim\pi}\Big[c \log \sigma\big(f_\theta(x) + g_\psi(k)\big) + (1 - c)\log\big(1 - \sigma(f_\theta(x) + g_\psi(k))\big)\Big].$$

Taking derivatives with respect to $\theta$ and $\psi$ yields stationarity conditions:
- Lemma 1 (No policy impact on well-specified models): If the model class exactly matches the true click probabilities and is identifiable,

$$\sigma\big(f_\theta(x) + g_\psi(k)\big) = P(C = 1 \mid x, k)$$

for all supported triplets, and the influence of $\pi$ cancels—the minimizer does not depend on the policy.
- Lemma 2 (Policy impact under misspecification): When residuals $\epsilon(x, k) = P(C = 1 \mid x, k) - \sigma\big(f_\theta(x) + g_\psi(k)\big)$ are nonzero, the stationarity conditions for $\theta$ and $\psi$ cannot in general both be satisfied unless parameters are forced to compensate, inducing bias that depends on correlations between model error and the logging policy.
A plausible implication is that high-performing production systems, whose logging policy is highly non-uniform, can induce significant confounding when minimal two-tower models are trained on resulting click logs and the relevance tower is misspecified.
4. Sample Weighting and Propensity Correction
To address exposure-induced confounding, a standard correction is inverse-propensity weighting in the loss function, directly mitigating the impacts of a non-uniform policy:

$$\mathcal{L}_{\mathrm{IPW}}(\theta, \psi) = -\,\mathbb{E}_{(x,k,c)\sim\pi}\Big[\frac{1}{\pi(k \mid x)}\Big(c \log \sigma\big(f_\theta(x) + g_\psi(k)\big) + (1 - c)\log\big(1 - \sigma(f_\theta(x) + g_\psi(k))\big)\Big)\Big].$$

This adjustment "replays" the learning process as if all $(x, k)$ pairs appeared uniformly at all possible ranks, thereby eliminating policy-induced error correlations. Propensity scores $\pi(k \mid x)$ are estimated empirically from observed display frequencies.
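A weighted loss of this shape can be sketched as follows. This is an illustrative NumPy version of inverse-propensity-weighted binary cross-entropy, not the paper's training code; `eps` is a hypothetical numerical-stability constant.

```python
import numpy as np

def ipw_binary_cross_entropy(logits, clicks, propensities, eps=1e-12):
    """Inverse-propensity-weighted NLL for an additive two-tower click model.

    logits:       f(x) + g(k) for each logged impression.
    clicks:       observed 0/1 click labels.
    propensities: logging-policy probability of showing each (doc, rank)
                  pair; dividing by it replays the data as if impressions
                  were uniform over ranks.
    """
    p = 1.0 / (1.0 + np.exp(-logits))
    nll = -(clicks * np.log(p + eps) + (1.0 - clicks) * np.log(1.0 - p + eps))
    return float(np.mean(nll / propensities))

# Under a uniform policy the weights are 1 and the plain NLL is recovered;
# rarely shown impressions are upweighted under a non-uniform policy.
uniform_loss = ipw_binary_cross_entropy(np.zeros(2), np.array([1.0, 0.0]), np.ones(2))
weighted_loss = ipw_binary_cross_entropy(np.zeros(2), np.array([1.0, 0.0]),
                                         np.array([0.5, 1.0]))
```

In practice the per-example weights would be clipped or normalized to control variance, a detail omitted here for brevity.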
5. Summary of Key Results
The following table consolidates major statements and conditions established for minimal two-tower models:
| Result | Condition | Significance |
|---|---|---|
| Unidentifiability | No swaps, no feature overlap | Infinitely many parameterizations yield same outputs |
| Identifiability Theorem | Sufficient document swaps, or feature overlap with continuous $f$ | Model is identifiable up to an additive constant |
| Lemma 1 - No Policy Impact | Perfect model specification, identifiability holds | Minimized loss is policy-independent |
| Lemma 2 - Logging Bias | Model misspecification, error-policy correlation | Fitted params depend on logging policy |
| Sample-weighting Correction | Access to propensities | Recovers target estimand under arbitrary policy |
These results collectively show that unbiased LTR using minimal two-tower models critically depends on identifiability and careful treatment of logged user policy effects.
6. Best Practices for Practical Implementation
- Check identifiability: Introduce randomization or document swaps across positions when feasible. If operational constraints prevent randomization, promote feature overlap across ranks via dimensionality reduction, shallow model architecture, or other means to ensure shared support.
- Monitor residuals: Following estimation, compute $\hat\epsilon(x, k) = \hat{P}(C = 1 \mid x, k) - \sigma\big(f_\theta(x) + g_\psi(k)\big)$. Examine correlations between residuals and rank or bias features to detect unmodeled confounding.
- Avoid deterministic expert-label simulations: When generating synthetic experiments, do not sort records strictly by ground-truth labels, as this introduces confounding not addressed by the model.
- Mitigate misspecification: Employ maximally expressive relevance towers and, when possible, incorporate features used in the logging/display policy to reduce omitted variable bias.
- Regularize for continuity: When relying on overlapping features to enable identifiability, application of regularization helps preserve the necessary continuity in .
- Implement propensity reweighting: When model misspecification cannot be eliminated, utilize inverse-propensity sample weights as detailed above.
- Offline validation: Compare estimated position-bias parameters from the model to those produced by randomized "intervention-harvesting" methods. Divergence between the two suggests remnant bias from model or policy design choices.
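The residual-monitoring step above can be sketched as a simple diagnostic. This is an assumed helper, not part of the paper: it correlates per-cell residuals (empirical click rate minus model prediction) with rank, where a correlation far from zero flags rank-dependent misspecification.

```python
import numpy as np

def residual_rank_correlation(click_rates, predicted, ranks):
    """Pearson correlation between model residuals and rank.

    click_rates: empirical click-through rate per (document, rank) cell.
    predicted:   model-predicted click probability for the same cells.
    ranks:       rank index of each cell.
    A value far from zero suggests the bias or relevance tower is
    misspecified in a way that correlates with position.
    """
    residuals = np.asarray(click_rates, dtype=float) - np.asarray(predicted, dtype=float)
    ranks = np.asarray(ranks, dtype=float)
    return float(np.corrcoef(residuals, ranks)[0, 1])

# Residuals that grow with rank produce a correlation near +1.
corr = residual_rank_correlation([0.50, 0.40, 0.30],
                                 [0.50, 0.35, 0.20],
                                 [1, 2, 3])
```

Such a statistic is cheap to compute on held-out logs and complements the randomized intervention-harvesting comparison described above.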
The above guidelines support use of minimal two-tower models for unbiased learning-to-rank under realistic industrial feedback regimes.
7. Context and Implications
Additive two-tower models are, in principle, sufficient to fully correct for position bias in click-based LTR. However, their efficacy in practice hinges on whether identifiability conditions are realized in the data collection process, and whether the model family is sufficiently expressive to capture user behavior. Logging-policy confounding, particularly under highly non-uniform, productionized data collection, introduces vulnerabilities when these conditions are violated. The proper implementation of randomization, feature overlap, residual monitoring, and inverse-propensity correction is essential to the faithful realization of unbiased learning in deployed feedback-driven LTR pipelines (Hager et al., 29 Aug 2025).