Minimal Two-Tower Models in LTR
- Minimal two-tower models are additive LTR architectures that use dual towers to separate latent document relevance from systematic presentation bias.
- They address parameter identifiability by leveraging document swaps and feature overlap, ensuring that model parameters can be uniquely recovered under sufficient conditions.
- They mitigate logging-policy confounding through sample weighting and inverse-propensity correction to achieve unbiased ranking in click-based systems.
Minimal two-tower models are a class of additive learning-to-rank (LTR) architectures that explicitly address examination bias in user feedback, particularly click data, through the decomposition of observed click probability into a relevance component and a bias component. These models are prevalent in industrial LTR systems due to their conceptual simplicity and theoretical ability to separate latent document relevance from systematic presentation bias (e.g., position bias) using a dual-tower design. Despite their popularity, minimal two-tower models are susceptible to challenges involving parameter identifiability and logging-policy confounding, which can undermine their unbiasedness unless certain conditions are satisfied (Hager et al., 29 Aug 2025).
1. Formal Structure of Additive Two-Tower Models
The minimal additive two-tower model is designed to explain a user's click on document $d$ for query $q$ at position $k$. This model utilizes two neural "towers":
- Relevance tower $f_\theta$: Maps the query-document feature vector $x_{d,q}$ to a real-valued logit encoding latent relevance.
- Bias tower $g_\psi$: Maps contextual information, most commonly the rank $k$, to a real-valued logit capturing positional examination or presentation bias.
The probability of a click event is given by the sigmoid of the sum of logits from both towers:

$$P(C = 1 \mid x_{d,q}, k) = \sigma\big(f_\theta(x_{d,q}) + g_\psi(k)\big).$$

In the basic "position-bias" instantiation, $g_\psi$ depends only on the rank, with $g_\psi(k) = b_k$ a free scalar per rank, giving

$$P(C = 1 \mid x, k) = \sigma\big(f_\theta(x) + b_k\big).$$
During inference, ranking is determined solely by the relevance tower.
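The additive structure above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the linear relevance tower and the per-rank bias logits are hypothetical toy values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def click_probability(x, k, relevance_tower, position_bias):
    """Additive two-tower click model: sigmoid of the summed logits.

    relevance_tower: maps feature vector x to a relevance logit f(x).
    position_bias:   per-rank bias logits b_k (minimal bias tower g(k) = b_k).
    """
    return sigmoid(relevance_tower(x) + position_bias[k])

# Toy instantiation: a linear relevance tower and decaying rank logits.
w = np.array([0.8, -0.3])
relevance = lambda x: float(np.dot(w, x))
bias = np.array([0.0, -0.7, -1.4])  # examination drops with rank

x = np.array([1.0, 0.5])
p_top = click_probability(x, 0, relevance, bias)  # shown at the top rank
p_low = click_probability(x, 2, relevance, bias)  # same document, rank 3
```

At inference time only `relevance(x)` would be used to order documents; the bias logits exist solely to explain the click data.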
2. Parameter Identifiability in Two-Tower Models
Identifiability—unique recovery of model parameters from observed data—is not automatically guaranteed in minimal two-tower models.
Definition: A parameterization $(f, g)$ is identifiable if no distinct parameterization $(f', g')$ yields the same click distribution for all observable pairs $(x, k)$.
Unidentifiability arises when each feature vector $x$ appears only at a single rank $k$: for any rank-dependent shift $c_k$, the reparameterization

$$f'(x) = f(x) + c_{k(x)}, \qquad g'(k) = g(k) - c_k$$

leaves the sum $f(x) + g(k)$ unchanged, leading to infinitely many equivalent solutions.
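A small numerical check makes this ambiguity concrete. Under the assumption that each document is logged at exactly one rank (toy logits below are hypothetical), shifting relevance logits per rank and subtracting the same shift from the bias logits reproduces the click distribution exactly:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Each document observed at exactly one rank.
f = {"d1": 0.4, "d2": -0.2}      # relevance logits
g = {0: 0.0, 1: -0.9}            # per-rank bias logits
observed = [("d1", 0), ("d2", 1)]

# Arbitrary per-rank shifts c_k: add to f, subtract from g.
c = {0: 1.3, 1: -0.5}
f_shift = {d: f[d] + c[k] for d, k in observed}
g_shift = {k: g[k] - c[k] for k in g}

# Every observed (document, rank) pair yields an identical click probability.
for d, k in observed:
    assert abs(sigmoid(f[d] + g[k]) - sigmoid(f_shift[d] + g_shift[k])) < 1e-12
```

Because the shifted and unshifted parameterizations are observationally equivalent, no amount of such data can distinguish them.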
Restoring identifiability requires either:
- Document swaps: If the same document $x$ is observed at multiple positions ($k \neq k'$), intersection constraints arise. Modeling document-rank observations as a graph $G$—vertices as ranks, edges between ranks sharing documents—complete connectivity (or connected components under swaps) ensures that the offset ambiguity is resolved up to a single constant (setting, e.g., $g(1) = 0$ standardizes the solution).
- Feature overlap: When $f$ is parametrized such that the supports of $x$ under different ranks overlap, continuity enforces $c_k = c_{k'}$ across this shared domain. Chaining overlaps across ranks similarly pins the offsets to a single value, up to negligible error.
Theorem 1 (Identifiability through feature overlap):
If $G$ is connected and $f$ is continuous, the minimal two-tower model is identifiable up to a single additive constant.
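The graph-connectivity condition from the document-swaps criterion can be checked directly on a click log. Below is a sketch using union-find over observed (document, rank) pairs; the function name and log format are illustrative, not from the paper:

```python
def ranks_identifiable(observations):
    """Check whether the document-rank graph is connected.

    observations: iterable of (doc_id, rank) pairs from the click log.
    Ranks that share a document are merged into one component; the model
    is identifiable up to one additive constant iff all observed ranks
    end up in a single connected component.
    """
    parent = {}

    def find(a):
        parent.setdefault(a, a)
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    def union(a, b):
        parent[find(a)] = find(b)

    first_rank = {}
    ranks = set()
    for doc, rank in observations:
        ranks.add(rank)
        if doc in first_rank:
            union(first_rank[doc], rank)  # swap: doc seen at a second rank
        else:
            first_rank[doc] = rank
            find(rank)  # register the rank as a vertex
    return len({find(r) for r in ranks}) == 1

# Swaps chain ranks 0-1-2 together; without swaps the graph is disconnected.
connected = ranks_identifiable([("a", 0), ("a", 1), ("b", 1), ("b", 2)])
disconnected = ranks_identifiable([("a", 0), ("b", 1)])
```

Running such a check on logged data before training indicates whether the bias tower's offsets are pinned down or whether randomized swaps must be added.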
3. Logging-Policy Confounding Effects
Minimal two-tower models are typically trained via the negative log-likelihood, with observed data generated under a logging/display policy $\pi$:

$$\mathcal{L}(\theta, \psi) = -\,\mathbb{E}_{(x,k,c)\sim\pi}\Big[c \log \sigma\big(f_\theta(x) + g_\psi(k)\big) + (1 - c)\log\big(1 - \sigma(f_\theta(x) + g_\psi(k))\big)\Big].$$

Taking derivatives with respect to $\theta$ and $\psi$ yields stationarity conditions:
- Lemma 1 (No policy impact on well-specified models): If the model class exactly matches the true click probabilities and is identifiable,

$$\sigma\big(f_\theta(x) + g_\psi(k)\big) = P(C = 1 \mid x, k)$$

for all supported triplets, and the influence of $\pi$ cancels—the minimizer does not depend on the policy.
- Lemma 2 (Policy impact under misspecification): When residuals $\epsilon(x, k) = P(C = 1 \mid x, k) - \sigma\big(f_\theta(x) + g_\psi(k)\big)$ are nonzero, the stationarity conditions for $\theta$ and $\psi$ cannot in general both be satisfied unless parameters are forced to compensate, inducing bias that depends on correlations between model error and the logging policy.
A plausible implication is that high-performing production systems, whose logging policy is highly non-uniform, can induce significant confounding when minimal two-tower models are trained on resulting click logs and the relevance tower is misspecified.
4. Sample Weighting and Propensity Correction
To address exposure-induced confounding, a standard correction is inverse-propensity weighting in the loss function, directly mitigating the impacts of a non-uniform policy:

$$\mathcal{L}_{\mathrm{IPW}}(\theta, \psi) = -\,\mathbb{E}_{(x,k,c)\sim\pi}\Big[\frac{1}{\pi(k \mid x)}\Big(c \log \sigma\big(f_\theta(x) + g_\psi(k)\big) + (1 - c)\log\big(1 - \sigma(f_\theta(x) + g_\psi(k))\big)\Big)\Big].$$

This adjustment "replays" the learning process as if all $(x, k)$ pairs appeared uniformly at all possible ranks, thereby eliminating policy-induced error correlations. Propensity scores $\pi(k \mid x)$ are estimated empirically from observed display frequencies.
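A weighted loss of this shape can be sketched as follows. This is an illustrative NumPy version of inverse-propensity-weighted binary cross-entropy, not the paper's training code; `eps` is a hypothetical numerical-stability constant.

```python
import numpy as np

def ipw_binary_cross_entropy(logits, clicks, propensities, eps=1e-12):
    """Inverse-propensity-weighted NLL for an additive two-tower click model.

    logits:       f(x) + g(k) for each logged impression.
    clicks:       observed 0/1 click labels.
    propensities: logging-policy probability of showing each (doc, rank)
                  pair; dividing by it replays the data as if impressions
                  were uniform over ranks.
    """
    p = 1.0 / (1.0 + np.exp(-logits))
    nll = -(clicks * np.log(p + eps) + (1.0 - clicks) * np.log(1.0 - p + eps))
    return float(np.mean(nll / propensities))

# Under a uniform policy the weights are 1 and the plain NLL is recovered;
# rarely shown impressions are upweighted under a non-uniform policy.
uniform_loss = ipw_binary_cross_entropy(np.zeros(2), np.array([1.0, 0.0]), np.ones(2))
weighted_loss = ipw_binary_cross_entropy(np.zeros(2), np.array([1.0, 0.0]),
                                         np.array([0.5, 1.0]))
```

In practice the per-example weights would be clipped or normalized to control variance, a detail omitted here for brevity.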
5. Summary of Key Results
The following table consolidates major statements and conditions established for minimal two-tower models:
| Result | Condition | Significance |
|---|---|---|
| Unidentifiability | No swaps, no feature overlap | Infinitely many parameterizations yield same outputs |
| Identifiability Theorem | Sufficient document swaps, or feature overlap with continuous $f$ | Model is identifiable up to an additive constant |
| Lemma 1 - No Policy Impact | Perfect model specification, identifiability holds | Minimized loss is policy-independent |
| Lemma 2 - Logging Bias | Model misspecification, error-policy correlation | Fitted params depend on logging policy |
| Sample-weighting Correction | Access to propensities | Recovers target estimand under arbitrary policy |
These results collectively show that unbiased LTR using minimal two-tower models critically depends on identifiability and careful treatment of logged user policy effects.
6. Best Practices for Practical Implementation
- Check identifiability: Introduce randomization or document swaps across positions when feasible. If operational constraints prevent randomization, promote feature overlap across ranks via dimensionality reduction, shallow model architecture, or other means to ensure shared support.
- Monitor residuals: Following estimation, compute $\hat\epsilon(x, k) = \hat{P}(C = 1 \mid x, k) - \sigma\big(f_\theta(x) + g_\psi(k)\big)$. Examine correlations between residuals and rank or bias features to detect unmodeled confounding.
- Avoid deterministic expert-label simulations: When generating synthetic experiments, do not sort records strictly by ground-truth labels, as this introduces confounding not addressed by the model.
- Mitigate misspecification: Employ maximally expressive relevance towers and, when possible, incorporate features used in the logging/display policy to reduce omitted variable bias.
- Regularize for continuity: When relying on overlapping features to enable identifiability, application of regularization helps preserve the necessary continuity in .
- Implement propensity reweighting: When model misspecification cannot be eliminated, utilize inverse-propensity sample weights as detailed above.
- Offline validation: Compare estimated position-bias parameters from the model to those produced by randomized "intervention-harvesting" methods. Divergence between the two suggests remnant bias from model or policy design choices.
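The residual-monitoring step above can be sketched as a simple diagnostic. This is an assumed helper, not part of the paper: it correlates per-cell residuals (empirical click rate minus model prediction) with rank, where a correlation far from zero flags rank-dependent misspecification.

```python
import numpy as np

def residual_rank_correlation(click_rates, predicted, ranks):
    """Pearson correlation between model residuals and rank.

    click_rates: empirical click-through rate per (document, rank) cell.
    predicted:   model-predicted click probability for the same cells.
    ranks:       rank index of each cell.
    A value far from zero suggests the bias or relevance tower is
    misspecified in a way that correlates with position.
    """
    residuals = np.asarray(click_rates, dtype=float) - np.asarray(predicted, dtype=float)
    ranks = np.asarray(ranks, dtype=float)
    return float(np.corrcoef(residuals, ranks)[0, 1])

# Residuals that grow with rank produce a correlation near +1.
corr = residual_rank_correlation([0.50, 0.40, 0.30],
                                 [0.50, 0.35, 0.20],
                                 [1, 2, 3])
```

Such a statistic is cheap to compute on held-out logs and complements the randomized intervention-harvesting comparison described above.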
The above guidelines support use of minimal two-tower models for unbiased learning-to-rank under realistic industrial feedback regimes.
7. Context and Implications
Additive two-tower models are, in principle, sufficient to fully correct for position bias in click-based LTR. However, their efficacy in practice hinges on whether identifiability conditions are realized in the data collection process, and whether the model family is sufficiently expressive to capture user behavior. Logging-policy confounding, particularly under highly non-uniform, productionized data collection, introduces vulnerabilities when these conditions are violated. The proper implementation of randomization, feature overlap, residual monitoring, and inverse-propensity correction is essential to the faithful realization of unbiased learning in deployed feedback-driven LTR pipelines (Hager et al., 29 Aug 2025).