Unbiased User Interaction Histories
- An unbiased UIH is a systematic record of user–item feedback collected under rigorously controlled exposure probabilities, reducing selection, presentation, and position biases.
- Unbiased UIHs are produced via counterfactual corrections (e.g., inverse propensity scoring) and online randomization techniques (e.g., Thompson sampling) that yield unbiased estimators.
- Practical applications include off-policy evaluation, learning-to-rank, and training robust recommendation systems with improved fairness and accuracy.
An unbiased user interaction history (UIH) is a record of user–item feedback events (e.g., clicks, views, dwell times) in which the probabilities governing item exposure, user action, and feedback logging are rigorously controlled or post-processed so that downstream estimators (of user preferences, model performance, or causal effects) are not confounded by systematic data-collection biases. The main sources of bias are position, selection, presentation, and exposure bias, each of which yields a logged dataset that does not reflect the true underlying user preference distribution. A UIH is considered unbiased if a suitable (typically importance-weighted) estimator computed over the history has expectation equal to the true target quantity, enabling valid off-policy learning, evaluation, and debiasing.
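Written out, this unbiasedness requirement takes the standard importance-weighting form (notation is illustrative: $o_i$ is the exposure indicator, $\rho_i$ its known propensity, and $r_i$ the feedback of interest):

```latex
% Exposure o_i ~ Bernoulli(rho_i); the importance-weighted estimator is
% unbiased because E[o_i] = rho_i cancels the weight 1/rho_i:
\hat{V} \;=\; \frac{1}{N} \sum_{i=1}^{N} \frac{o_i\, r_i}{\rho_i},
\qquad
\mathbb{E}\!\left[\hat{V}\right]
  \;=\; \frac{1}{N} \sum_{i=1}^{N} \frac{\mathbb{E}[o_i]\; r_i}{\rho_i}
  \;=\; \frac{1}{N} \sum_{i=1}^{N} r_i \;=\; V .
```

The requirement that $\rho_i > 0$ for every item is what the interventions in the following sections are designed to guarantee.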
1. Bias in Logged User Interaction Histories
Statistical and behavioral biases are intrinsic to logs from recommender and search systems. Key forms include:
- Position bias: Higher-ranked items are more likely to be seen and clicked—even if less relevant; this skews logged feedback toward what the ranking policy favors (Oosterhuis et al., 2019).
- Selection bias: Only shown items can be acted upon, so any downstream learning or policy evaluation inherits the bias of the exposure mechanism (Hong et al., 2016).
- Exposure bias: Items with high initial estimates reinforce their exposure probability, starving others of exploration and leading to missing-not-at-random feedback (Gao et al., 2022, Chen et al., 11 Dec 2025).
- Presentation or trust bias: UI design, snippet formulation, and contextual signals further inflate or suppress interaction propensity regardless of intrinsic utility.
Collectively, these biases, if uncorrected, induce feedback loops and mode collapse in learned models, thereby necessitating both debiasing frameworks and systematic data collection interventions.
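The effect of position bias on logged feedback can be made concrete with a toy examination-model simulation (all numbers are illustrative, not drawn from the cited papers): two equally relevant items, where the item always ranked first accrues far more clicks, and dividing by the known examination propensity recovers the true relevance.

```python
import random

random.seed(0)

# Examination model: P(click) = P(examine | rank) * P(relevant | item).
# The logging policy always ranks "a" above "b", so naive CTR makes the
# equally relevant "b" look much worse; IPS correction undoes this.
exam = {1: 1.0, 2: 0.3}        # examination propensity by rank (assumed known)
rel = {"a": 0.5, "b": 0.5}     # true, equal relevance

n = 100_000
clicks = {"a": 0, "b": 0}
for _ in range(n):
    for rank, item in enumerate(["a", "b"], start=1):
        if random.random() < exam[rank] * rel[item]:
            clicks[item] += 1

naive_ctr = {i: clicks[i] / n for i in clicks}   # position-biased estimate
ips_ctr = {"a": clicks["a"] / (n * exam[1]),     # reweight by 1/propensity;
           "b": clicks["b"] / (n * exam[2])}     # both recover ~0.5
```

The naive click-through rates diverge purely because of rank, which is exactly the feedback loop described above.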
2. Counterfactual and Online Approaches to Unbiased UIH
There are two main paradigms for constructing unbiased UIHs:
A. Counterfactual (Offline) Correction
Counterfactual methods rely on logged data and reweight observed events to produce unbiased estimators using Inverse Propensity Scoring (IPS) (Oosterhuis et al., 2019, Ai et al., 2020):
$$
\hat{\Delta}_{\text{IPS}}(f) \;=\; \sum_{d:\, c(d)=1} \frac{\lambda\!\left(\operatorname{rank}(d \mid f)\right)}{\rho(d)},
$$

where $c(d) = 1$ if the document was observed and clicked, $\lambda(\cdot)$ is a rank-based loss, and $\rho(d)$ is the estimated propensity (e.g., position-dependent). Self-normalized (SNIPS) and doubly robust (DR) extensions provide improved variance control and robustness.
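A minimal numerical sketch of IPS and its self-normalized variant, written for the generic reward/propensity setting (function names and interfaces are illustrative):

```python
import numpy as np

def ips_estimate(rewards, target_probs, logging_probs):
    """Vanilla IPS: mean of w_i * r_i with importance weights
    w_i = pi(a_i|x_i) / pi_0(a_i|x_i). Unbiased when the logging
    probabilities are correct and strictly positive."""
    w = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return float(np.mean(w * np.asarray(rewards, dtype=float)))

def snips_estimate(rewards, target_probs, logging_probs):
    """Self-normalized IPS: normalize by the sum of weights rather than N.
    Slightly biased, but far less sensitive to extreme weights."""
    w = np.asarray(target_probs, dtype=float) / np.asarray(logging_probs, dtype=float)
    return float(np.sum(w * np.asarray(rewards, dtype=float)) / np.sum(w))
```

When the target and logging policies coincide, both estimators reduce to the plain empirical mean, which is a convenient sanity check.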
B. Online (Interventional) UIH Generation
Online approaches introduce randomization or explicit interventions into live ranking or recommendation policies to log interactions that are inherently unbiased by design. Techniques include:
- Thompson sampling ranked-list bandits: At each round, scores are sampled from posteriors, and items are selected proportionally, ensuring coverage and known selection probabilities (Hong et al., 2016).
- Random exposure interventions: Items are inserted randomly (uniform over the candidate pool), creating a Missing-At-Random (MAR) interaction log suitable for IPS (Gao et al., 2022).
- Bandit-based exploration in layout: Dedicated UI containers deliver randomized content at empirically low-cost locations, allowing large-scale unbiased logging without sacrificing core engagement (Chen et al., 11 Dec 2025).
- Pairwise Differentiable Gradient Descent (PDGD): Rankings are sampled from parameterized stochastic policies (Plackett–Luce), and gradients are constructed from click-inferred pairwise preferences, yielding an unbiased online learning signal (Oosterhuis et al., 2018, Ai et al., 2020).
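The Plackett–Luce sampling step underlying PDGD-style methods can be sketched as follows (a simplified illustration; the function name and interface are my own). Each position is filled by a softmax draw over the remaining items, and the accumulated log-probability is exactly the propensity needed for unbiased logging:

```python
import math, random

def sample_ranking_pl(scores, rng=random):
    """Sample a ranking from a Plackett-Luce model: repeatedly draw the next
    item with probability softmax(score) over the remaining items. Returns
    the ranking and the log-probability of having sampled it."""
    remaining = list(range(len(scores)))
    ranking, logp = [], 0.0
    while remaining:
        weights = [math.exp(scores[i]) for i in remaining]
        total = sum(weights)
        r = rng.random() * total
        acc = 0.0
        for idx, w in zip(remaining, weights):
            acc += w
            if r <= acc:
                ranking.append(idx)
                logp += math.log(w / total)
                remaining.remove(idx)
                break
    return ranking, logp
```

With equal scores every permutation is equally likely, so the returned log-probability is $\log(1/n!)$ regardless of the draw.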
3. Practical Methods for Producing Unbiased UIH
3.1. Bernoulli Rank-List Thompson Sampling
Hong & Boz describe a framework wherein each item's click probability is modeled with a Beta prior, updated incrementally, and contexts are logged with the known exposure probability (Hong et al., 2016). Inverse-propensity estimators exploit these probabilities for unbiased downstream model training:
- Each interaction log includes the context $x_t$, the shown items, the observed clicks, and the known exposure probabilities $\pi_t(i \mid x_t)$ under the sampling policy.
- Models are trained or evaluated using importance weighting, e.g., $\hat{V}(\pi) = \frac{1}{N} \sum_{t=1}^{N} \frac{\pi(i_t \mid x_t)}{\pi_t(i_t \mid x_t)}\, r_t$.
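The Beta–Bernoulli posterior-sampling loop can be sketched as follows (class name and interface are illustrative, not the authors' code): each item's click probability carries a Beta posterior, the ranking sorts by sampled values, and the shown item's posterior is updated with the observed click.

```python
import random

class BetaBernoulliTS:
    """Thompson sampling with a Beta(1, 1) prior per item: sample a click
    probability from each posterior, rank items by the sampled values, and
    update the posterior of the shown item with the observed click."""
    def __init__(self, n_items):
        self.alpha = [1.0] * n_items   # prior successes + 1
        self.beta = [1.0] * n_items    # prior failures + 1

    def rank(self, rng=random):
        samples = [rng.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return sorted(range(len(samples)), key=lambda i: -samples[i])

    def update(self, item, clicked):
        if clicked:
            self.alpha[item] += 1.0
        else:
            self.beta[item] += 1.0
```

In the full rank-list scheme, the posterior-derived selection probability of each shown item is logged alongside the interaction, which is what makes the history usable with the importance-weighted estimators above.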
3.2. Missing-At-Random (MAR) Interventions
KuaiRand implements explicit random insertions of candidate items into the recommendation list, with known, strictly positive exposure propensities $\rho(u, i) > 0$, producing a MAR dataset (Gao et al., 2022). All feedback (12 signals per exposure) can then be used in standard IPS estimators, e.g., $\hat{\mathcal{L}} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \frac{o_{u,i}\, \ell(\hat{r}_{u,i}, r_{u,i})}{\rho(u,i)}$, or in sequence modeling with debiasing.
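A toy simulation shows why a MAR intervention log is directly usable (numbers are illustrative): because exposure is uniform over the candidate pool, the propensity is a known constant and the plain per-item click mean on intervention traffic is already an unbiased appeal estimate.

```python
import random

random.seed(1)

# Uniform random insertion: every candidate has the same, known exposure
# propensity, so plain per-item click rates on this traffic are unbiased.
true_ctr = {"x": 0.6, "y": 0.2, "z": 0.1}   # illustrative ground truth
items = list(true_ctr)

shows = {i: 0 for i in items}
clicks = {i: 0 for i in items}
for _ in range(60_000):
    item = random.choice(items)              # MAR exposure, propensity 1/3
    shows[item] += 1
    clicks[item] += random.random() < true_ctr[item]

est = {i: clicks[i] / shows[i] for i in items}   # recovers true_ctr
```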
3.3. Exploration Placement for Safe, Scalable UIH
Deployment-scale systems may place exploration containers ("Something Completely Different" rows) at high-reach, low-cost UI positions, as determined by constrained optimization over reach and engagement cost, yielding interaction logs with flat popularity distributions and minimal disruption to user metrics (Chen et al., 11 Dec 2025).
3.4. Federated Unbiased UIH
In privacy-sensitive settings, each user's device logs local interaction histories and corrects for click/exposure propensity using local propensity estimates, optimizing an IPS-weighted surrogate without sharing raw logs (Li et al., 2021).
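A schematic of this federated pattern (hypothetical helper names; the cited work's actual protocol is more elaborate): each device computes an IPS-weighted gradient on its private log, and only gradients, never raw interactions, reach the server.

```python
import numpy as np

def local_ips_gradient(w, features, clicks, propensities):
    """Device-side step: IPS-weighted logistic-loss gradient computed on the
    raw local log, which never leaves the device."""
    X = np.asarray(features, dtype=float)
    c = np.asarray(clicks, dtype=float)
    p = np.asarray(propensities, dtype=float)
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ ((preds - c) / p) / len(c)   # each example weighted by 1/propensity

def federated_round(w, device_logs, lr=0.1):
    """Server-side step: average the devices' gradients and update the model."""
    grads = [local_ips_gradient(w, *log) for log in device_logs]
    return w - lr * np.mean(grads, axis=0)
```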
3.5. Synthetic Unbiased UIH
Synthetic pipelines construct UIHs by random-walk sampling over item–item graphs derived from collaborative filtering signals, completely eliminating exposure and position bias present in real data. These synthetic UIHs enable predictable LLM scaling behavior and outperform real-data–trained sequential models in downstream ranking (Zhang et al., 7 Feb 2026).
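The random-walk construction can be sketched as follows (a schematic with an assumed adjacency-dict graph format, not the cited pipeline): sequences are generated by walking an item–item graph, so no logging policy's exposure or position preferences enter the data.

```python
import random

def synthetic_histories(item_graph, n_users, walk_len, rng=random):
    """Generate synthetic interaction histories by random walks on an
    item-item graph (adjacency dict: item -> list of co-preferred items).
    Each step is sampled uniformly from the neighbors, so the sequences
    carry no position or exposure bias from any logging policy."""
    histories = []
    items = list(item_graph)
    for _ in range(n_users):
        current = rng.choice(items)
        walk = [current]
        for _ in range(walk_len - 1):
            neighbors = item_graph.get(current)
            if not neighbors:
                break                      # dead end: emit a shorter history
            current = rng.choice(neighbors)
            walk.append(current)
        histories.append(walk)
    return histories
```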
4. Applications in Recommendation, Learning-to-Rank, and Evaluation
UIHs enable a broad spectrum of unbiased learning and evaluation protocols:
- Off-policy evaluation (OPE): Policies can be compared offline via IPS/SNIPS estimators utilizing unbiased logs (Gao et al., 2022).
- Learning-to-Rank: Counterfactual LTR (IPS, DLA, Regression-EM) and OLTR (PDGD, DBGD) directly use unbiased UIHs for robust model training (Oosterhuis et al., 2019, Ai et al., 2020).
- Debiasing sequential models: Sequence models (DIN, SASRec, BERT4Rec) are trained on unbiased or denoised UIHs for more generalizable user preference estimation (Gao et al., 2022, Xin et al., 28 May 2025).
- Candidate generation: Unbiased logs power co-occurrence-based recallers that improve engagement when reintegrated into retrieval pipelines (Chen et al., 11 Dec 2025).
- Multi-task learning and user profiling: Full fidelity UIH logs (e.g., with 12 feedback signals) support nuanced multitask, long-sequence, and representation-learning scenarios (Gao et al., 2022).
5. Quantitative Evaluation and Empirical Findings
Empirical validation of UIH methodologies demonstrates:
| Intervention | Unbiasedness Mechanism | Bias Reduction / Engagement Lift |
|---|---|---|
| Thompson sampling rank-list | Posterior sampling, IPS logging | View-skewness –30%; CTR-skew –41% (Hong et al., 2016) |
| KuaiRand MAR intervention | Uniform randomization | Validates OPE; supports sequential models (Gao et al., 2022) |
| Scroll-depth exploration row | High-reach, low-cost insertion | Gini coefficient (expl.) 0.203 vs 0.494 (base); engagement +0.28%, p=0.062 (Chen et al., 11 Dec 2025) |
| Synthetic UIH | Controlled random walks, no exposure artifacts | +131% Recall@100 (SASRec, TSTR) (Zhang et al., 7 Feb 2026) |
| ConsRec denoised histories | Semantic graph filtering | Recall@10 +21%; NDCG@10 +24% vs. baseline (Xin et al., 28 May 2025) |
A plausible implication is that unbiased UIH—whether through intervention, counterfactual methods, or principled synthetic generation—not only debiases learning and evaluation, but can substantially improve recommendation coverage, cold-start performance, downstream ranking quality, and predictability of scaling laws.
6. Methodological Guidelines, Limitations, and Extensions
Robust UIH production and use require:
- Explicit tracking/logging of exposure propensities for all presented items (Hong et al., 2016).
- Periodic reevaluation of randomization parameters (row placement, intervention frequency, exploration pool) to maintain engagement and statistical coverage (Chen et al., 11 Dec 2025).
- Clipping or normalization of importance weights to control estimator variance (Gao et al., 2022, Oosterhuis et al., 2019).
- Graph-based denoising and similarity filtering to eliminate non-preferential noise from UIH (Xin et al., 28 May 2025).
- Propensity estimation from minimal randomized exposure, with reestimation in case of policy drift (Ai et al., 2020, Li et al., 2021).
- Federated methods for privacy-preserving unbiased UIH where raw logs cannot be exported (Li et al., 2021).
- Curriculum design and synthetic data construction for LLMs, ensuring elimination of position/exposure artifacts (Zhang et al., 7 Feb 2026).
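The weight-clipping guideline above can be sketched in a few lines (a generic illustration; `cap` is a tuning constant trading bias against variance):

```python
import numpy as np

def clipped_ips(rewards, weights, cap=10.0):
    """IPS with weight clipping: cap each importance weight at `cap`,
    accepting a controlled bias in exchange for bounded variance."""
    w = np.minimum(np.asarray(weights, dtype=float), cap)
    return float(np.mean(w * np.asarray(rewards, dtype=float)))
```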
Limitations include sensitivity to misspecification of propensity models (especially in counterfactual approaches), potential user-experience degradation from heavy exploration, and scalability constraints in coverage of very large item pools.
7. Future Directions and Synthesis
Research on unbiased UIH is converging toward modular pipelines that unify interventional data collection, rigorous statistical debiasing, privacy-preserving computation, and curriculum design for next-generation models:
- Systematic synthesis and curriculum-based UIH enable controlled scaling and privacy guarantees (Zhang et al., 7 Feb 2026).
- Advanced federated architectures promise on-device unbiased learning without raw log centralization (Li et al., 2021).
- Multi-modal, multi-feedback UIHs (e.g., KuaiRand's 12-signal logs) offer a testbed for complex, unbiased sequential models (Gao et al., 2022).
- Robustness to real-world nonstationarities and behavioral changes remains an open challenge, motivating dynamic adaptation and hybrid debiasing mechanisms.
The unbiased UIH is now a foundational construct across recommendation, ranking, and sequential modeling, underpinning developments in evaluation methodologies, fairness-aware personalization, and predictable scaling in LLM-based recommendation systems.