PEARL: Unbiased Percentile Estimation via Contrastive Learning for Industrial-Scale Livestream Recommendation

Published 20 May 2026 in cs.LG and cs.AI | (2605.21752v1)

Abstract: Recommender systems trained on user interaction data are susceptible to behavioral intensity imbalance--a systematic distortion arising from heterogeneous engagement patterns across users. This imbalance skews feedback signals such that observed interactions no longer faithfully reflect true preferences, causing models to disproportionately amplify signals from highly active users while underrepresenting others, which ultimately degrades recommendation quality and robustness at scale. To address this issue, we propose a nonparametric contrastive percentile approximation framework, PEARL, that models relative preference signals instead of absolute engagement magnitudes. Building upon relative advantage debiasing, PEARL leverages real contrastive interaction samples to approximate percentile relationships directly, without relying on auxiliary distribution estimation models. We provide theoretical justification demonstrating that such pairwise comparisons yield unbiased estimates of percentile-based preference signals. For broader applicability, we introduce a prediction-based bootstrapping mechanism for percentile smoothing to handle sparse and discrete feedback, alongside a generalized value-weighted formulation and a co-training strategy to enhance both modeling flexibility and representation learning. Extensive offline experiments demonstrate that PEARL effectively mitigates behavioral bias and consistently improves recommendation performance across multiple ranking targets. Deployed in a production livestream platform with a combined user base of billions, online A/B testing confirms substantial real-world gains: +2.10% Watch Duration, +0.80% Consumption Amount, +1.49% Interaction Rate, and -6.91% Report Rate.

Abstract PDF Upgrade to Chat

Authors (9)

Summary

The paper introduces a contrastive percentile estimation method, using user-specific reservoir sampling to counteract skewed engagement data.
It integrates value-weighted and bootstrapped loss extensions to co-optimize ranking fidelity and reward calibration in real-world settings.
Empirical evaluations in offline and online tests reveal improved UAUC, balanced low-activity user performance, and scalable system integration.

PEARL: Unbiased Percentile Estimation via Contrastive Learning for Industrial-Scale Livestream Recommendation

Introduction and Problem Context

In large-scale industrial recommender systems, user engagement data is structurally skewed: a minority of highly active users generate a disproportionate volume of feedback, causing overrepresentation of their preferences and degrading overall model robustness. This "behavioral intensity imbalance" propagates user-side bias, resulting in models that amplify high-activity users’ signals while underrepresenting less active populations. This is particularly problematic in livestreaming platforms, where engagement dynamics are distinct from other modalities such as short-form videos.

Traditional debiasing strategies—including counterfactual learning, propensity reweighting, and representation-level disentanglement—treat bias as a global property and largely ignore sharply heterogeneous user activity. Recent relative preference modeling approaches like RAD convert absolute engagement into conditional percentiles but depend on auxiliary CDF models, introducing additional estimation complexity.

PEARL directly confronts these limitations by proposing a contrastive, user-centric, nonparametric percentile estimation framework. The method leverages real pairwise historical samples and a multi-sample contrastive loss to provide unbiased, scalable, and robust per-user preference signals, fundamentally improving the fidelity of livestream recommendation outcomes.

Figure 1: The PEARL architecture, with a user-keyed reservoir pool for unbiased sampling, and multi-sample contrastive learning for percentile loss computation.

Methodological Innovations

Contrastive Percentile Estimation

PEARL’s core contribution is casting percentile estimation as a series of stochastic contrastive decisions. For each user-item engagement $y$ , the framework samples multiple historical engagement values $\{Y'_i\}$ from the same user’s reservoir. The model then learns to predict the probability that $y > Y'_i$ for each reference, optimizing a multi-sample binary cross-entropy loss. This yields an unbiased estimate of the empirical percentile of $y$ under the user’s behavioral distribution while circumventing expensive explicit distribution estimation.

This design is grounded in a proof that the expectation of this binary indicator aligns with the exact CDF percentile, removing the requirement for parametric modeling of user-wise engagement distributions. A multi-sample extension further reduces estimation variance, stabilizing training even in billion-scale streaming contexts via significant noise suppression ( $N$ -sample averaging).

Value-Weighted and Bootstrapped Extensions

To handle cases where target value magnitude carries economic or engagement significance, PEARL generalizes its loss to incorporate value weights, aligning the learned signal with cumulative reward. For discrete or highly sparse targets (e.g., rare gifts, reports), prediction-based bootstrapping smooths the percentile estimation by using prior model predictions rather than degenerate binary labels, yielding differentiable, informative training signals.

Co-Training with Regression Heads

In production, both ranking and value calibration are often required (e.g., auction scoring, explicit revenue optimization). PEARL integrates a dual-head co-training objective, jointly optimizing for debiased percentile ranking and absolute magnitude. This anchors the learned representations, preserving both personalized ranking fidelity and the scale calibration indispensable for real-world operations.

Efficient System Integration

PEARL implements a user-keyed reservoir sampling pool in-graph for real-time unbiased sampling, ensuring user-wide historical coverage with minimal footprint and latency. A dynamic gradient gating mechanism further suppresses high-variance updates from insufficiently populated reservoirs, stabilizing convergence.

Experimental Results

Offline Evaluation

Offline experiments on industrial-scale datasets reveal robust improvements over established baselines (regression, CQE, RAD) across targets:

Watch Time UAUC: PEARL achieves 0.648, surpassing original regression (0.641), CQE (0.615), and RAD (0.625).
Binary targets (e.g., report): PEARL with prediction-based bootstrapping provides significant UAUC gains (+4.96%) compared to classification baselines.
Long-term engagement targets: Co-training with percentile modeling yields the most pronounced improvement (+16.1% in UAUC).
Figure 2: User-Averaged AUC comparison by user activity level shows PEARL’s marked advantage among low- and non-active cohorts.

Cohort analysis by user activity demonstrates that baseline regression exhibits strong performance skewed toward high-activity users; performance on the non-live cohort is near random. PEARL, by contrast, delivers balanced improvements, particularly excelling among low-activity users who constitute the bulk of the platform’s user base.

Online A/B Testing

Deployment in a live production environment (billions of users, week-long experiments) confirms PEARL’s real-world impact:

Watch Duration: +2.1%
Consumption Amount: +0.80%
Interaction Rate: +1.49%
Report Rate: -6.91%

These are highly material improvements in industrial recommender KPIs, attributed to more accurate modeling of each user’s individual behavioral context and attenuation of intensity-driven bias. Notably, the reduction in report rate indicates improved user satisfaction and lower incidence of negative feedback.

Implications and Future Prospects

PEARL demonstrates that implicit behavioral debiasing via contrastive percentile learning is both theoretically principled and practically scalable. The approach obviates the need for auxiliary distributional models while capturing per-user preference nuance, making it broadly relevant for platforms with highly heterogeneous activity profiles. Direct modeling of relative behavior appears critical for robust personalization, especially as platforms diversify modalities and user bases.

Theory suggests broad applicability of this approach beyond livestreaming to any context where engagement bias distorts sample distributions: e-commerce, social feeds, and personalized advertising. Future extensions could include dynamic adaptation of reservoir pool sizes, more granular context-dependent contrastive sampling, or integrating contrastive percentile estimation with large-scale generative recommenders and sequential transducers.

Conclusion

PEARL introduces a robust, theoretically justified, and computationally efficient framework for debiasing behavioral intensity in industrial recommender systems using nonparametric contrastive percentile estimation. Its combination of practical architecture and empirical performance establishes a new paradigm for bias-corrected user modeling, with substantial implications for personalized, fair, and scalable recommendation at production scale (2605.21752).

Markdown Report Issue