READER: Dual-Branch Repurchase Predictor
- The paper introduces a dual-branch architecture with repurchase-aware routing and dynamic calibration to accurately predict GMV in online ads.
- It leverages online streaming training with pseudo-final labels to mitigate delayed feedback and address challenges in label drift.
- Empirical results on the TRACE benchmark show improved AUC and accuracy, highlighting effective separation of single-purchase and repurchase cases.
Repurchase-Aware Dual-branch Predictor (READER) is a modeling paradigm designed to address the delayed feedback challenge in post-click gross merchandise volume (GMV) prediction, particularly in online advertisement ranking scenarios where a single click may result in multiple, temporally dispersed purchases. Unlike conventional conversion rate (CVR) estimation, which considers binary conversion events, GMV prediction requires capturing a continuous target (the sum of transaction values), which complicates model training and label acquisition under delayed and multi-event feedback. READER combines a dual-branch architecture, a repurchase-aware router, and dynamic label calibration to optimize online, streaming prediction of GMV within realistic attribution windows.
1. Problem Formulation and Context
The GMV prediction problem is formulated as follows: given an ad click at time $t_0$ with embedding $x$, the system observes a sequence of purchases
$$\{(t_i, p_i)\}_{i=1}^{N},$$
where $p_i$ denotes the $i$-th transaction price, $t_i$ its timestamp, and $N$ is the count of purchases in the attribution window $W$. The ultimate target label is the total GMV per click,
$$y = \sum_{i=1}^{N} p_i,$$
whereas at any intermediate time $t < t_0 + W$, only the partial sum
$$y_t = \sum_{i:\, t_i \le t} p_i$$
is available. The objective is to train a regressor $F$ in a streaming, online manner to predict $y$ as accurately as possible, updating $F$ on every new purchase event. This setup is complicated by label drift, incomplete feedback, and the stark distributional differences between single-purchase and repurchase scenarios, as documented in the TRACE benchmark (Li et al., 28 Jan 2026).
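To make the distinction between the partial and the final label concrete, the following minimal sketch computes both for a single click. The function name `partial_gmv`, the `(time, price)` tuple layout, and the use of hours as the time unit are illustrative assumptions, not part of READER or TRACE; the toy purchases are invented.

```python
from typing import List, Tuple

def partial_gmv(purchases: List[Tuple[float, float]],
                t_click: float, t_now: float, window: float) -> float:
    """Sum the prices of purchases observed by t_now inside the attribution window.

    purchases: (timestamp, price) pairs attributed to one click (assumed layout).
    """
    cutoff = min(t_now, t_click + window)  # nothing after the window counts
    return sum(price for t_i, price in purchases if t_click <= t_i <= cutoff)

# Toy click at t = 0 with a 7-day window, times in hours (hypothetical values).
window_hours = 7 * 24.0
purchases = [(2.0, 30.0), (50.0, 12.5), (160.0, 40.0), (200.0, 99.0)]

y_partial = partial_gmv(purchases, 0.0, 24.0, window_hours)        # label seen after one day
y_final = partial_gmv(purchases, 0.0, window_hours, window_hours)  # label at window close
print(y_partial, y_final)
```

After one day only the first purchase is visible, so the observed label understates the final one; the purchase at hour 200 falls outside the window and never contributes.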
2. Model Architecture
READER employs a shared-bottom dual-expert structure with a lightweight router for purchase type discrimination:
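- The input embedding $x$ is processed through a shared network:
$$h = f_{\text{shared}}(x),$$
yielding feature $h$.
- Two specialist MLPs, $E_s$ and $E_r$, generate expert predictions $\hat{y}_s = E_s(h)$ and $\hat{y}_r = E_r(h)$ for single-purchase ($N = 1$) and repurchase ($N > 1$) cases.
- The router $R$ predicts the probability $q$ that a click will yield multiple purchases:
$$q = \sigma\!\left(R(h) / \tau\right),$$
where $\tau$ is a temperature and $\sigma$ is the sigmoid.
- The final predicted GMV $\hat{y}$ is routed based on $q$ using a three-zone gating mechanism with thresholds $\theta_{\text{low}}$, $\theta_{\text{high}}$:
$$\hat{y} = \begin{cases} \hat{y}_s, & q < \theta_{\text{low}} \\ \hat{y}_r, & q > \theta_{\text{high}} \\ (1 - q)\,\hat{y}_s + q\,\hat{y}_r, & \text{otherwise} \end{cases}$$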
This configuration allows confident assignments to experts and a soft mixture in uncertain cases.
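The gating logic can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold values `0.3` and `0.7`, the default temperature, and the convex combination used in the uncertain zone are placeholders chosen for the example, and the function names are hypothetical rather than taken from the READER codebase.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def three_zone_gate(y_single: float, y_repurchase: float, router_logit: float,
                    tau: float = 1.0,
                    theta_low: float = 0.3, theta_high: float = 0.7) -> float:
    """Route between two expert predictions using the router probability.

    theta_low / theta_high are assumed example thresholds, not published values.
    """
    q = sigmoid(router_logit / tau)  # probability of a repurchase click
    if q < theta_low:                # confident single-purchase: hard assignment
        return y_single
    if q > theta_high:               # confident repurchase: hard assignment
        return y_repurchase
    # Uncertain zone: soft mixture (assumed here to be a convex combination)
    return (1.0 - q) * y_single + q * y_repurchase

print(three_zone_gate(10.0, 50.0, router_logit=-3.0))  # single-purchase expert
print(three_zone_gate(10.0, 50.0, router_logit=0.0))   # mixture of both experts
print(three_zone_gate(10.0, 50.0, router_logit=3.0))   # repurchase expert
```

A larger temperature flattens the router probability toward 0.5 and so widens the set of clicks that fall into the mixture zone.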
3. Dynamic Calibration and Online Label Correction
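The calibration module addresses the underestimation bias in $y_t$ versus $y$ due to incomplete information during attribution. Specifically:
- On the repurchase branch, a calibrator MLP $C$ infers the log-gap $\hat{\delta}_t$:
$$\hat{\delta}_t = C(h, s_t),$$
where $s_t$ summarizes the feedback observed up to time $t$ and includes $n_t$, which tracks purchases so far.
- The calibration target is the true log-gap,
$$\delta_t = \log y - \log y_t,$$
and the network minimizes a regression loss between $\hat{\delta}_t$ and $\delta_t$.
- A “pseudo-final” label is then constructed,
$$\tilde{y}_t = y_t \exp(\hat{\delta}_t),$$
which approximates $y$.
- Before routing, the composite, dynamically calibrated label $y^{\text{cal}}_t$ is formed, integrating partial observations $y_t$ with calibrated corrections $\tilde{y}_t$.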
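The pseudo-final label construction can be illustrated with a short sketch. It assumes the log-gap is the difference between the log of the final GMV and the log of the partial GMV, and it clamps the predicted gap at zero because the final GMV cannot be smaller than the partial one; both choices, along with the function names, are assumptions made for this example.

```python
import math

def true_log_gap(y_final: float, y_partial: float) -> float:
    """Regression target for the calibrator (assumes both labels are positive)."""
    return math.log(y_final) - math.log(y_partial)

def pseudo_final_label(y_partial: float, predicted_log_gap: float) -> float:
    """Scale the partial GMV up by the predicted gap.

    The clamp at zero is an assumption: it prevents the pseudo-final label
    from dropping below what has already been observed.
    """
    return y_partial * math.exp(max(predicted_log_gap, 0.0))

gap = true_log_gap(82.5, 30.0)               # what a perfect calibrator would output
print(pseudo_final_label(30.0, gap))         # recovers the final GMV up to rounding
print(pseudo_final_label(30.0, 0.5 * gap))   # under-predicted gap: between 30.0 and 82.5
```

Working in log space keeps the correction multiplicative, so the same predicted gap scales small and large baskets proportionally.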
4. Training Protocol and Loss Functions
READER is optimized in two stages:
1. Offline Pretraining: On historical data $\mathcal{D}_{\text{hist}}$, the full $y$ and $N$ are available.
- Routing is done strictly using $N$ (ground-truth purchase count).
- Prediction loss minimizes log-MAE:
$$\mathcal{L}_{\text{pre}} = \left| \log \hat{y} - \log y \right|$$
- The router is trained via cross-entropy; the calibrator is trained with log-gap regression.
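2. Online Streaming Training: On fresh data (e.g., days 57–82).
- Streaming updates use observed $y_t$.
- Prediction loss is log-MAE between $\hat{y}$ and the dynamically calibrated label:
$$\mathcal{L}_{\text{stream}} = \left| \log \hat{y} - \log y^{\text{cal}}_t \right|$$
- Upon window completion (full ground-truth available), two alignment losses are enforced:
  - Ground-Truth Alignment (GRA), with loss $\mathcal{L}_{\text{GRA}}$.
  - Partial Label Unlearning (PLU), with loss $\mathcal{L}_{\text{PLU}}$.
- Full objective at each cycle:
$$\mathcal{L} = \mathcal{L}_{\text{stream}} + \lambda_{\text{GRA}}\,\mathcal{L}_{\text{GRA}} + \lambda_{\text{PLU}}\,\mathcal{L}_{\text{PLU}},$$
with weights $\lambda_{\text{GRA}}$, $\lambda_{\text{PLU}}$.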
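As a rough illustration of the log-MAE prediction loss used in both stages, the sketch below averages the absolute difference of logs over a batch. It assumes strictly positive predictions and labels and uses the plain logarithm; whether the original uses an offset such as `log(1 + y)` is not stated here, so treat that as an open assumption. The toy values are invented.

```python
import math
from typing import Sequence

def log_mae(predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean absolute error in log space (inputs assumed strictly positive)."""
    if len(predictions) != len(targets) or len(predictions) == 0:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    total = sum(abs(math.log(p) - math.log(t)) for p, t in zip(predictions, targets))
    return total / len(predictions)

preds = [40.0, 90.0]
partial_labels = [30.0, 60.0]   # labels visible mid-window (hypothetical)
final_labels = [82.5, 90.0]     # labels once the window has closed (hypothetical)

loss_vs_partial = log_mae(preds, partial_labels)
loss_vs_final = log_mae(preds, final_labels)
print(loss_vs_partial, loss_vs_final)
```

Because the error is measured on ratios rather than differences, a prediction that is off by a factor of two costs the same for a small basket as for a large one, which suits heavy-tailed GMV labels.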
5. TRACE Benchmark and Empirical Analysis
TRACE, the GMV prediction benchmark, comprises 82 days of Taobao display logs, with a 7-day attribution window, yielding 7.16 million clicks and 3.84 million repurchase-labeled samples (53.6% of positive clicks). Key empirical findings include:
- GMV label drift is rapid and nonstationary: hourly GMV means fluctuate markedly, demanding online streaming training rather than offline batching. Streaming training achieves an AUC of 0.8165, compared with 0.8055 for offline training.
- Single-purchase and repurchase samples present highly divergent label distributions in both mean and tail properties. A Kolmogorov–Smirnov test confirms that the distributional shift is statistically significant.
- The router MLP discriminates repurchase cases with AUC above 0.80, indicating reliable separation potential.
- READER’s main streaming performance (“repurchase routing + dual experts + LC + GRA + PLU”) is AUC 0.8235 (+0.86%), accuracy 0.2612 (+2.19%), and ALPR 0.7523 (–6.88%) over the best single-tower baseline.
- With perfect routing and instantaneous access to the final label $y$ (the “oracle dual-tower”), AUC would improve further to 0.8486 and accuracy to 0.4134, suggesting nontrivial headroom for methodological advances.
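The kind of distribution comparison reported for single-purchase and repurchase labels can be illustrated with a two-sample Kolmogorov–Smirnov statistic. The sketch below is a plain-Python implementation of the statistic only (no p-value) run on invented toy values, not TRACE data; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used.

```python
from typing import Sequence

def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two samples."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    xs, ys = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(xs) and j < len(ys):
        v = min(xs[i], ys[j])
        # Advance past every copy of v in both samples so ties are handled together
        while i < len(xs) and xs[i] == v:
            i += 1
        while j < len(ys) and ys[j] == v:
            j += 1
        d = max(d, abs(i / len(xs) - j / len(ys)))
    return d

single_gmv = [20.0, 35.0, 18.0, 50.0]        # hypothetical single-purchase labels
repurchase_gmv = [80.0, 120.0, 45.0, 300.0]  # hypothetical repurchase labels
d_toy = ks_statistic(single_gmv, repurchase_gmv)
print(d_toy)
```

A statistic near 0 means the two empirical distributions nearly coincide, while a value near 1 means they barely overlap, which is the situation that motivates separate experts.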
6. Theoretical and Practical Implications
The combination of a lightweight repurchase router, dual-expert network, dynamic label calibrator, and streaming loss corrections enables effective modeling of continuous, multi-event delayed-feedback targets. READER’s approach directly confronts the limitations imposed by partial and evolving labels, as well as heterogeneous sequence patterns inherent to repurchase-rich commerce platforms. The empirical evidence supports the use of separate modeling pathways for single-purchase and repurchase samples, and the utility of online correction mechanisms. A plausible implication is that such architectures could generalize to other streaming, multi-event feedback tasks beyond GMV.
7. Limitations and Future Directions
Despite substantial improvements, READER’s performance indicates remaining gaps compared to the oracle scenario. Model calibration in the presence of extremely high label drift still poses challenges, and the efficacy of router predictions may be affected by nonstationarity. Extensions could include higher-order temporal modeling, adaptive routing, and hybrid loss architectures. The availability of the TRACE dataset and accompanying codebase supports reproducibility and further research in the field of delayed-feedback modeling for business metrics (Li et al., 28 Jan 2026).