READER: Dual-Branch Repurchase Predictor

Updated 4 February 2026
  • The paper introduces a dual-branch architecture with repurchase-aware routing and dynamic calibration to accurately predict GMV in online ads.
  • It leverages online streaming training with pseudo-final labels to mitigate delayed feedback and label drift.
  • Empirical results on the TRACE benchmark show improved AUC and accuracy, highlighting effective separation of single-purchase and repurchase cases.

Repurchase-Aware Dual-branch Predictor (READER) is a modeling paradigm designed to address the delayed feedback challenge in post-click gross merchandise volume (GMV) prediction, particularly in online advertisement ranking scenarios where a single click may result in multiple, temporally dispersed purchases. Unlike conventional conversion rate (CVR) estimation which considers binary conversion events, GMV prediction requires capturing a continuous target—summed transaction values—which complicates model training and label acquisition under delayed and multi-event feedback. READER introduces a dual-branch architecture, a repurchase-aware router, and dynamic label calibration mechanisms; these are integrated to optimize online, streaming prediction of GMV within realistic attribution windows.

1. Problem Formulation and Context

The GMV prediction problem is formulated as follows: given an ad click $c$ at time $t^c$ with embedding $x \in \mathbb{R}^d$, the system observes a sequence of purchases

$$\mathcal{P} = \{(t_i^p,\,p_i)\}_{i=1}^N, \qquad t^c \leq t_1^p \leq \cdots \leq t_N^p \leq t^c + w_a,$$

where $p_i$ denotes the $i$-th transaction price and $N$ is the number of purchases within the attribution window $w_a$. The ultimate target label is the total GMV per click,

$$y^* = \sum_{i=1}^{N} p_i,$$

whereas at any intermediate time $t$, only the partial sum

$$y_t = \sum_{i:\, t_i^p \leq t} p_i$$

is available. The objective is to train a regressor $F: \mathbb{R}^d \to \mathbb{R}$ in a streaming, online manner to predict $y^*$ as accurately as possible, updating $F$ on every new purchase event. This setup is complicated by label drift, incomplete feedback, and the stark distributional differences between single-purchase and repurchase scenarios, as documented in the TRACE benchmark (Li et al., 28 Jan 2026).
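As a concrete illustration of the label definitions above, the partial label $y_t$ and final label $y^*$ for a single click can be computed as follows (the function name and data layout are ours, for illustration only):

```python
# Sketch of the partial and final GMV labels for one click.

def gmv_labels(click_time, purchases, window, t_now):
    """purchases: list of (purchase_time, price) pairs for one click."""
    # Keep only purchases inside the attribution window [t^c, t^c + w_a].
    attributed = [(tp, p) for tp, p in purchases
                  if click_time <= tp <= click_time + window]
    y_t = sum(p for tp, p in attributed if tp <= t_now)  # partial label y_t
    y_star = sum(p for _, p in attributed)               # final label y*
    return y_t, y_star

# A click at t=0 with a 7-day window and two purchases (days 1 and 5),
# observed at day 2: only the first purchase has arrived so far.
y_t, y_star = gmv_labels(0.0, [(1.0, 30.0), (5.0, 20.0)], 7.0, t_now=2.0)
# y_t = 30.0, y_star = 50.0
```

At any time before the window closes, $y_t \leq y^*$, which is the source of the underestimation bias addressed by the calibration module below.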

2. Model Architecture

READER employs a shared-bottom dual-expert structure with a lightweight router for purchase type discrimination:

  • The input embedding $x$ is processed through a shared network,

$$h = f_{\text{shared}}(x),$$

yielding the shared feature $h$.

  • Two specialist MLPs, $g_s$ and $g_r$, generate expert predictions $\hat{y}_s$ and $\hat{y}_r$ for the single-purchase and repurchase cases, respectively.
  • The router $R$ predicts the probability $\hat{\rho}$ that a click will yield multiple purchases:

$$\hat{\rho} = \sigma\!\left(R(h)/\tau\right),$$

where $\tau$ is a temperature and $\sigma(\cdot)$ is the sigmoid.

  • The final predicted GMV $\hat{y}$ is routed based on $\hat{\rho}$ using a three-zone gating mechanism with thresholds $\tau_{\text{lo}} < \tau_{\text{hi}}$:

$$\hat{y} = \begin{cases} \hat{y}_s, & \hat{\rho} < \tau_{\text{lo}}, \\ \hat{\rho}\,\hat{y}_r + (1-\hat{\rho})\,\hat{y}_s, & \tau_{\text{lo}} \leq \hat{\rho} \leq \tau_{\text{hi}}, \\ \hat{y}_r, & \hat{\rho} > \tau_{\text{hi}}. \end{cases}$$

This configuration allows confident assignments to experts and a soft mixture in uncertain cases.
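A minimal sketch of the three-zone gate, assuming the two expert outputs and the router logit have already been computed upstream (the threshold and temperature values here are placeholders, not the paper's settings):

```python
import math

def three_zone_gate(y_s, y_r, router_logit, tau=1.0, lo=0.3, hi=0.7):
    """Route the final GMV prediction between the two experts."""
    # Temperature-scaled sigmoid gives the repurchase probability rho.
    rho = 1.0 / (1.0 + math.exp(-router_logit / tau))
    if rho < lo:     # confident single-purchase: use the single-purchase expert
        return y_s
    if rho > hi:     # confident repurchase: use the repurchase expert
        return y_r
    # Uncertain zone: soft mixture of the two experts.
    return rho * y_r + (1.0 - rho) * y_s
```

A strongly negative logit routes to the single-purchase expert, a strongly positive one to the repurchase expert, and a logit near zero ($\hat{\rho} \approx 0.5$) yields an even blend.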

3. Dynamic Calibration and Online Label Correction

The calibration module addresses the underestimation bias of the partial label $y_t$ relative to the final GMV $y^*$, which arises from incomplete information during the attribution window. Specifically:

  • On the repurchase branch, a calibrator MLP $C$ infers the log-gap $\hat{\delta}$ between the final and currently observed GMV:

$$\hat{\delta} = C(h,\, e_t,\, y_t),$$

where $e_t$ encodes the time elapsed since the click and $y_t$ tracks the purchases observed so far.

  • The calibration target is the true log-gap,

$$\delta^* = \log(1 + y^*) - \log(1 + y_t),$$

and the calibrator minimizes $\lvert \hat{\delta} - \delta^* \rvert$.

  • A “pseudo-final” label is then constructed,

$$\tilde{y} = \exp\!\big(\log(1 + y_t) + \hat{\delta}\big) - 1,$$

which approximates $y^*$.

  • Before routing, the composite, dynamically calibrated label is

$$y^{\text{cal}} = \hat{\rho}\,\tilde{y} + (1 - \hat{\rho})\,y_t,$$

integrating partial observations with calibrated corrections.
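The pseudo-final and calibrated labels above can be sketched as follows; `predicted_log_gap` stands in for the calibrator MLP's output $\hat{\delta}$, and the numeric example assumes the calibrator is exact:

```python
import math

def pseudo_final_label(y_t, predicted_log_gap):
    """Invert the predicted log-gap to approximate the final GMV y*."""
    return math.exp(math.log(1.0 + y_t) + predicted_log_gap) - 1.0

def calibrated_label(y_t, predicted_log_gap, rho):
    """Blend observed partial GMV with the calibrated pseudo-final label."""
    y_tilde = pseudo_final_label(y_t, predicted_log_gap)
    return rho * y_tilde + (1.0 - rho) * y_t

# With y_t = 30 and true final GMV y* = 50, the exact log-gap is
# log(51) - log(31); plugging it in recovers y* up to float error.
gap = math.log(51.0) - math.log(31.0)
# pseudo_final_label(30.0, gap) ≈ 50.0
```

With $\hat{\rho} = 0$ the calibrated label reduces to the raw partial sum $y_t$; with $\hat{\rho} = 1$ it is fully the pseudo-final estimate $\tilde{y}$.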

4. Training Protocol and Loss Functions

READER is optimized in two stages:

1. Offline Pretraining: On historical log data, both the final label $y^*$ and the purchase count $N$ are available.

  • Routing is done strictly using the ground-truth repurchase indicator $\mathbb{1}[N > 1]$.
  • Prediction loss minimizes the log-MAE:

$$\mathcal{L}_{\text{pred}} = \big|\log(1 + \hat{y}) - \log(1 + y^*)\big|.$$

  • The router is trained via cross-entropy; the calibrator is trained with log-gap regression.

2. Online Streaming Training: On fresh data (e.g., days 57–82).

  • Streaming updates use the observed partial label $y_t$.
  • Prediction loss is the log-MAE between $\hat{y}$ and the dynamically calibrated label:

$$\mathcal{L}_{\text{stream}} = \big|\log(1 + \hat{y}) - \log(1 + y^{\text{cal}})\big|.$$

  • Upon window completion (full ground-truth available), two alignment losses are enforced:

    • Ground-Truth Alignment (GRA): the prediction is re-aligned with the now-complete label,

$$\mathcal{L}_{\text{GRA}} = \big|\log(1 + \hat{y}) - \log(1 + y^*)\big|.$$

    • Partial Label Unlearning (PLU): the influence of the earlier, incomplete partial label is removed by applying its loss with a negative weight,

$$\mathcal{L}_{\text{PLU}} = -\big|\log(1 + \hat{y}) - \log(1 + y_t)\big|.$$

  • Full objective at each cycle:

$$\mathcal{L} = \mathcal{L}_{\text{stream}} + \lambda_{\text{GRA}}\,\mathcal{L}_{\text{GRA}} + \lambda_{\text{PLU}}\,\mathcal{L}_{\text{PLU}},$$

with $\lambda_{\text{GRA}}$ and $\lambda_{\text{PLU}}$ as loss-balancing hyperparameters.
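The per-cycle objective can be sketched in plain Python as below; the function names and default weights are illustrative, not the paper's values:

```python
import math

def log_mae(pred, label):
    """log-MAE between a predicted and a target GMV value."""
    return abs(math.log(1.0 + pred) - math.log(1.0 + label))

def cycle_loss(y_hat, y_cal, y_star=None, y_t=None,
               lam_gra=1.0, lam_plu=1.0):
    """Streaming loss, plus GRA/PLU alignment once the window has closed."""
    loss = log_mae(y_hat, y_cal)                   # streaming prediction term
    if y_star is not None:                         # attribution window complete
        loss += lam_gra * log_mae(y_hat, y_star)   # ground-truth alignment
        loss -= lam_plu * log_mae(y_hat, y_t)      # partial-label unlearning
    return loss
```

Before the window closes, only the calibrated-label term is active; once $y^*$ arrives, the GRA term pulls the model toward the true label while the negative PLU term counteracts the gradient contributed earlier by the stale partial label.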

5. TRACE Benchmark and Empirical Analysis

TRACE, the GMV prediction benchmark, comprises 82 days of Taobao display logs, with a 7-day attribution window, yielding 7.16 million clicks and 3.84 million repurchase-labeled samples (53.6% of positive clicks). Key empirical findings include:

  • GMV label drift is rapid and nonstationary: hourly GMV means fluctuate markedly, demanding online streaming training rather than offline batching. Streaming achieves AUC 0.8165 over 0.8055 offline.
  • Single-purchase and repurchase samples present highly divergent label distributions in both mean and tail properties; a Kolmogorov–Smirnov test confirms a statistically significant distributional shift between the two groups.
  • The router MLP discriminates repurchase cases with AUC > 80%, indicating reliable separation potential.
  • READER’s main streaming performance (“repurchase routing + dual experts + LC + GRA + PLU”) is AUC 0.8235 (+0.86%), accuracy 0.2612 (+2.19%), and ALPR 0.7523 (–6.88%) over the best single-tower baseline.
  • A plausible implication is that perfect routing and instantaneous access to $y^*$ (“oracle dual-tower”) would further improve AUC to 0.8486 and accuracy to 0.4134, suggesting nontrivial headroom for methodological advances.

6. Theoretical and Practical Implications

The combination of a lightweight repurchase router, dual-expert network, dynamic label calibrator, and streaming loss corrections enables effective modeling of continuous, multi-event delayed-feedback targets. READER’s approach directly confronts the limitations imposed by partial and evolving labels, as well as heterogeneous sequence patterns inherent to repurchase-rich commerce platforms. The empirical evidence supports the use of separate modeling pathways for single-purchase and repurchase samples, and the utility of online correction mechanisms. A plausible implication is that such architectures could generalize to other streaming, multi-event feedback tasks beyond GMV.

7. Limitations and Future Directions

Despite substantial improvements, READER’s performance indicates remaining gaps compared to the oracle scenario. Model calibration in the presence of extremely high label drift still poses challenges, and the efficacy of router predictions may be affected by nonstationarity. Extensions could include higher-order temporal modeling, adaptive routing, and hybrid loss architectures. The availability of the TRACE dataset and accompanying codebase supports reproducibility and further research in the field of delayed-feedback modeling for business metrics (Li et al., 28 Jan 2026).
