
STCRank: Spatio-Temporal Collaborative Ranking

Updated 22 January 2026
  • STCRank is a framework that models interrelated spatial (within-slot) and temporal (across-slot) objectives for immersive e-commerce recommendations.
  • Its Multi-Objective Collaboration module tailors labeling and weighting strategies to balance view-through, swipe-down, and conversion signals for optimal performance.
  • The Multi-Slot Collaboration module uses look-ahead scoring and beam search to optimize sequential item ordering, enhancing session depth and purchase rates.

Spatio-temporal Collaborative Ranking (STCRank) is an advanced interactive recommender system framework designed to address the unique challenges of immersive, full-screen, swipe-based e-commerce platforms, specifically within the Kuaishou E-shop deployment context. Unlike conventional ranking systems, STCRank explicitly models and collaborates across interrelated ranking objectives both within each recommendation slot (“spatial”) and across sequential slots in a session (“temporal”). Its architecture and algorithmic innovations comprise a dual-module system: Multi-Objective Collaboration (MOC) for objective-level spatial optimization and Multi-Slot Collaboration (MSC) for temporal, sequence-level optimization (Xia et al., 15 Jan 2026).

1. Formal Problem Definition

STCRank operates in sessions split into two stages: an Explore stage (E-stage), where users interact with a homepage feed, and a Focus stage (F-stage) triggered by a user click, presenting sequential full-screen slots. In each slot $i$, a candidate set $\mathcal{C}$ of $n$ items is retrieved. The system must select and order a much smaller subset of $m \ll n$ items for display.

Three binary feedback objectives are jointly optimized per item exposure:

  • Conversion (cvr): user ultimately taps “buy.”
  • View-through (vtr): user dwells on the card for at least $T_0$ seconds.
  • Swipe-down (sdr): user swipes down to reveal the next slot.

Notational conventions:

  • $y^j(c)\in\{0,1\}$: ground-truth label for candidate $c$ and objective $j\in\{\mathrm{vtr},\mathrm{sdr},\mathrm{cvr}\}$
  • $\hat y^j(c)\in[0,1]$: predicted probability
  • $w_j$: ranking weight

A Multi-gate Mixture-of-Experts (MMOE) model is trained by summing binary cross-entropy losses across objectives:

$$\mathcal{L}(\Theta) = -\sum_{c\in \mathcal{C}}\;\sum_{j\in\{\mathrm{vtr},\mathrm{sdr},\mathrm{cvr}\}} \left[y^{j}(c)\log\hat y^{j}(c)+\bigl(1-y^{j}(c)\bigr)\log\bigl(1-\hat y^{j}(c)\bigr)\right]$$

At inference, the objective predictions are collapsed into a single per-item ranking score:

$$v(c) = w_{\mathrm{vtr}}\,\hat y^{\mathrm{vtr}}(c) + w_{\mathrm{sdr}}\,\hat y^{\mathrm{sdr}}(c) + w_{\mathrm{cvr}}\,\hat y^{\mathrm{cvr}}(c)$$

The key insight is that these objectives interact not only within but also across slots, requiring both Pareto-optimal tradeoff handling (spatial) and sequence-aware ranking (temporal).
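The linear score collapse can be sketched in a few lines; the weights and predictions below are purely illustrative, not values from the paper:

```python
def rank_score(preds: dict, weights: dict) -> float:
    """Collapse multi-objective predictions into one ranking score v(c)."""
    return sum(weights[j] * preds[j] for j in ("vtr", "sdr", "cvr"))

# Illustrative values only.
weights = {"vtr": 0.3, "sdr": 0.3, "cvr": 0.4}
preds = {"vtr": 0.80, "sdr": 0.60, "cvr": 0.05}
print(round(rank_score(preds, weights), 4))  # 0.44
```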

2. Multi-Objective Collaboration (MOC) Module

MOC addresses the intrinsic overlap and conflict among vtr, sdr, and cvr observed in immersive e-commerce browsing:

  • Strong overlap between vtr and cvr, since long views often precede conversion.
  • Strong conflict between sdr and cvr since swipe-down behavior may be negatively correlated with conversion.

To approach the Pareto frontier, MOC employs tailored label-design and sample-weighting strategies:

  • vtr labels: Dwell time $T(c)$ is binarized as $y^{\mathrm{vtr}}(c)=\mathbf{1}\{T(c)>T_0\}$. $T_0=5$ s is empirically selected to balance overlap-breaking and meaningful interest capture: $T_0=2$ s causes too much redundancy (IPV +8.99%, Purch –0.49%), while $T_0=25$ s is too strict (no purchase gain).
  • sdr labels: Only the first swipe per session is counted positive; exits with purchase are excluded from negatives, reducing ambiguity and mitigating negative impact on cvr.
  • cvr labels: Standard post-exposure “buy” indicator.
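The three label rules can be sketched for a single exposure as follows; the field names are assumptions, and `None` marks samples excluded from the sdr loss:

```python
def moc_labels(dwell_s, first_swipe, exited_with_purchase, bought, t0=5.0):
    """Build per-objective labels for one item exposure (field names assumed)."""
    labels = {
        "vtr": 1 if dwell_s > t0 else 0,  # binarized dwell time, T0 = 5 s
        "cvr": 1 if bought else 0,        # standard post-exposure buy indicator
    }
    if first_swipe:
        labels["sdr"] = 1                 # only the first swipe counts positive
    elif exited_with_purchase:
        labels["sdr"] = None              # purchase exits excluded from negatives
    else:
        labels["sdr"] = 0
    return labels
```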

Weights $\{w_{\mathrm{vtr}}, w_{\mathrm{sdr}}, w_{\mathrm{cvr}}\}$ are hyperparameters tuned by maximizing the sum of per-objective AUCs, constrained to be non-negative and sum to 1, via Bayesian optimization.
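As a stand-in for the Bayesian search, the same simplex-constrained tuning loop can be sketched with random search; `eval_auc_sum` is a placeholder for the real validation-set evaluator:

```python
import random

def tune_weights(eval_auc_sum, n_trials=200, seed=0):
    """Search non-negative weights summing to 1 that maximize summed AUC."""
    rng = random.Random(seed)
    best_w, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Two sorted uniforms split [0, 1] into three simplex coordinates.
        a, b = sorted(rng.random() for _ in range(2))
        w = {"vtr": a, "sdr": b - a, "cvr": 1.0 - b}
        score = eval_auc_sum(w)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

The production system uses Bayesian optimization instead of random search; the sketch only illustrates how the non-negativity and sum-to-one constraints are handled.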

MOC ensures that no objective is excessively suppressed, enabling growth in both DAU (driven by vtr/sdr) and purchases (cvr). Ablations confirm the necessity of MOC refinements: naive sdr labeling results in +0.51% IPV but –3.22% Purch, while conflict-aware labels achieve +3.6% IPV and +0.42% Purch.

3. Multi-Slot Collaboration (MSC) Module

MSC mitigates the “temporal greedy trap” in sequential slot scenarios inherent to immersive, full-screen UIs: maximizing immediate expected conversion in every slot incentivizes “sure-bet” items, which may cut short session length and overall utility.

MSC implements dual-stage look-ahead ranking:

  • Cross-Stage Look-Ahead (E→F): E-stage items are re-scored based on their predicted potential for inducing high-involvement F-stage conversions. The look-ahead value is:

$$V_F(c) = \hat p_{\mathrm{ctr}}(c) \times \hat p_{\mathrm{sdr}*}(c) \times \hat y_F^{\mathrm{cvr}*}(c)$$

The E-stage score is augmented as $\mathrm{score}_E(c) = \hat y_E^{\mathrm{cvr}}(c) + \lambda\, V_F(c)$.
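The cross-stage augmentation is a one-liner once the component probabilities are predicted; the default λ below is illustrative, as the paper tunes it separately:

```python
def e_stage_score(p_ctr, p_sdr_star, y_f_cvr_star, y_e_cvr, lam=0.5):
    """score_E(c) = y_E^cvr(c) + lam * V_F(c), where V_F is the product of
    predicted click, swipe-survival, and F-stage conversion probabilities."""
    v_f = p_ctr * p_sdr_star * y_f_cvr_star
    return y_e_cvr + lam * v_f
```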

  • Single-Stage Look-Ahead (F-stage): For a length-$m$ slot sequence $S=(c_1, \dots, c_m)$,

$$\mathcal{V}(S) = \sum_{i=1}^m \left[\prod_{k=1}^{i-1}\hat y^{\mathrm{sdr}}(c_k)\right] v(c_i)$$

where each term is weighted by the survival probability (swiping through prior slots).

Beam search (beam size $B=25$) is employed to find the best-ordered sequence maximizing expected overall session utility, circumventing an intractable exact search over the $A_n^m$ permutations.
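A minimal sketch of the survival-weighted sequence value and a beam search over orderings, using toy per-item scores (the production implementation runs in C++ in the serving cluster):

```python
def session_value(seq, sdr, v):
    """V(S): each slot's score v(c_i) discounted by the probability of
    swiping through all earlier slots."""
    total, survive = 0.0, 1.0
    for c in seq:
        total += survive * v[c]
        survive *= sdr[c]
    return total

def beam_search(cands, sdr, v, m, beam=25):
    """Build a length-m sequence maximizing V(S), keeping `beam` partial
    sequences per step instead of enumerating all A_n^m orderings."""
    beams = [((), 0.0, 1.0)]  # (sequence, value so far, survival probability)
    for _ in range(m):
        expanded = []
        for seq, val, surv in beams:
            for c in cands:
                if c not in seq:
                    expanded.append((seq + (c,), val + surv * v[c], surv * sdr[c]))
        expanded.sort(key=lambda t: t[1], reverse=True)
        beams = expanded[:beam]
    return beams[0][0], beams[0][1]
```

For small candidate pools the beam covers all partial sequences and the result matches brute-force enumeration; at production scale ($n=800$, $m=10$) only the beam approximation is tractable.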

Ablation studies confirm the utility of permutation-level optimization: while Hit@all (set similarity) between pointwise and beam-search top-10 is ≈90%, Hit@1 (ordering optimality) is only ≈31%—the sequence order matters significantly for maximizing cumulative utility.

4. Model Architecture and Training Protocol

STCRank's neural architecture features:

  • Embeddings: Unified embedding layer for user features (demographics, long-term history), trigger-item, and candidate-item features.
  • Expert towers: MMOE configuration with multiple MLP “experts” gated into three task-specific towers for vtr, sdr, and cvr predictions.
  • Auxiliary head: An additional head that predicts cross-stage F-stage conversion potential on E-stage exposures.

The end-to-end system is trained with the Adam optimizer (learning rate 1e-4, batch size 2048), using minute-level AUC for early stopping. After convergence, the linear weights over objectives are optimized via Bayesian search. Key hyperparameters include retrieval pool size $n=800$, final return size $m=10$, and vtr threshold $T_0=5$ s.
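A toy forward pass showing the MMOE gating pattern described above; shapes, activations, and the single-layer towers are illustrative simplifications, with no training logic:

```python
import numpy as np

def mmoe_forward(x, experts, gates, towers):
    """Shared experts, per-task softmax gates, task-specific output towers."""
    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    expert_out = np.stack([np.tanh(x @ w) for w in experts])   # (E, h)
    preds = {}
    for task in ("vtr", "sdr", "cvr"):
        g = softmax(x @ gates[task])        # (E,) gate over experts
        mixed = g @ expert_out              # (h,) gated expert mixture
        logit = mixed @ towers[task]        # scalar (one-layer tower here)
        preds[task] = float(1.0 / (1.0 + np.exp(-logit)))
    return preds
```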

5. System Deployment and Empirical Performance

All ranking and re-ranking are implemented in C++ in the serving cluster, with p90 latency overhead of approximately 5 ms for F-stage ranking (total budget 50 ms), and aggressive score-based pruning to meet a 100 ms system-level SLO. The auxiliary cross-stage prediction head is pre-warmed, requiring no extra runtime fetch.

A/B experimentation uses a 7-day fixed window, measuring key business and engagement metrics:

  • Item-detail Page View (IPV): F-stage = dwell >2 s, E-stage = click.
  • Purchase: “Buy” action.
  • DAU: Daily active users entering F-stage.

Incremental performance over previous baselines for F-stage (relative increases):

Release                IPV       Purch     DAU
MOC-base               +6.96%    +0.53%    +0.27%
MOC-1 (vtr = 5 s)      +4.87%    +1.90%    +0.15%
MOC-2 (+sdr fixes)     +3.60%    +0.42%    +0.12%
MOC-3 (Pareto-tuned)   +1.21%    +4.18%    +0.12%
MSC-1 (rank seq)       +1.94%    +2.10%    +0.41%
MSC-2 (cross-stage)    +0.60%    +2.66%    +0.65%

Over the MOC-base, the joint STCRank (MOC+MSC) on E+F yields IPV +9.65%, Purchase +1.55%, DAU +0.03%.

Subgroup analyses reveal especially large purchase uplifts in high-involvement categories (e.g., women’s shoes +42% of gain, women’s clothing +25% of gain). Optimizing for long-term utility also increases session depth, e.g., all categories +7.2% deeper swipe sessions, women’s shoes +32.6%.

6. Limitations, Ablations, and Research Directions

STCRank, while effective, currently relies on fixed additive weights $w_j$; potential advances include learned contextual gating over objectives. The beam-search sequence re-ranking is heuristic; global optimality might be approached with RL or full DP solvers. The framework is limited to three primary objectives, but could be extended to incorporate margin-based or risk-aware metrics via analogous MOC formulations.

Future research could integrate end-to-end learning for the sequence evaluator, e.g., via listwise losses or straight-through gradient estimators, and investigate more expressive ensembling or sample-weighting strategies.

Ablation studies further elucidate parameter effects:

  • vtr threshold: T0=2T_0=2 s over-represents superficial interest; T0=25T_0=25 s under-represents intent, with T0=5T_0=5 s striking the optimal balance.
  • sdr labeling: naïve or simplistic schemes degrade purchase; excluding conflict negatives and focusing on first-swipe only is crucial.
  • Permutation-level sequence optimization, not just candidate selection, is a main driver of cumulative utility.

7. Context and Significance

STCRank represents an operational shift toward spatio-temporal, multi-objective-aware ranking for interactive recommender systems in immersive e-commerce. Its deployment at Kuaishou E-shop since June 2025 demonstrates consistent gains in both user engagement and purchase through explicit collaboration among objectives and slots (Xia et al., 15 Jan 2026). The framework provides both algorithmic and empirical evidence that comprehensive multi-objective and multi-slot modeling is necessary for optimal performance under modern, full-screen, swipe-based UIs. Its design and ablation analysis offer a reference for future large-scale interactive recommendation deployments demanding joint modeling of user engagement and commercial conversion signals.
