
Cascading Category Recommender (CCRec) Model

Updated 19 December 2025
  • The paper demonstrates that CCRec's cascading architecture consistently outperforms state-of-the-art models, especially in cold-start and strict top-K precision scenarios.
  • It integrates a Transformer-based negative sampler, a VAE module for user-category embeddings, and a precision-centric MLP to enhance overall prediction accuracy.
  • Empirical evaluations across multiple datasets show improvements of up to 105.7% for cold-start users compared to best baselines, underscoring its practical impact.

The Cascading Category Recommender (CCRec) model is a neural architecture for recommendation systems designed to address the particular demands of category-level recommendation in e-commerce environments. Unlike traditional item-level recommenders, CCRec explicitly models user interactions with item categories, thereby facilitating the discovery of broad user intentions, improving cold-start robustness, and complementing fine-grained item prediction. CCRec employs a three-stage cascaded framework encompassing a Transformer-based negative sampler, a variational autoencoder (VAE) module for user-category embeddings, and a precision-centric multilayer perceptron (MLP) scoring network. Empirical studies demonstrate that CCRec achieves consistently superior performance relative to state-of-the-art item-level and VAE-based baselines, especially in cold-start regimes and under strict top-K precision requirements (Wang et al., 17 Dec 2025).

1. Problem Formulation for Category-Level Recommendation

Let $U = \{u_1, \dots, u_n\}$ denote a set of users, $I = \{i_1, \dots, i_m\}$ the set of items, and $C = \{c_1, \dots, c_s\}$ the set of categories, with a mapping $g: I \rightarrow C$ assigning items to categories. For each user $u_t$, the system observes:

  • a demographic feature vector $f_t$,
  • a sequence of past item interactions $\pi_t \subseteq I$,
  • an induced sequence of past category interactions $\delta_t = \{ g(i) : i \in \pi_t \}$ (max length $k$),
  • and a future ground-truth category sequence $\gamma_t$.

The goal is, for each user $u_t$, to predict a ranked list $\Gamma_t \subseteq C$ of the top-$N$ categories they will interact with. The core learning objective is to optimize a scoring function $s(u_t, c)$ such that

$$\Gamma_t = {\text{arg top-N}}_{c \in C}\, s(u_t, c)$$

Category-level recommendation deviates from standard item-level frameworks by emphasizing the need to expand user engagement across different item types and to provide reliable predictions when fine-grained behavioral signals are sparse or unavailable.
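The top-$N$ selection above can be sketched in a few lines of NumPy; the score vector below is a hypothetical placeholder for $s(u_t, c)$, not output from the actual model:

```python
import numpy as np

def arg_top_n(scores, n):
    """Return indices of the n highest-scoring categories, best first."""
    order = np.argsort(scores)[::-1]  # sort descending by score
    return order[:n]

# Hypothetical scores s(u_t, c) over five categories
scores = np.array([0.1, 0.7, 0.3, 0.9, 0.2])
top3 = arg_top_n(scores, 3)  # indices of the 3 best categories
```

Ties and stable ordering aside, this is the ranking operation every stage of the cascade ultimately feeds into.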

2. Model Architecture and Cascading Framework

CCRec operationalizes category recommendation through three sequential neural modules, each engineered to address different challenges of the problem:

2.1 Probability-Weighted Negative Sampler (MLE Module $M_1$)

  • Input: Encoded sequence of past categories $E_1(\delta_t) \in \mathbb{R}^{k \times d_1}$.
  • Encoder: A Transformer with positional encoding transforms the sequence into a dense representation $T$.
  • Classifier: Two fully connected layers followed by LogSoftmax over $|C|$ categories.
  • Training Loss: Negative log-likelihood over observed ground-truth categories,

$$L_{\text{MLE}} = \sum_t \sum_{c \in \gamma_t} -\log P_{M_1}(c \mid \delta_t).$$

  • Output: Ranked candidate list $r_t$, with associated probability vector $y_t^{M_1} \in [0,1]^{|r_t|}$.
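The negative log-likelihood above reduces, per user, to summing $-\log P_{M_1}(c \mid \delta_t)$ over the ground-truth categories. A minimal sketch, assuming the LogSoftmax head's output for one user is already available as a vector of log-probabilities:

```python
import numpy as np

def mle_loss(log_probs, gamma):
    """Per-user L_MLE: sum of -log P_M1(c | delta_t) over ground-truth categories.

    log_probs: (|C|,) log-probabilities from the LogSoftmax classifier head.
    gamma: iterable of ground-truth category indices for this user.
    """
    return -sum(log_probs[c] for c in gamma)

# Toy example: 4 categories, uniform predicted distribution
log_probs = np.log(np.full(4, 0.25))
loss = mle_loss(log_probs, gamma=[0, 2])  # two observed future categories
```

With a uniform distribution over 4 categories each observed category contributes $\log 4$, so the toy loss is $2\log 4$.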

2.2 User-Distinctive Category Encoder (VAE Module)

  • Input: For each candidate $(u_t, c)$, form the feature vector $X = [f_t; \pi_t^c; e(c)]$, where $\pi_t^c$ denotes the user's item interactions in category $c$ and $e(c)$ is the category ID embedding.
  • Encoder: $q_\phi(z \mid X)$ is a two-layer MLP computing the mean $\mu(X)$ and log-variance $\log \sigma^2(X)$.
  • Reparameterization: Draw latent $z = \mu + \sigma \odot \epsilon$, $\epsilon \sim \mathcal{N}(0, I)$.
  • Decoder: $p_\theta(X \mid z)$ is a two-layer MLP reconstructing $X$.
  • Loss (per $X$): Evidence Lower Bound (ELBO),

$$\mathcal{L}(\theta, \phi; X) = \mathbb{E}_{q_\phi(z \mid X)}\big[\log p_\theta(X \mid z)\big] - D_{KL}\big(q_\phi(z \mid X) \,\|\, p(z)\big)$$

with $p(z) = \mathcal{N}(0, I)$.

  • Output: Embedding $e_t^c = z \in \mathbb{R}^d$ representing user-specific affinity for category $c$.
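The reparameterization step and the KL term of the ELBO have simple closed forms for a diagonal Gaussian posterior against the standard-normal prior. A NumPy sketch (the `mu` and `log_var` values below are hypothetical encoder outputs, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps, eps ~ N(0, I) -- the reparameterization trick."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

mu = np.zeros(4)
log_var = np.zeros(4)                       # sigma = 1 everywhere
z = reparameterize(mu, log_var)             # a sampled latent embedding
kl = kl_to_standard_normal(mu, log_var)     # 0 when q equals the prior
```

Because the noise enters additively, gradients flow through `mu` and `log_var`, which is what makes the ELBO trainable end to end.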

2.3 Precision-Centric Recommender (Prediction Module $M_2$)

  • Input: For user $u_t$ and each $c \in r_t$:
    • pretrained VAE embedding $e_t^c$
    • learned category embedding $E_2(c)$
    • sequence embedding $E_1(\delta_t)$
    • MLE probability $y_t^{M_1}(c)$
  • MLP Scoring Network: Outputs $y_t^{M_2}(c) \in [0,1]$.
  • Loss: Combined loss:
  • Loss: Combined loss:

    • Precision penalty for false positives:

    $$L_{\text{precision}} = \sum_{c \in r_t} \left[ \text{ReLU}\big(y_t^{M_2}(c) - y_t(c)\big) \right]^2$$

    where $y_t(c) = 1$ if $c \in \gamma_t$, else $0$.

    • False-negative penalty (MSE):

    $$L_{\text{MSE}} = \sum_{c \in r_t} \big(y_t(c) - y_t^{M_2}(c)\big)^2$$

    • Total: $L_{\text{total}} = L_{\text{precision}} + L_{\text{MSE}}$.
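A minimal NumPy sketch of this combined objective, assuming both penalty terms are computed from the $M_2$ scores over the candidate list; the score and label vectors below are hypothetical:

```python
import numpy as np

def ccrec_loss(y_m2, y_true):
    """L_total = L_precision + L_MSE over one user's candidate list r_t.

    y_m2: M2 scores in [0, 1], one per candidate category.
    y_true: binary labels (1 if the category appears in gamma_t, else 0).
    """
    # Squared ReLU penalizes scores that exceed the label (false positives)
    precision = np.sum(np.maximum(y_m2 - y_true, 0.0) ** 2)
    # Plain MSE additionally penalizes low scores on true categories
    mse = np.sum((y_true - y_m2) ** 2)
    return precision + mse

y_m2 = np.array([0.9, 0.2, 0.6])
y_true = np.array([1.0, 0.0, 1.0])
loss = ccrec_loss(y_m2, y_true)
```

Note the asymmetry: the second candidate (score 0.2, label 0) is hit by both terms, while the under-scored true categories are penalized only by the MSE term, which is how the loss biases $M_2$ toward precision.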

3. Training, Optimization, and Inference Pipeline

Key model components and training regime include:

  • Architecture: All MLPs have two hidden layers (embedding dimension 64); VAE latent dimension $d = 256$.
  • Transformer: Standard multi-head self-attention with positional encoding for $M_1$.
  • Hyperparameters: Learning rate $1 \times 10^{-4}$, batch size as the dataset permits, 100 epochs, Adam optimizer.
  • Regularization: Dropout (default $0.1$) in MLPs; no explicit KL annealing is employed.
  • Inference: For a given user $u_t$,

    1. $M_1$ generates candidate list $r_t$ with scores $y_t^{M_1}$.
    2. VAE embeddings $e_t^c$ are computed for each $c$.
    3. $M_2$ scores each $(u_t, c)$ pair as $y_t^{M_2}(c)$.
    4. Categories are ranked by $y_t^{M_2}(c)$; the top-$K$ are recommended.
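The four inference steps can be sketched end to end; `m1_probs` and `m2_score_fn` below are hypothetical stand-ins for the trained $M_1$ distribution and the $M_2$ scorer (with the VAE embedding lookup hidden inside the scorer):

```python
import numpy as np

def recommend(m1_probs, m2_score_fn, num_candidates, k):
    """Cascaded inference sketch: M1 shortlists, M2 rescores, top-K returned.

    m1_probs: (|C|,) category probabilities from M1 for one user.
    m2_score_fn: callable mapping a candidate category index to an M2 score.
    """
    # Step 1: M1 generates the candidate list r_t
    candidates = np.argsort(m1_probs)[::-1][:num_candidates]
    # Steps 2-3: score each (u_t, c) candidate with M2
    m2_scores = np.array([m2_score_fn(c) for c in candidates])
    # Step 4: re-rank the shortlist by M2 score and keep the top-K
    return candidates[np.argsort(m2_scores)[::-1]][:k]

m1 = np.array([0.05, 0.4, 0.1, 0.3, 0.15])
top2 = recommend(m1, m2_score_fn=lambda c: 1.0 - 0.1 * c,
                 num_candidates=3, k=2)
```

The key design point visible here is that $M_2$ only ever sees the $M_1$ shortlist, so its precision-centric scoring operates on a restricted candidate set.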

4. Experimental Validation and Comparative Results

Experiments are conducted on three datasets:

Dataset        Users   Items       Categories   Test regime
Industry       103k    3,207       151          Warm ≈10%; 222 cold users
RetailRocket   211k    91,145      1,107        Cold-start simulation
Tmall          375k    2,353,207   72

Baselines benchmarked include FMLPRec, SASRec, Mamba4Rec, CLRec, TIGER (general sequential), MeLU (cold-start), VAERec, and VAERec+ (demographics). Evaluation employs Hit Ratio@K (HR@K), Precision@K, Recall@K, and F1@K for $K \in \{1, 3, 5\}$.
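For concreteness, the four metrics for a single user follow standard definitions; a minimal sketch (the recommended and relevant lists below are toy values, not from the paper):

```python
def topk_metrics(recommended, relevant, k):
    """HR@K, Precision@K, Recall@K, F1@K for one user.

    recommended: ranked list of category indices.
    relevant: ground-truth category indices for this user.
    """
    hits = len(set(recommended[:k]) & set(relevant))
    hr = 1.0 if hits > 0 else 0.0          # at least one hit in the top-K
    precision = hits / k                    # fraction of top-K that are relevant
    recall = hits / len(relevant)           # fraction of relevant items retrieved
    f1 = 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)
    return hr, precision, recall, f1

hr, p, r, f1 = topk_metrics(recommended=[3, 1, 4], relevant=[1, 2], k=3)
```

Dataset-level scores are then averages of these per-user values over the test users.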

Key results on the Industry dataset, cold-user subset:

  • Best baseline HR@1: 0.4729 (TIGER)
  • CCRec HR@1: 0.4819 (+1.9% relative)
  • HR@5: best baseline 0.6081; CCRec 0.6261 (+3.0% relative)
  • Precision@3: 0.2162 (CCRec) vs 0.2132 (best baseline)
  • Overall relative improvement: ~8% on primary metrics

These improvements persist across other datasets; statistical significance is not reported but performance margins are consistent (Wang et al., 17 Dec 2025).

5. Ablation Studies and Component Analysis

Ablation variants on the industry dataset examine the individual roles of the key modules:

  • MLE only: $r_t$ generation and classification without further cascading.
  • MLE+VAE: VAE embeddings injected into $M_1$ directly.
  • MLE+Cascading: Full cascade but without VAE pretraining.
  • CCRec (full): All components.

Results (HR@1; warm-user baseline 0.1929):

  • +VAE: 0.3197 (+65.8%)
  • +Cascading: 0.3377 (+75.2%)
  • CCRec: 0.3360 (+74.3%)

Cold-start users (HR@1, baseline 0.2342):

  • +VAE: 0.4324 (+84.6%)
  • +Cascading: 0.4594 (+96.2%)
  • CCRec: 0.4819 (+105.7%)

Ablations reveal that the cascading structure (precision-centric loss and reusing MLE outputs) provides the most significant gain, while VAE embeddings are especially advantageous for cold-start users. The model’s design specifically mitigates the challenges of indirect supervision, missing negatives, and the need for precision when the candidate set is highly restricted; the precision-centric loss successfully penalizes false positives under these constraints (Wang et al., 17 Dec 2025).

6. Methodological Implications and Extensions

CCRec formalizes the distinction between category- and item-level recommendation—category-level signal is both sparser and semantically coarser, and CCRec leverages VAE-driven embeddings to transfer granular preference at the item level into robust, personalized category representations. The cascading design—whereby intermediate probabilities and embeddings are reused across modules—enables explicit calibration of ranking decisions, particularly for top-K objectives.

A plausible implication is that such a cascading architecture could benefit other hierarchical or grouped recommendation tasks in domains where indirect supervision and sparse positive feedback at aggregate levels create problems for direct item-level approaches. The successful use of a Transformer for category history, VAE for user-category encoding, and a dedicated loss for penalizing high-ranked false positives may inform future systems seeking high precision in top-K recommendations in e-commerce and related domains.
