
Cascading Category Recommender (CCRec) Model

Updated 19 December 2025
  • The paper demonstrates that CCRec's cascading architecture consistently outperforms state-of-the-art models, especially in cold-start and strict top-K precision scenarios.
  • It integrates a Transformer-based negative sampler, a VAE module for user-category embeddings, and a precision-centric MLP to enhance overall prediction accuracy.
  • Empirical evaluations across multiple datasets show improvements of up to 105.7% for cold-start users compared to best baselines, underscoring its practical impact.

The Cascading Category Recommender (CCRec) model is a neural architecture for recommendation systems designed to address the particular demands of category-level recommendation in e-commerce environments. Unlike traditional item-level recommenders, CCRec explicitly models user interactions with item categories, thereby facilitating the discovery of broad user intentions, improving cold-start robustness, and complementing fine-grained item prediction. CCRec employs a three-stage cascaded framework encompassing a Transformer-based negative sampler, a variational autoencoder (VAE) module for user-category embeddings, and a precision-centric multilayer perceptron (MLP) scoring network. Empirical studies demonstrate that CCRec achieves consistently superior performance relative to state-of-the-art item-level and VAE-based baselines, especially in cold-start regimes and under strict top-K precision requirements (Wang et al., 17 Dec 2025).

1. Problem Formulation for Category-Level Recommendation

Let $U = \{u_1, \dots, u_n\}$ denote a set of users, $I = \{i_1, \dots, i_m\}$ the set of items, and $C = \{c_1, \dots, c_s\}$ the set of categories, with a mapping $g: I \rightarrow C$ assigning items to categories. For each user $u_t$, the system observes:

  • a demographic feature vector $f_t$,
  • a sequence of past item interactions $\pi_t \subseteq I$,
  • an induced sequence of past category interactions $\delta_t = \{ g(i) : i \in \pi_t \}$ (max length $k$),
  • and a future ground-truth category sequence $\gamma_t$.

The goal is, for each user $u_t$, to predict a ranked list $\Gamma_t \subseteq C$ of the top-$N$ categories they will interact with. The core learning objective is to optimize a scoring function $s(u_t, c)$ such that

$$\Gamma_t = {\text{arg top-N}}_{c \in C}\, s(u_t, c)$$

Category-level recommendation deviates from standard item-level frameworks by emphasizing the need to expand user engagement across different item types and to provide reliable predictions when fine-grained behavioral signals are sparse or unavailable.
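The top-$N$ selection above can be sketched in a few lines of NumPy; the score vector below is a hypothetical placeholder for $s(u_t, c)$, not output from the actual model:

```python
import numpy as np

def arg_top_n(scores, n):
    """Return indices of the n highest-scoring categories, best first."""
    order = np.argsort(scores)[::-1]  # sort descending by score
    return order[:n]

# Hypothetical scores s(u_t, c) over five categories
scores = np.array([0.1, 0.7, 0.3, 0.9, 0.2])
top3 = arg_top_n(scores, 3)  # indices of the 3 best categories
```

Ties and stable ordering aside, this is the ranking operation every stage of the cascade ultimately feeds into.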

2. Model Architecture and Cascading Framework

CCRec operationalizes category recommendation through three sequential neural modules, each engineered to address different challenges of the problem:

2.1 Probability-Weighted Negative Sampler (MLE Module $M_1$)

  • Input: Encoded sequence of past categories $E_1(\delta_t) \in \mathbb{R}^{k \times d_1}$.
  • Encoder: A Transformer with positional encoding transforms the sequence into a dense representation $T$.
  • Classifier: Two fully connected layers followed by LogSoftmax over $|C|$ categories.
  • Training Loss: Negative log-likelihood over observed ground-truth categories,

$$L_{\text{MLE}} = \sum_t \sum_{c \in \gamma_t} -\log P_{M_1}(c \mid \delta_t).$$

  • Output: Ranked candidate list $r_t$, with associated probability vector $y_t^{M_1} \in [0,1]^{|r_t|}$.
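The negative log-likelihood above reduces, per user, to summing $-\log P_{M_1}(c \mid \delta_t)$ over the ground-truth categories. A minimal sketch, assuming the LogSoftmax head's output for one user is already available as a vector of log-probabilities:

```python
import numpy as np

def mle_loss(log_probs, gamma):
    """Per-user L_MLE: sum of -log P_M1(c | delta_t) over ground-truth categories.

    log_probs: (|C|,) log-probabilities from the LogSoftmax classifier head.
    gamma: iterable of ground-truth category indices for this user.
    """
    return -sum(log_probs[c] for c in gamma)

# Toy example: 4 categories, uniform predicted distribution
log_probs = np.log(np.full(4, 0.25))
loss = mle_loss(log_probs, gamma=[0, 2])  # two observed future categories
```

With a uniform distribution over 4 categories each observed category contributes $\log 4$, so the toy loss is $2\log 4$.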

2.2 User-Distinctive Category Encoder (VAE Module)

  • Input: For each candidate $(u_t, c)$, form the feature vector $X = [f_t; \pi_t^c; e(c)]$, where $\pi_t^c$ denotes the user's item interactions in category $c$ and $e(c)$ is the category ID embedding.
  • Encoder: $q_\phi(z \mid X)$ is a two-layer MLP computing the mean $\mu(X)$ and log-variance $\log \sigma^2(X)$.
  • Reparameterization: Draw latent $z = \mu + \sigma \odot \epsilon$, $\epsilon \sim \mathcal{N}(0, I)$.
  • Decoder: $p_\theta(X \mid z)$ is a two-layer MLP reconstructing $X$.
  • Loss (per $X$): Evidence Lower Bound (ELBO),

$$\mathcal{L}(\theta, \phi; X) = \mathbb{E}_{q_\phi(z \mid X)}\big[\log p_\theta(X \mid z)\big] - D_{KL}\big(q_\phi(z \mid X) \,\|\, p(z)\big)$$

with $p(z) = \mathcal{N}(0, I)$.

  • Output: Embedding $e_t^c = z \in \mathbb{R}^d$ representing user-specific affinity for category $c$.
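The reparameterization step and the KL term of the ELBO have simple closed forms for a diagonal Gaussian posterior against the standard-normal prior. A NumPy sketch (the `mu` and `log_var` values below are hypothetical encoder outputs, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps, eps ~ N(0, I) -- the reparameterization trick."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

mu = np.zeros(4)
log_var = np.zeros(4)                       # sigma = 1 everywhere
z = reparameterize(mu, log_var)             # a sampled latent embedding
kl = kl_to_standard_normal(mu, log_var)     # 0 when q equals the prior
```

Because the noise enters additively, gradients flow through `mu` and `log_var`, which is what makes the ELBO trainable end to end.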

2.3 Precision-Centric Recommender (Prediction Module $M_2$)

  • Input: For user $u_t$ and each $c \in r_t$:
    • pretrained VAE embedding $e_t^c$
    • learned category embedding $E_2(c)$
    • sequence embedding $E_1(\delta_t)$
    • MLE probability $y_t^{M_1}(c)$
  • MLP Scoring Network: Outputs $y_t^{M_2}(c) \in [0,1]$.
  • Loss: Combined loss:
  • Loss: Combined loss:

    • Precision penalty for false positives:

    $$L_{\text{precision}} = \sum_{c \in r_t} \left[ \text{ReLU}\big(y_t^{M_2}(c) - y_t(c)\big) \right]^2$$

    where $y_t(c) = 1$ if $c \in \gamma_t$, else $0$.

    • False-negative penalty (MSE):

    $$L_{\text{MSE}} = \sum_{c \in r_t} \big(y_t(c) - y_t^{M_2}(c)\big)^2$$

    • Total: $L_{\text{total}} = L_{\text{precision}} + L_{\text{MSE}}$.
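A minimal NumPy sketch of this combined objective, assuming both penalty terms are computed from the $M_2$ scores over the candidate list; the score and label vectors below are hypothetical:

```python
import numpy as np

def ccrec_loss(y_m2, y_true):
    """L_total = L_precision + L_MSE over one user's candidate list r_t.

    y_m2: M2 scores in [0, 1], one per candidate category.
    y_true: binary labels (1 if the category appears in gamma_t, else 0).
    """
    # Squared ReLU penalizes scores that exceed the label (false positives)
    precision = np.sum(np.maximum(y_m2 - y_true, 0.0) ** 2)
    # Plain MSE additionally penalizes low scores on true categories
    mse = np.sum((y_true - y_m2) ** 2)
    return precision + mse

y_m2 = np.array([0.9, 0.2, 0.6])
y_true = np.array([1.0, 0.0, 1.0])
loss = ccrec_loss(y_m2, y_true)
```

Note the asymmetry: the second candidate (score 0.2, label 0) is hit by both terms, while the under-scored true categories are penalized only by the MSE term, which is how the loss biases $M_2$ toward precision.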

3. Training, Optimization, and Inference Pipeline

Key model components and training regime include:

  • Architecture: All MLPs have two hidden layers (embedding dimension 64); VAE latent dimension $d = 256$.
  • Transformer: Standard multi-head self-attention with positional encoding for $M_1$.
  • Hyperparameters: Learning rate $1 \times 10^{-4}$, batch size as the dataset permits, 100 epochs, Adam optimizer.
  • Regularization: Dropout (default $0.1$) in MLPs; no explicit KL annealing is employed.
  • Inference: For a given user $u_t$,

    1. $M_1$ generates candidate list $r_t$ with scores $y_t^{M_1}$.
    2. VAE embeddings $e_t^c$ are computed for each $c$.
    3. $M_2$ scores each $(u_t, c)$ pair as $y_t^{M_2}(c)$.
    4. Categories are ranked by $y_t^{M_2}(c)$; the top-$K$ are recommended.
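The four inference steps can be sketched end to end; `m1_probs` and `m2_score_fn` below are hypothetical stand-ins for the trained $M_1$ distribution and the $M_2$ scorer (with the VAE embedding lookup hidden inside the scorer):

```python
import numpy as np

def recommend(m1_probs, m2_score_fn, num_candidates, k):
    """Cascaded inference sketch: M1 shortlists, M2 rescores, top-K returned.

    m1_probs: (|C|,) category probabilities from M1 for one user.
    m2_score_fn: callable mapping a candidate category index to an M2 score.
    """
    # Step 1: M1 generates the candidate list r_t
    candidates = np.argsort(m1_probs)[::-1][:num_candidates]
    # Steps 2-3: score each (u_t, c) candidate with M2
    m2_scores = np.array([m2_score_fn(c) for c in candidates])
    # Step 4: re-rank the shortlist by M2 score and keep the top-K
    return candidates[np.argsort(m2_scores)[::-1]][:k]

m1 = np.array([0.05, 0.4, 0.1, 0.3, 0.15])
top2 = recommend(m1, m2_score_fn=lambda c: 1.0 - 0.1 * c,
                 num_candidates=3, k=2)
```

The key design point visible here is that $M_2$ only ever sees the $M_1$ shortlist, so its precision-centric scoring operates on a restricted candidate set.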

4. Experimental Validation and Comparative Results

Experiments are conducted on three datasets:

Dataset        Users   Items       Categories   Test regime
Industry       103k    3,207       151          Warm ≈10%; 222 cold users
RetailRocket   211k    91,145      1,107        Cold-start simulation
Tmall          375k    2,353,207   72

Baselines benchmarked include FMLPRec, SASRec, Mamba4Rec, CLRec, TIGER (general sequential), MeLU (cold-start), VAERec, and VAERec+ (demographics). Evaluation employs Hit Ratio@K (HR@K), Precision@K, Recall@K, and F1@K for $K \in \{1, 3, 5\}$.
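For concreteness, the four metrics for a single user follow standard definitions; a minimal sketch (the recommended and relevant lists below are toy values, not from the paper):

```python
def topk_metrics(recommended, relevant, k):
    """HR@K, Precision@K, Recall@K, F1@K for one user.

    recommended: ranked list of category indices.
    relevant: ground-truth category indices for this user.
    """
    hits = len(set(recommended[:k]) & set(relevant))
    hr = 1.0 if hits > 0 else 0.0          # at least one hit in the top-K
    precision = hits / k                    # fraction of top-K that are relevant
    recall = hits / len(relevant)           # fraction of relevant items retrieved
    f1 = 0.0 if hits == 0 else 2 * precision * recall / (precision + recall)
    return hr, precision, recall, f1

hr, p, r, f1 = topk_metrics(recommended=[3, 1, 4], relevant=[1, 2], k=3)
```

Dataset-level scores are then averages of these per-user values over the test users.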

Key results on the Industry dataset, cold-user subset:

  • Best baseline HR@1: 0.4729 (TIGER)
  • CCRec HR@1: 0.4819 (+1.9% relative)
  • HR@5: best baseline 0.6081; CCRec 0.6261 (+3.0% relative)
  • Precision@3: 0.2162 (CCRec) vs 0.2132 (best baseline)
  • Overall relative improvement: ~8% on primary metrics

These improvements persist across other datasets; statistical significance is not reported but performance margins are consistent (Wang et al., 17 Dec 2025).

5. Ablation Studies and Component Analysis

Ablation variants on the industry dataset examine the individual roles of the key modules:

  • MLE only: $r_t$ generation and classification without further cascading.
  • MLE+VAE: VAE embeddings injected into $M_1$ directly.
  • MLE+Cascading: Full cascade but without VAE pretraining.
  • CCRec (full): All components.

Results (HR@1; warm-user baseline 0.1929):

  • +VAE: 0.3197 (+65.8%)
  • +Cascading: 0.3377 (+75.2%)
  • CCRec: 0.3360 (+74.3%)

Cold-start users (HR@1, baseline 0.2342):

  • +VAE: 0.4324 (+84.6%)
  • +Cascading: 0.4594 (+96.2%)
  • CCRec: 0.4819 (+105.7%)

Ablations reveal that the cascading structure (precision-centric loss and reusing MLE outputs) provides the most significant gain, while VAE embeddings are especially advantageous for cold-start users. The model’s design specifically mitigates the challenges of indirect supervision, missing negatives, and the need for precision when the candidate set is highly restricted; the precision-centric loss successfully penalizes false positives under these constraints (Wang et al., 17 Dec 2025).

6. Methodological Implications and Extensions

CCRec formalizes the distinction between category- and item-level recommendation—category-level signal is both sparser and semantically coarser, and CCRec leverages VAE-driven embeddings to transfer granular preference at the item level into robust, personalized category representations. The cascading design—whereby intermediate probabilities and embeddings are reused across modules—enables explicit calibration of ranking decisions, particularly for top-K objectives.

A plausible implication is that such a cascading architecture could benefit other hierarchical or grouped recommendation tasks in domains where indirect supervision and sparse positive feedback at aggregate levels create problems for direct item-level approaches. The successful use of a Transformer for category history, VAE for user-category encoding, and a dedicated loss for penalizing high-ranked false positives may inform future systems seeking high precision in top-K recommendations in e-commerce and related domains.
