Cascading Category Recommender (CCRec) Model
- The paper demonstrates that CCRec's cascading architecture consistently outperforms state-of-the-art models, especially in cold-start and strict top-K precision scenarios.
- It integrates a Transformer-based negative sampler, a VAE module for user-category embeddings, and a precision-centric MLP to enhance overall prediction accuracy.
- Empirical evaluations across multiple datasets show cold-start HR@1 gains of up to 105.7% over an MLE-only ablation baseline, underscoring its practical impact.
The Cascading Category Recommender (CCRec) model is a neural architecture for recommendation systems designed to address the particular demands of category-level recommendation in e-commerce environments. Unlike traditional item-level recommenders, CCRec explicitly models user interactions with item categories, thereby facilitating the discovery of broad user intentions, improving cold-start robustness, and complementing fine-grained item prediction. CCRec employs a three-stage cascaded framework encompassing a Transformer-based negative sampler, a variational autoencoder (VAE) module for user-category embeddings, and a precision-centric multilayer perceptron (MLP) scoring network. Empirical studies demonstrate that CCRec achieves consistently superior performance relative to state-of-the-art item-level and VAE-based baselines, especially in cold-start regimes and under strict top-K precision requirements (Wang et al., 17 Dec 2025).
1. Problem Formulation for Category-Level Recommendation
Let $\mathcal{U}$ denote the set of users, $\mathcal{I}$ the set of items, and $\mathcal{C}$ the set of categories, with a mapping $g: \mathcal{I} \to \mathcal{C}$ assigning items to categories. For each user $u \in \mathcal{U}$, the system observes:
- a demographic feature vector $d_u$,
- a sequence of past item interactions $S_u = (i_1, \dots, i_{n_u})$,
- an induced sequence of past category interactions $C_u = (g(i_1), \dots, g(i_{n_u}))$ (max length $L$),
- and a future ground-truth category sequence $Y_u$.
The goal is, for each user $u$, to predict a ranked list of top-$K$ categories they will interact with. The core learning objective is to optimize a scoring function $f(u, c)$ such that the $K$ highest-scoring categories maximize overlap with the ground-truth set $Y_u$.
Category-level recommendation deviates from standard item-level frameworks by emphasizing the need to expand user engagement across different item types and to provide reliable predictions when fine-grained behavioral signals are sparse or unavailable.
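For concreteness, the top-$K$ selection step can be sketched in a few lines of NumPy; the scores below are illustrative stand-ins for a learned scoring function $f(u, c)$, not model outputs:

```python
import numpy as np

def top_k_categories(scores: np.ndarray, k: int) -> np.ndarray:
    """Return the indices of the k highest-scoring categories, best first."""
    return np.argsort(-scores)[:k]

# Toy example: 6 categories with hypothetical scores f(u, c)
scores = np.array([0.1, 0.7, 0.05, 0.9, 0.3, 0.2])
ranked = top_k_categories(scores, k=3)  # category IDs 3, 1, 4
```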
2. Model Architecture and Cascading Framework
CCRec operationalizes category recommendation through three sequential neural modules, each engineered to address different challenges of the problem:
2.1 Probability-Weighted Negative Sampler (MLE Module $f_{\mathrm{MLE}}$)
- Input: Encoded sequence of past categories $C_u$.
- Encoder: A Transformer with positional encoding transforms the sequence into a dense representation $h_u$.
- Classifier: Two fully connected layers followed by LogSoftmax over categories.
- Training Loss: Negative log-likelihood over the observed ground-truth categories, $\mathcal{L}_{\mathrm{MLE}} = -\sum_{c \in Y_u} \log p_\theta(c \mid C_u)$.
- Output: Ranked candidate list $\mathcal{C}_u^{\mathrm{cand}}$, with associated probability vector $p_u$.
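As an illustration, candidate generation plus probability-weighted negative sampling could look like the NumPy sketch below; the function names and the sampling-without-replacement detail are assumptions, not the paper's exact procedure:

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sample_candidates(logits, observed, num_candidates, num_negatives, rng):
    """Rank categories by predicted probability, then sample hard negatives
    from the unobserved categories in proportion to that probability."""
    p = softmax(logits)
    candidates = np.argsort(-p)[:num_candidates]   # ranked candidate list
    weights = p.copy()
    weights[list(observed)] = 0.0                  # never sample positives
    weights /= weights.sum()
    negatives = rng.choice(len(p), size=num_negatives, replace=False, p=weights)
    return candidates, p, negatives

rng = np.random.default_rng(0)
logits = np.array([2.0, 0.5, 1.5, -1.0, 0.0])
cands, probs, negs = sample_candidates(logits, observed={0}, num_candidates=3,
                                       num_negatives=2, rng=rng)
```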
2.2 User-Distinctive Category Encoder (VAE Module)
- Input: For each candidate $c \in \mathcal{C}_u^{\mathrm{cand}}$, form a feature vector $x_{u,c} = [d_u;\, \bar{s}_{u,c};\, e_c]$, where $\bar{s}_{u,c}$ aggregates the user's item interactions in category $c$ and $e_c$ is the category ID embedding.
- Encoder: A two-layer MLP $q_\phi(z \mid x_{u,c})$ computing mean $\mu$ and log-variance $\log \sigma^2$.
- Reparameterization: Draw latent $z = \mu + \sigma \odot \epsilon$, $\epsilon \sim \mathcal{N}(0, I)$.
- Decoder: A two-layer MLP $p_\psi(x \mid z)$ reconstructing $x_{u,c}$.
- Loss (per candidate $c$): Negative Evidence Lower Bound (ELBO), $\mathcal{L}_{\mathrm{VAE}} = -\mathbb{E}_{q_\phi(z \mid x_{u,c})}[\log p_\psi(x_{u,c} \mid z)] + \mathrm{KL}(q_\phi(z \mid x_{u,c}) \,\|\, p(z))$,
with prior $p(z) = \mathcal{N}(0, I)$.
- Output: Embedding $z_{u,c}$ representing user-specific affinity for category $c$.
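The reparameterization step and the Gaussian KL term have standard closed forms, sketched below in NumPy; the Gaussian (MSE) reconstruction term is an assumption about the decoder's likelihood, not a detail stated in the paper:

```python
import numpy as np

def reparameterize(mu, logvar, rng):
    """z = mu + sigma * eps with eps ~ N(0, I): differentiable sampling."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def negative_elbo(x, x_recon, mu, logvar):
    """Gaussian reconstruction error plus KL regularizer."""
    return np.sum((x - x_recon) ** 2) + kl_to_standard_normal(mu, logvar)

rng = np.random.default_rng(0)
mu, logvar = np.zeros(4), np.zeros(4)
z = reparameterize(mu, logvar, rng)  # one latent sample with unit variance
```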
2.3 Precision-Centric Recommender (Prediction Module $f_{\mathrm{Rec}}$)
- Input: For user $u$ and each candidate $c \in \mathcal{C}_u^{\mathrm{cand}}$:
  - pretrained VAE embedding $z_{u,c}$
  - learned category embedding $e_c$
  - sequence embedding $h_u$
  - MLE probability $p_u(c)$
- MLP Scoring Network: Outputs the score $\hat{y}_{u,c} = f_{\mathrm{Rec}}(z_{u,c}, e_c, h_u, p_u(c))$.
- Loss: Combined precision-centric objective:
  - Precision penalty for false positives: $\mathcal{L}_{\mathrm{FP}} = \sum_{c \in \mathcal{C}_u^{\mathrm{cand}}} (1 - y_{u,c})\, \hat{y}_{u,c}^{2}$, where $y_{u,c} = 1$ if $c \in Y_u$, else $0$.
  - False-negative penalty (MSE): $\mathcal{L}_{\mathrm{FN}} = \sum_{c \in \mathcal{C}_u^{\mathrm{cand}}} y_{u,c}\, (\hat{y}_{u,c} - y_{u,c})^{2}$.
  - Total: $\mathcal{L}_{\mathrm{Rec}} = \mathcal{L}_{\mathrm{FP}} + \mathcal{L}_{\mathrm{FN}}$.
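A minimal sketch of such a precision-centric loss, assuming a squared-confidence penalty on false positives and an MSE term on ground-truth categories (one plausible reading, not the paper's exact formulation):

```python
import numpy as np

def precision_centric_loss(y_hat: np.ndarray, y: np.ndarray) -> float:
    """False-positive penalty on negatives plus MSE on positives."""
    fp = np.sum((1.0 - y) * y_hat**2)     # confident false positives hurt most
    fn = np.sum(y * (y_hat - y) ** 2)     # missed ground-truth categories
    return float(fp + fn)

y     = np.array([1.0, 0.0, 1.0])         # ground-truth labels per candidate
y_hat = np.array([0.9, 0.8, 0.2])         # predicted scores
loss = precision_centric_loss(y_hat, y)   # 0.64 (FP) + 0.65 (FN)
```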
3. Training, Optimization, and Inference Pipeline
Key model components and training regime include:
- Architecture: All MLPs have 2 hidden layers (embedding dimension 64); VAE latent dimension $d_z$.
- Transformer: Standard multi-head self-attention with positional encoding over the category sequence $C_u$.
- Hyperparameters: Learning rate and batch size chosen as each dataset permits, 100 epochs, Adam optimizer.
- Regularization: Dropout (default $0.1$) in MLPs; no explicit KL annealing is employed.
- Inference: For a given user $u$,
  - $f_{\mathrm{MLE}}$ generates the candidate list $\mathcal{C}_u^{\mathrm{cand}}$ with probabilities $p_u$.
  - VAE embeddings $z_{u,c}$ are computed for each candidate $c$.
  - $f_{\mathrm{Rec}}$ scores each pair as $\hat{y}_{u,c}$.
  - Categories are ranked by $\hat{y}_{u,c}$; the top-$K$ are recommended.
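Putting the three stages together, inference reduces to re-ranking the MLE candidate list by the prediction-module score. A schematic sketch with stubbed module outputs (the candidate IDs and scores are hypothetical):

```python
import numpy as np

def recommend_top_k(candidate_ids, rec_scores, k):
    """Re-rank the MLE candidate list by prediction-module scores; return top-k IDs."""
    order = np.argsort(-np.asarray(rec_scores))[:k]
    return [candidate_ids[i] for i in order]

# Stubbed outputs: candidates from the MLE stage, scores from the MLP stage
candidate_ids = [17, 4, 9, 23]           # hypothetical category IDs
rec_scores    = [0.31, 0.88, 0.52, 0.10]
top = recommend_top_k(candidate_ids, rec_scores, k=2)  # [4, 9]
```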
4. Experimental Validation and Comparative Results
Experiments are conducted on three datasets:
| Dataset | Users | Items | Categories | Test regime |
|---|---|---|---|---|
| Industry | 103k | 3,207 | 151 | Warm ≈10%; 222 cold users |
| RetailRocket | 211k | 91,145 | 1,107 | Cold-start simulation |
| Tmall | 375k | 2,353,207 | 72 | |
Baselines benchmarked include FMLPRec, SASRec, Mamba4Rec, CLRec, and TIGER (general sequential recommenders), MeLU (cold-start), VAERec, and VAERec+ (with demographics). Evaluation employs Hit Ratio@K (HR@K), Precision@K, Recall@K, and F1@K for $K \in \{1, 3, 5\}$.
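These metrics are standard; for reference, one common way to compute them per user (HR@K counts any hit in the top-K, while Precision/Recall compare the top-K set against the ground-truth set):

```python
def hit_ratio_at_k(recommended, relevant, k):
    """1.0 if any of the top-k recommendations is relevant, else 0.0."""
    return float(any(c in relevant for c in recommended[:k]))

def precision_at_k(recommended, relevant, k):
    return len(set(recommended[:k]) & set(relevant)) / k

def recall_at_k(recommended, relevant, k):
    return len(set(recommended[:k]) & set(relevant)) / len(relevant)

def f1_at_k(recommended, relevant, k):
    p = precision_at_k(recommended, relevant, k)
    r = recall_at_k(recommended, relevant, k)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

recommended, relevant = [3, 1, 4], {1, 2}  # toy ranked list and ground truth
```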
Key results on the Industry dataset, cold-user subset:
- Best baseline HR@1: 0.4729 (TIGER); CCRec HR@1: 0.4819 (+1.9% relative)
- HR@5: best baseline 0.6081; CCRec 0.6261 (+3.0% relative)
- Precision@3: CCRec 0.2162 vs 0.2132 (best baseline)
- Overall relative improvement: ~8% on primary metrics
These improvements persist across other datasets; statistical significance is not reported but performance margins are consistent (Wang et al., 17 Dec 2025).
5. Ablation Studies and Component Analysis
Ablation variants on the Industry dataset examine the individual roles of the key modules:
- MLE only: generation and classification without further cascading.
- MLE+VAE: VAE embeddings injected into the prediction module directly.
- MLE+Cascading: Full cascade except no VAE pretraining.
- CCRec (full): All components.
Results (HR@1, warm users; MLE-only baseline 0.1929):
- +VAE: 0.3197 (+65.8%)
- +Cascading: 0.3377 (+75.2%)
- CCRec: 0.3360 (+74.3%)
Cold-start users (HR@1; MLE-only baseline 0.2342):
- +VAE: 0.4324 (+84.6%)
- +Cascading: 0.4594 (+96.2%)
- CCRec: 0.4819 (+105.7%)
Ablations reveal that the cascading structure (the precision-centric loss and reuse of MLE outputs) provides the most significant gain, while VAE embeddings are especially advantageous for cold-start users. The model's design specifically mitigates the challenges of indirect supervision, missing negatives, and the need for precision when the candidate set is highly restricted; the precision-centric loss successfully penalizes false positives under these constraints (Wang et al., 17 Dec 2025).
6. Methodological Implications and Extensions
CCRec formalizes the distinction between category- and item-level recommendation—category-level signal is both sparser and semantically coarser, and CCRec leverages VAE-driven embeddings to transfer granular preference at the item level into robust, personalized category representations. The cascading design—whereby intermediate probabilities and embeddings are reused across modules—enables explicit calibration of ranking decisions, particularly for top-K objectives.
A plausible implication is that such a cascading architecture could benefit other hierarchical or grouped recommendation tasks in domains where indirect supervision and sparse positive feedback at aggregate levels create problems for direct item-level approaches. The successful use of a Transformer for category history, VAE for user-category encoding, and a dedicated loss for penalizing high-ranked false positives may inform future systems seeking high precision in top-K recommendations in e-commerce and related domains.