- The paper introduces BIPCL, a novel framework that integrates bilateral intent modeling into both user and item embeddings.
- It employs embedding perturbation-based contrastive learning to maintain semantic consistency and improve multi-level representation uniformity.
- Empirical evaluations report relative improvements of roughly 5–20% in Recall@20 and NDCG@20 across five benchmarks, with training cost and memory consumption comparable to prior baselines.
Bilateral Intent-Enhanced Sequential Recommendation via Embedding Perturbation-Based Contrastive Learning
Motivation and Problem Setting
Conventional sequential recommendation (SR) models primarily focus on representing user behavior as a monolithic latent vector, often failing to fully capture the underlying multiplicity of user intents behind interaction sequences. While prior multi-intent approaches utilize dynamic routing or multi-head attention for intent disentanglement, they typically constrain intent modeling to the local signal of individual user histories and relegate collective intent semantics to auxiliary training objectives, limiting the expressivity and robustness of learned representations. Concurrently, existing contrastive learning (CL) methods for SR depend on sequence-level augmentations (masking, cropping, reordering) or graph-level perturbations, which may compromise temporal coherence or semantic faithfulness, resulting in suboptimal representation uniformity and discrimination.
BIPCL: Architecture and Methodological Advancements
The Bilateral Intent-enhanced Perturbation-based Contrastive Learning (BIPCL) framework is a unified, end-to-end approach to robust multi-intent SR. It injects collective intent signals directly into both user sequence and item representations, and it avoids the shortcomings of prevalent CL augmentation strategies.
The BIPCL model begins by constructing a global item co-occurrence graph, leveraging multi-hop propagation to obtain high-order structural item embeddings. Item representations, H(v), are further enhanced by explicit incorporation of shared, learnable intent prototypes via soft assignment and adaptive gated fusion, modeling items as aggregations over multiple latent semantic facets.
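The graph step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the co-occurrence window, the symmetric degree normalization, and the layer averaging (in the style of LightGCN) are all assumptions, as are the function names `cooccurrence_adjacency` and `propagate`. A dense matrix is used for readability; a practical version would use sparse operations.

```python
import numpy as np

def cooccurrence_adjacency(sequences, num_items, window=1):
    """Count symmetric co-occurrences of items within `window` steps (assumed rule)."""
    adj = np.zeros((num_items, num_items))
    for seq in sequences:
        for i, a in enumerate(seq):
            for b in seq[i + 1 : i + 1 + window]:
                if a != b:
                    adj[a, b] += 1.0
                    adj[b, a] += 1.0
    return adj

def propagate(adj, item_emb, num_hops=2):
    """Average the embeddings obtained after 0..num_hops propagation steps."""
    deg = adj.sum(axis=1)
    # Symmetric normalization D^{-1/2} A D^{-1/2}; isolated items get weight 0.
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    norm_adj = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    layers = [item_emb]
    for _ in range(num_hops):
        layers.append(norm_adj @ layers[-1])
    return np.mean(layers, axis=0)

# Toy data: three interaction sequences over five items.
rng = np.random.default_rng(0)
sequences = [[0, 1, 2], [1, 2, 3], [0, 2, 4]]
adj = cooccurrence_adjacency(sequences, num_items=5)
item_emb = rng.normal(size=(5, 8))
structural_emb = propagate(adj, item_emb, num_hops=2)
```

Each additional hop mixes in information from items one step further away in the co-occurrence graph, which is what "high-order structural" refers to here.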
On the user side, sequences are encoded using a Transformer backbone to capture both recency-aware and long-range dependencies. The resulting sequence representations are softly aligned with intent prototypes and then fused using learned gates, producing final user embeddings h(u) that are sensitive to both individual behavioral signals and global collaborative intent semantics.
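A minimal sketch of soft prototype assignment followed by gated fusion is given below. The exact parameterization is not specified in this summary, so the dot-product similarity, the temperature, and the single-layer sigmoid gate over the concatenated vectors are assumptions, and `intent_enhance` is a hypothetical name. Because the prototypes are shared, the same function could in principle be applied to item embeddings as well as sequence representations.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def intent_enhance(h, prototypes, w_gate, b_gate, temperature=1.0):
    """h: (n, d); prototypes: (k, d); w_gate: (2d, d); b_gate: (d,)."""
    # Soft assignment of each representation to the k intent prototypes.
    assign = softmax(h @ prototypes.T / temperature, axis=-1)   # (n, k)
    intent = assign @ prototypes                                # (n, d)
    # Element-wise gate deciding how much of the original signal to keep.
    gate = sigmoid(np.concatenate([h, intent], axis=-1) @ w_gate + b_gate)
    fused = gate * h + (1.0 - gate) * intent
    return fused, assign

rng = np.random.default_rng(0)
d, k = 8, 4
prototypes = rng.normal(size=(k, d))        # shared, learnable in a real model
w_gate = rng.normal(size=(2 * d, d)) * 0.1  # hypothetical gate parameters
b_gate = np.zeros(d)
seq_repr = rng.normal(size=(3, d))          # stand-in for Transformer outputs
user_emb, assign = intent_enhance(seq_repr, prototypes, w_gate, b_gate)
```

With this form of gate, every coordinate of the fused embedding is a convex combination of the individual signal and the prototype-based intent signal.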
The model's bilateral integration of intent on both the user and item sides allows collaborative regularities to be used directly at inference time, rather than influencing the model only through training-time alignment losses. This addresses the information isolation problem observed in prior work.
Figure 1: BIPCL explicitly incorporates collective intent patterns shared across sequences ending at the same item, supporting intent-aware and consistent recommendations.
Contrastive Learning via Embedding Perturbation
BIPCL circumvents the drawbacks of discrete input or graph augmentations by performing contrastive augmentation directly at the level of learned item embeddings: bounded, direction-aware noise is injected into the embeddings, yielding two perturbed views. These perturbed embeddings propagate through the bilateral intent modules, producing views at both the interaction level and the intent level that remain semantically consistent yet discriminative. The views are then aligned with an InfoNCE-based contrastive objective.
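One way to realize bounded, direction-aware noise is sketched below. The formula is an assumption borrowed from SimGCL-style perturbation, since this summary does not give the authors' exact rule: each noise vector is scaled to L2 norm `eps` (bounded) and its sign is matched to the embedding it perturbs (direction-aware). The name `perturb` is hypothetical.

```python
import numpy as np

def perturb(emb, eps, rng):
    """Add noise of L2 norm `eps` that points into the same orthant as `emb`."""
    noise = rng.uniform(size=emb.shape)                     # non-negative noise
    noise /= np.linalg.norm(noise, axis=-1, keepdims=True)  # unit norm per row
    return emb + eps * np.sign(emb) * noise

rng = np.random.default_rng(0)
item_emb = rng.normal(size=(5, 8))
# Two independent draws give the two contrastive views of the same items.
view_a = perturb(item_emb, eps=0.1, rng=rng)
view_b = perturb(item_emb, eps=0.1, rng=rng)
```

Because the perturbation never flips a coordinate's sign and has a fixed small norm, each view stays close to the original embedding, which is the property that sequence cropping or reordering cannot guarantee.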
Contrastive learning is performed not just at the instance (interaction) level but also at the intent (prototype) level, which regularizes the geometry of the latent space, improves representation uniformity, mitigates hubness, and prevents intent prototype collapse. The approach jointly optimizes recommendation and multi-level CL objectives within a single-stage, parameter-shared optimization loop.
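The InfoNCE objective and the joint loss can be sketched as below. The in-batch negatives, the cosine similarity, the temperature value, and the weights `lam_interaction` and `lam_intent` are illustrative assumptions; the summary states only that recommendation and multi-level CL losses are optimized together.

```python
import numpy as np

def info_nce(view_a, view_b, temperature=0.2):
    """Row i of view_a is the positive for row i of view_b; other rows are negatives."""
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def joint_loss(rec_loss, cl_interaction, cl_intent,
               lam_interaction=0.1, lam_intent=0.1):
    """Single-stage objective: recommendation loss plus weighted CL terms."""
    return rec_loss + lam_interaction * cl_interaction + lam_intent * cl_intent

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
aligned = info_nce(x, x)                          # positives match exactly
misaligned = info_nce(x, np.roll(x, 1, axis=0))   # positives are the wrong rows
```

The same `info_nce` function would be applied once to interaction-level views and once to intent-level views, with the two results passed to `joint_loss`.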
Figure 2: BIPCL generates contrastive views by perturbing item embeddings, enabling multi-level, semantically consistent contrastive alignment, unlike conventional structural or sequence-level augmentations.
Empirical Results and Quantitative Evaluation
BIPCL achieves consistent, significant improvements over a comprehensive set of state-of-the-art multi-intent SR baselines (including MIND, ComiRec, SimRec, DisMIR, GPR4DUR, etc.) across five public benchmarks. Relative Recall@20 and NDCG@20 gains typically range from about 5% to 20% across Beauty, Yelp, Retail Rocket, Gowalla, and Amazon Books. The improvements are more pronounced for ranking metrics and persist in both sparse and dense user interaction regimes.
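For reference, the two reported metrics and a relative gain can be computed as in the sketch below. These are the standard binary-relevance definitions; the paper's exact evaluation protocol (candidate set, number of held-out items) is not given in this summary, and the toy inputs are illustrative rather than taken from the paper.

```python
import math

def recall_at_k(ranked, relevant, k=20):
    """Fraction of relevant items that appear in the top-k of the ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def ndcg_at_k(ranked, relevant, k=20):
    """Binary-relevance NDCG: discounted gain normalized by the ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 2)
              for rank, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base

# Toy example: the single held-out item is ranked second.
ranked = [3, 1, 2]
relevant = {1}
recall = recall_at_k(ranked, relevant, k=20)
ndcg = ndcg_at_k(ranked, relevant, k=20)
```

Recall@K only checks whether the held-out item is in the top K, while NDCG@K also rewards placing it higher, which is why the two metrics can move by different amounts.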
Ablation studies validate the contribution of each mechanism: removing bilateral intent enhancement, gating, or the CL objectives significantly reduces accuracy, with the largest degradation observed when collaborative intent or embedding-level perturbation is omitted. Replacing embedding-level perturbation with sequence-level or item-graph augmentation leads to a consistent performance loss.
Figure 3: BIPCL outperforms all baselines across user groups, including those with sparse interactions, indicating robustness to user history sparsity.
The InfoNCE-based contrastive loss induces well-dispersed (uniform) representation geometry, as visualized in the density distributions of learned intent embeddings. BIPCL avoids the concentration and redundancy typical of non-contrastive baselines, resulting in more discriminative intent granularity.
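Dispersion of this kind can also be quantified numerically. The sketch below uses the uniformity measure of Wang and Isola (the log of the mean Gaussian potential over pairs of normalized embeddings); whether the authors use this measure is not stated in this summary, so it is offered only as a common proxy, and `t=2.0` is the conventional default rather than a value from the paper.

```python
import numpy as np

def uniformity(emb, t=2.0):
    """Lower (more negative) values indicate more evenly spread embeddings."""
    z = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Squared Euclidean distance between every pair of normalized rows.
    sq_dist = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(z), k=1)  # each unordered pair once
    return np.log(np.mean(np.exp(-t * sq_dist[iu])))

collapsed = uniformity(np.ones((4, 3)))  # all prototypes identical
dispersed = uniformity(np.eye(4))        # mutually orthogonal prototypes
```

In this toy comparison, fully collapsed prototypes score 0 and orthogonal ones score lower, matching the intuition that well-separated intent prototypes are less redundant.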
Figure 4: The contrastive mechanism facilitates well-dispersed and non-redundant intent embedding distributions, as evidenced by uniform angular density profiles.
Case studies show that BIPCL aligns recommendations with both recent and non-recent user interactions, preserving semantically coherent trajectories in latent space and supporting both sudden and gradual preference shifts.
Figure 5: Projections of user interaction histories, recommendations, and ground-truth future items validate that BIPCL recommendations track both immediate and long-term user intents in latent space.
Practical Considerations: Efficiency and Scalability
With a lightweight computational profile (sparse graph operations, no iterative clustering, parameter sharing), BIPCL achieves competitive training/inference times and memory consumption relative to prior baselines, maintaining scalability to large-scale SR datasets.
Figure 6: BIPCL achieves its performance gains with overall memory consumption comparable to state-of-the-art baselines.
Hyperparameter Stability
BIPCL's accuracy is robust to the number of intent prototypes, propagation depth, perturbation magnitude, and contrastive loss weight. The method delivers stable results without requiring extensive tuning. Performance saturates after modest increases in depth or prototype number, reflecting an appropriate bias-variance trade-off.
Theoretical Implications and Future Directions
By coupling bilateral intent integration with perturbation-based multi-level contrastive regularization, BIPCL advances both the expressivity and robustness of sequential recommender systems. The framework enables direct, inference-time utilization of collaborative semantic patterns, providing a unified approach to intent-enhanced user and item modeling under sparse supervision.
Theoretically, the approach suggests a pathway for other graph-based or sequential learning problems where collaboration and disentanglement of high-entropy latent variables are critical but data is scarce. Further work could investigate richer prototype parameterizations, user- or context-adaptive augmentations, or continuous updates of the intent space for lifelong recommendation scenarios.
Conclusion
BIPCL provides an end-to-end, contrastive learning-based framework for SR that integrates collective intent semantics bilaterally into user and item embeddings and harnesses embedding-level perturbation for high-fidelity multi-level CL. The resulting gains in recommendation quality, robustness, and efficiency substantiate the effectiveness of explicit collaborative intent modeling and perturbation-invariant learning for modern recommender systems (2604.02833).