
Top-K Multi-Positive Contrastive Objective

Updated 6 February 2026
  • Top-K Multi-Positive Contrastive Objective (MSCL) is a contrastive learning framework that leverages multiple positive samples to improve recommendation quality.
  • It modifies the classic InfoNCE loss with importance-aware weighting, balancing the influence of positive and negative interactions.
  • Empirical results on datasets like Yelp2018 and Amazon-Book show that MSCL achieves higher accuracy and faster convergence compared to traditional methods.

The Top-K Multi-Positive Contrastive Objective (MSCL) is a contrastive learning framework tailored for recommender systems under Top-K recommendation metrics. In this paradigm, MSCL modifies the classic InfoNCE/NT-Xent loss, incorporating a strategy that samples multiple positive items per user and applies a tunable importance weighting between positive and negative terms. This combination enhances both the utilization of sparse user-item interactions and the balance of gradient signal, resulting in improved recommendation accuracy and efficient optimization, particularly on sparse datasets (Tang et al., 2021).

1. Formalization of the Top-K Recommendation Task

The Top-K recommendation setting operates on the bipartite interaction graph $G = (U \cup I, E^+)$, where $U$ denotes the set of users, $I$ the set of items, and $E^+ \subseteq U \times I$ the observed positive user-item interactions. For each user $u \in U$, the system must learn an embedding $e_u \in \mathbb{R}^d$, and similarly an embedding $e_i \in \mathbb{R}^d$ for each item $i \in I$. Recommendations are generated by ranking items for $u$ according to the predicted score $\hat{y}_{ui} = e_u^T e_i$ and selecting the $K$ items with the highest scores.
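As a minimal sketch of this scoring step (variable names such as `user_emb` and `item_emb` are illustrative, not from the paper), ranking by the inner product $\hat{y}_{ui} = e_u^T e_i$ and keeping the top K might look like:

```python
# Hypothetical Top-K scoring from learned embeddings: rank all items for a
# user by the inner product e_u^T e_i and keep the K highest-scoring items.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 4, 10, 8
user_emb = rng.normal(size=(n_users, d))   # e_u for each user u in U
item_emb = rng.normal(size=(n_items, d))   # e_i for each item i in I

def top_k_items(u: int, K: int) -> np.ndarray:
    """Return indices of the K items with the highest predicted score for user u."""
    scores = item_emb @ user_emb[u]   # \hat{y}_{ui} for every item i
    return np.argsort(-scores)[:K]    # sort descending, take the first K

recs = top_k_items(u=0, K=3)
```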

This formalism is central to embedding-based collaborative filtering and underpins both the classical contrastive baseline and the MSCL framework (Tang et al., 2021).

2. Standard Contrastive Loss (CL) and Its Limitations

The NT-Xent loss, prevalent in models such as SimCLR and GraphCL, is adapted to recommendation by contrasting a user's positive item against negatives drawn from the remainder of the minibatch. Given a minibatch $D$ of size $N$, the positive pair $(u, i^+)$ is contrasted with the in-batch negatives $I^- = \{i : (u, i) \notin E^+,\ i \in \text{batch}\}$. The loss is:

$$L_{CL}(u, i^+) = -\log \frac{\exp(f(u, i^+)/\tau)}{\sum_{i \in I^-} \exp(f(u, i)/\tau)}$$

where $f(u, i)$ is the cosine similarity of $e_u$ and $e_i$, and $\tau$ is the temperature.

The batch loss is averaged over $N$ such pairs. A critical limitation is the severe imbalance for each user: a single positive term is set against $O(N)$ negatives, which is further exacerbated by the sparsity common in practical recommendation scenarios. Each training step exploits only one positive item, leading to under-utilization of sparse user-item interaction signals (Tang et al., 2021).
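A hedged numpy sketch of this single-positive loss (helper and argument names are ours; following the formula above, the denominator sums only over the in-batch negatives $I^-$):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """f(u, i): cosine similarity of two embeddings."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def cl_loss(e_u, e_pos, e_negs, tau=0.2) -> float:
    """Single-positive contrastive loss of Section 2:
    -log( exp(f(u,i+)/tau) / sum_{i in I^-} exp(f(u,i)/tau) )."""
    pos = np.exp(cosine(e_u, e_pos) / tau)
    neg = sum(np.exp(cosine(e_u, e_n) / tau) for e_n in e_negs)
    return float(-np.log(pos / neg))
```

Note the asymmetry the text describes: one positive term in the numerator against an $O(N)$-sized sum of negatives in the denominator.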

3. Derivation and Formulation of the Multi-Sample Contrastive Loss (MSCL)

3.1 Importance-aware Contrastive Loss (ICL)

ICL introduces a weighting parameter $\alpha \in [0, 1]$ to control the relative contribution of the positive and negative terms:

$$L_{ICL}(u, i^+) = -\left[ \alpha \frac{f(u, i^+)}{\tau} - (1 - \alpha) \log \sum_{i \in I^-} \exp(f(u, i)/\tau) \right]$$

Setting $\alpha = 1/2$ recovers the symmetric NT-Xent loss (up to a constant factor of $1/2$), while $\alpha > 1/2$ increases the emphasis on positives, which is helpful in extremely sparse data.
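A minimal sketch of $L_{ICL}$ under the same illustrative conventions as before; at $\alpha = 0.5$ it returns exactly half the single-positive loss of Section 2:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """f(u, i): cosine similarity of two embeddings."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def icl_loss(e_u, e_pos, e_negs, tau=0.2, alpha=0.5) -> float:
    """Importance-aware contrastive loss (Section 3.1):
    -[ alpha * f(u,i+)/tau - (1-alpha) * log sum_{i in I^-} exp(f(u,i)/tau) ]."""
    pos_term = alpha * cosine(e_u, e_pos) / tau
    neg_term = (1 - alpha) * np.log(
        sum(np.exp(cosine(e_u, e_n) / tau) for e_n in e_negs)
    )
    return -(pos_term - neg_term)
```

Raising `alpha` above 0.5 scales up the positive term's gradient contribution, which is the mechanism the text credits for handling extreme sparsity.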

3.2 Multi-Positive Contrastive Loss (MCL) via Data Augmentation

Instead of a single positive sampled per user, MCL samples $M$ distinct positives $\{i_1^+, \ldots, i_M^+\}$ from the user's interaction history (without replacement). The per-user loss is aggregated over all sampled positives:

$$L_{MCL} = \sum_{m=1}^{M} L_{CL}(u, i_m^+)$$

3.3 Final Multi-Sample Contrastive Loss (MSCL)

Combining importance weighting and multi-positive sampling, MSCL is formally written as:

$$L_{MSCL} = \sum_{m=1}^{M} L_{ICL}(u, i_m^+)$$

with the overall minibatch loss:

$$L_{MSCL} = -\frac{1}{N} \sum_{(u, \{i_m^+\}) \in D_M} \sum_{m=1}^{M} \left[ \alpha \frac{f(u, i_m^+)}{\tau} - (1 - \alpha) \log \sum_{i \in I^-} \exp(f(u, i)/\tau) \right]$$

This approach enables simultaneous utilization of multiple positives, and the weighting scheme directly modulates the gradient flow to address imbalance (Tang et al., 2021).
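The full minibatch objective can be sketched as follows (a hedged illustration, not the paper's reference implementation; the batch layout, a list of per-user tuples, is our assumption):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """f(u, i): cosine similarity of two embeddings."""
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def mscl_batch_loss(batch, tau=0.2, alpha=0.5) -> float:
    """MSCL minibatch loss (Section 3.3).
    batch: list of (e_u, [e_pos_1, ..., e_pos_M], [e_neg, ...]) tuples,
    one per user; all M positives share the same set of negatives."""
    total = 0.0
    for e_u, positives, negatives in batch:
        # log-sum-exp over negatives is computed once per user and reused
        neg_lse = np.log(sum(np.exp(cosine(e_u, e_n) / tau) for e_n in negatives))
        for e_p in positives:
            total += -(alpha * cosine(e_u, e_p) / tau - (1 - alpha) * neg_lse)
    return total / len(batch)   # average over the N users in the batch
```

Reusing the negative log-sum-exp across all $M$ positives reflects the formula above, where the inner sum over $I^-$ does not depend on $m$.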

4. Multi-Positive Sampling and Data Augmentation

At each iteration, for each user $u$ in the minibatch, let $P_u$ denote the set of all known positives. MSCL samples $M$ positives without replacement, yielding $\binom{|P_u|}{M}$ potential positive sets and thus significantly augmenting the sampling space. Each sampled $i_m^+$ receives a parallel contrastive loss against the same set of negatives, substantially increasing the supervisory signal, even for very short user histories. This combinatorial data-augmentation effect is a defining characteristic of MSCL, enabling the model to make effective use of limited positive interactions (Tang et al., 2021).
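The sampling step and the size of the augmented space can be sketched directly (names are illustrative):

```python
# Sample M distinct positives from a user's history P_u, and count how many
# distinct positive sets exist: C(|P_u|, M) = binom(|P_u|, M).
import random
from math import comb

def sample_positives(P_u, M, rng=random):
    """Draw M positives from the user's interaction history, without replacement."""
    return rng.sample(list(P_u), M)

P_u = {3, 7, 11, 19, 23}   # toy interaction history, |P_u| = 5
M = 3
n_positive_sets = comb(len(P_u), M)   # C(5, 3) = 10 distinct positive sets
```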

5. Hyperparameterization: Balancing Positives and Negatives

The hyperparameter $\alpha \in [0, 1]$ is the central knob for controlling the positive/negative trade-off. $\alpha = 0.5$ yields the conventional contrastive loss, but tuning is dataset-dependent: $\alpha \simeq 0.45$ works best for denser datasets, and $\alpha \simeq 0.6$ for ultra-sparse domains (e.g., Alibaba-iFashion). The number of positive views $M$ is typically in the range $5 \leq M \leq 7$, increasing further as user history length and dataset size permit (Tang et al., 2021).

6. Integration with Graph Encoder Architectures

MSCL is agnostic to the embedding-based encoder and is demonstrated with LightGCN and sLightGCN. User and item embeddings $e_u$ and $e_i$ are computed as layer-wise averages over the GCN propagation steps (shown for $e_u$; $e_i$ is computed analogously):

$$e_u = \frac{1}{K+1} \sum_{k=0}^{K} e_u^{(k)}$$

The cosine similarity $f(u, i) = \cos(e_u, e_i)$ between $e_u$ and $e_i$ is the input to the MSCL objective, replacing prior ranking-based losses such as BPR (Tang et al., 2021).
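The layer-averaging and similarity steps amount to a few lines (a sketch of the formula above, not LightGCN's propagation itself, which requires the graph adjacency):

```python
import numpy as np

def layer_average(layer_embs) -> np.ndarray:
    """e_u = (1/(K+1)) * sum_{k=0}^{K} e_u^(k): average the K+1
    per-layer embeddings produced by GCN propagation."""
    return np.mean(np.stack(layer_embs, axis=0), axis=0)

def f(e_u: np.ndarray, e_i: np.ndarray) -> float:
    """Cosine similarity fed into the MSCL objective."""
    return float(e_u @ e_i) / (np.linalg.norm(e_u) * np.linalg.norm(e_i))
```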

7. Empirical Findings and Advantages in Sparse Regimes

Benchmarking on the Yelp2018, Amazon-Book, and Alibaba-iFashion datasets (interaction densities $1.3 \times 10^{-3}$, $6.2 \times 10^{-4}$, and $7 \times 10^{-5}$, respectively), MSCL with $M \approx 7$ and tuned $\alpha$ significantly outperforms the single-positive CL baseline:

Dataset            Metric      CL       MSCL     Relative improvement
Yelp2018           recall@20   0.0655   0.0691   +5.0% (NDCG@20)
Amazon-Book        recall@20   0.0480   0.0580   +17% (NDCG@20)
Alibaba-iFashion   recall@20   0.1152   0.1201   +4% (NDCG@20)

These gains are accompanied by marked improvements in convergence speed (~50 epochs for MSCL versus ~900 for BPR) and nearly identical per-epoch computational cost on modern GPUs (Tang et al., 2021).

In summary, MSCL operationalizes combinatorially augmented contrastive learning, tuned by an explicit positive/negative weighting. Its principal benefit lies in overcoming the dual challenges of imbalance and data sparsity endemic to practical Top-K recommender systems (Tang et al., 2021).

