Top-K Multi-Positive Contrastive Objective
- Top-K Multi-Positive Contrastive Objective (MSCL) is a contrastive learning framework that leverages multiple positive samples to improve recommendation quality.
- It modifies the classic InfoNCE loss with importance-aware weighting, balancing the influence of positive and negative interactions.
- Empirical results on datasets like Yelp2018 and Amazon-Book show that MSCL achieves higher accuracy and faster convergence compared to traditional methods.
The Top-K Multi-Positive Contrastive Objective (MSCL) is a contrastive learning framework tailored for recommender systems under Top-K recommendation metrics. In this paradigm, MSCL modifies the classic InfoNCE/NT-Xent loss, incorporating a strategy that samples multiple positive items per user and applies a tunable importance weighting between positive and negative terms. This combination enhances both the utilization of sparse user-item interactions and the balance of gradient signal, resulting in improved recommendation accuracy and efficient optimization, particularly on sparse datasets (Tang et al., 2021).
1. Formalization of the Top-K Recommendation Task
The Top-K recommendation setting operates on a bipartite interaction graph $G = (\mathcal{U}, \mathcal{I}, \mathcal{E})$, where $\mathcal{U}$ denotes the set of users, $\mathcal{I}$ the set of items, and $\mathcal{E} \subseteq \mathcal{U} \times \mathcal{I}$ the observed positive user-item interactions. For each user $u \in \mathcal{U}$, the system must learn an embedding $e_u \in \mathbb{R}^d$, and similarly for each item $i \in \mathcal{I}$ an embedding $e_i \in \mathbb{R}^d$. Recommendations are generated by ranking items for $u$ according to the predicted score $s(u, i)$, a similarity between $e_u$ and $e_i$, and selecting the $K$ items with the highest scores.
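The ranking step can be sketched in a few lines of plain Python; an inner-product score and toy 2-D embeddings are assumed here purely for illustration:

```python
def topk_recommend(user_emb, item_embs, k):
    """Score every item for one user with the inner product e_u . e_i
    and return the indices of the K highest-scoring items."""
    scores = [sum(u * v for u, v in zip(user_emb, item)) for item in item_embs]
    # Sort item indices by score, highest first, and keep the top K.
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

# Toy example: three items in 2-D; item 1 aligns best with the user.
items = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
print(topk_recommend([1.0, 1.0], items, 2))  # → [1, 0]
```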
This formalism is central to embedding-based collaborative filtering and underpins both the classical contrastive baseline and the MSCL framework (Tang et al., 2021).
2. Standard Contrastive Loss (CL) and Its Limitations
The NT-Xent loss—prevalent in models such as SimCLR and GraphCL—adapts to recommendation by contrasting a user’s positive item against negatives drawn from the remainder of the minibatch. Given a minibatch of $N$ user-item pairs, the positive pair $(u, i)$ is contrasted with the in-batch negatives $\{(u, j) : j \neq i\}$. The loss is:

$$\mathcal{L}_{CL}(u, i) = -\log \frac{\exp\big(s(u, i)/\tau\big)}{\sum_{j=1}^{N} \exp\big(s(u, j)/\tau\big)}$$

where $s(u, i)$ is the cosine similarity of $e_u$ and $e_i$, and $\tau$ is the temperature.
The batch loss is averaged over the $N$ such pairs. A critical limitation is the severe imbalance for each user: a single positive term against $N - 1$ negatives, which is further exacerbated by the sparsity common in practical recommendation scenarios. Each training step exploits only one positive item, leading to under-utilization of sparse user-item interaction signals (Tang et al., 2021).
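A minimal pure-Python sketch of this single-positive loss for one user (helper names are illustrative; a real implementation would batch the computation over tensors):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nt_xent(user_emb, pos_emb, batch_item_embs, tau=0.2):
    """Single-positive NT-Xent term for one user: the positive item is
    contrasted against every item embedding in the minibatch (the
    denominator includes the positive itself)."""
    pos = math.exp(cosine(user_emb, pos_emb) / tau)
    denom = sum(math.exp(cosine(user_emb, j) / tau) for j in batch_item_embs)
    return -math.log(pos / denom)
```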
3. Derivation and Formulation of the Multi-Positive Contrastive Loss (MSCL)
3.1 Importance-aware Contrastive Loss (ICL)
ICL introduces a weighting parameter $\alpha \in [0, 1]$ to control the relative contribution of the positive and negative terms. Decomposing $\mathcal{L}_{CL}$ into its positive (alignment) part $-s(u, i)/\tau$ and its negative (uniformity) part $\log \sum_j \exp(s(u, j)/\tau)$ and reweighting the two yields:

$$\mathcal{L}_{ICL}(u, i) = -2\alpha \, \frac{s(u, i)}{\tau} + 2(1 - \alpha) \log \sum_{j=1}^{N} \exp\big(s(u, j)/\tau\big)$$
Setting the weight to its symmetric value, $\alpha = 1/2$, recovers the standard NT-Xent loss, while larger values increase the emphasis on positives—helpful in extremely sparse data.
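Under the decomposition above, ICL is a small change to the single-positive sketch: weight the alignment and uniformity parts separately. Names are illustrative, and $\alpha = 0.5$ reproduces the plain NT-Xent value:

```python
import math

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def icl_loss(user_emb, pos_emb, batch_item_embs, tau=0.2, alpha=0.5):
    """Importance-aware contrastive loss: NT-Xent split into its positive
    (alignment) and negative (uniformity) parts, each reweighted.
    alpha = 0.5 recovers plain NT-Xent; alpha > 0.5 emphasizes positives."""
    pos_part = _cos(user_emb, pos_emb) / tau
    neg_part = math.log(sum(math.exp(_cos(user_emb, j) / tau)
                            for j in batch_item_embs))
    return -2 * alpha * pos_part + 2 * (1 - alpha) * neg_part
```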
3.2 Multi-Positive Contrastive Loss (MCL) via Data Augmentation
Instead of a single positive sampled per user, MCL samples $P$ distinct positives $\{i_1, \dots, i_P\}$ from the user’s interaction history $\mathcal{N}_u$ (without replacement). The per-user loss is aggregated over all sampled positives:

$$\mathcal{L}_{MCL}(u) = \frac{1}{P} \sum_{p=1}^{P} \mathcal{L}_{CL}(u, i_p)$$
3.3 Final Multi-Sample Contrastive Loss (MSCL)
Combining importance weighting and multi-positive sampling, MSCL is formally written as:

$$\mathcal{L}_{MSCL}(u) = \frac{1}{P} \sum_{p=1}^{P} \left[ -2\alpha \, \frac{s(u, i_p)}{\tau} + 2(1 - \alpha) \log \sum_{j=1}^{N} \exp\big(s(u, j)/\tau\big) \right]$$

with the overall minibatch loss averaged over the users $\mathcal{B}$ in the batch:

$$\mathcal{L}_{MSCL} = \frac{1}{|\mathcal{B}|} \sum_{u \in \mathcal{B}} \mathcal{L}_{MSCL}(u)$$
This approach enables simultaneous utilization of multiple positives, and the weighting scheme directly modulates the gradient flow to address imbalance (Tang et al., 2021).
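Putting the two pieces together, the per-user MSCL term can be sketched in pure Python (illustrative names; note that the negative part is shared by all $P$ positives and so is computed once):

```python
import math

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def mscl_loss(user_emb, pos_embs, batch_item_embs, tau=0.2, alpha=0.5):
    """Per-user MSCL: average the importance-weighted contrastive term over
    the P sampled positives, all sharing the same in-batch negatives.
    With alpha = 0.5 and P = 1 this reduces to plain NT-Xent."""
    # The negative (uniformity) part is identical for every positive.
    neg_part = math.log(sum(math.exp(_cos(user_emb, j) / tau)
                            for j in batch_item_embs))
    total = 0.0
    for p in pos_embs:
        total += -2 * alpha * (_cos(user_emb, p) / tau) \
                 + 2 * (1 - alpha) * neg_part
    return total / len(pos_embs)
```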
4. Multi-Positive Sampling and Data Augmentation
At each iteration, for each user $u$ in the minibatch, let $\mathcal{N}_u$ denote all known positives. MSCL samples $P$ positives without replacement, resulting in $\binom{|\mathcal{N}_u|}{P}$ potential positive sets and thus significantly augmenting the sampling space. Each sampled positive $i_p$ receives a parallel contrastive loss with the same set of negatives, substantially increasing the supervisory signal—even in the case of very short user histories. This combinatorial data augmentation effect is a defining characteristic of MSCL, enabling the model to effectively utilize limited positive interactions (Tang et al., 2021).
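The size of the augmented sampling space and the draw itself are straightforward to sketch (function names are illustrative, not from the paper):

```python
import math
import random

def num_positive_sets(history_size, p):
    """Number of distinct size-P positive subsets of a user's history:
    the binomial coefficient C(|N_u|, P)."""
    return math.comb(history_size, p)

def sample_positives(history, p, rng=None):
    """Draw P positives from the user's history without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(history, p)

# A 10-item history with P = 3 already yields 120 distinct positive sets.
print(num_positive_sets(10, 3))  # → 120
```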
5. Hyperparameterization: Balancing Positives and Negatives
The hyperparameter $\alpha$ is the central knob for controlling the positive/negative trade-off. $\alpha = 1/2$ yields the conventional contrastive loss, but the optimal value is dataset-dependent: denser datasets favor values near the symmetric setting, while ultra-sparse domains (e.g., Alibaba-iFashion) benefit from larger $\alpha$, which emphasizes positives. The number of positive views $P$ is typically small, and can be increased as user positive history and dataset size permit (Tang et al., 2021).
6. Integration with Graph Encoder Architectures
MSCL is agnostic to the embedding-based encoder and is demonstrated with LightGCN and sLightGCN. The final user and item embeddings $e_u$ and $e_i$ are computed as layer-wise averages over the $L$ GCN propagation steps:

$$e_u = \frac{1}{L + 1} \sum_{l=0}^{L} e_u^{(l)}, \qquad e_i = \frac{1}{L + 1} \sum_{l=0}^{L} e_i^{(l)}$$
The cosine similarity between $e_u$ and $e_i$, $s(u, i) = \frac{e_u^\top e_i}{\lVert e_u \rVert \, \lVert e_i \rVert}$, is the input to the MSCL objective, replacing prior ranking-based losses such as BPR (Tang et al., 2021).
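A toy sketch of the layer readout and the cosine score fed to MSCL (plain Python on single vectors; in practice both operate on the full embedding matrices):

```python
import math

def layer_average(layer_embs):
    """LightGCN-style readout: mean of the embeddings produced at
    propagation layers 0..L (layer 0 is the initial embedding)."""
    dim = len(layer_embs[0])
    return [sum(e[d] for e in layer_embs) / len(layer_embs) for d in range(dim)]

def score(e_u, e_i):
    """Cosine similarity s(u, i) passed to the MSCL objective."""
    dot = sum(x * y for x, y in zip(e_u, e_i))
    return dot / (math.sqrt(sum(x * x for x in e_u)) *
                  math.sqrt(sum(x * x for x in e_i)))
```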
7. Empirical Findings and Advantages in Sparse Regimes
Benchmarking on the Yelp2018, Amazon-Book, and Alibaba-iFashion datasets (all highly sparse), MSCL with multiple positives ($P > 1$) and tuned $\alpha$ significantly outperforms the single-positive CL baseline:
| Dataset | Recall@20 (CL) | Recall@20 (MSCL) | NDCG@20 Improvement |
|---|---|---|---|
| Yelp2018 | 0.0655 | 0.0691 | +5.0% |
| Amazon-Book | 0.0480 | 0.0580 | +17% |
| Alibaba-iFashion | 0.1152 | 0.1201 | +4% |
These gains are accompanied by marked improvements in convergence speed (50 epochs for MSCL versus 900 for BPR) and nearly identical per-epoch computational cost on modern GPUs (Tang et al., 2021).
In summary, MSCL operationalizes combinatorially augmented contrastive learning, tuned by an explicit positive-to-negative importance weighting. Its principal benefit lies in overcoming the dual challenges of gradient imbalance and data sparsity endemic to practical Top-K recommender systems (Tang et al., 2021).