Multi-Similarity Loss in Deep Metric Learning

Updated 26 May 2026

Multi-Similarity Loss is a loss function for deep metric learning that integrates self, positive-relative, and negative-relative signals to improve embedding quality.
It employs a General Pair Weighting framework to mine and weight informative pairs, ensuring larger gradient signals and efficient training.
Extensions like MSCon and SMS adapt the loss for multi-attribute and soft-label scenarios, significantly boosting retrieval accuracy and generalization.

Multi-Similarity Loss is a class of loss functions central to modern deep metric learning and contrastive representation learning. It regularizes embedding models by leveraging information from multiple notions of similarity, outperforming traditional pair/triplet-based approaches in image retrieval, cross-modal retrieval, and robust representation learning. Integrating the General Pair Weighting (GPW) framework, multi-similarity loss enables principled and efficient mining and weighting of training pairs, and extensions such as Multi-Similarity Contrastive Loss (MSCon) and Symmetric Multi-Similarity Loss (SMS) exploit multiple metrics or soft-label information for enhanced performance and generalization.

1. Motivation and Historical Background

In classical deep metric learning, most methods, such as the contrastive, triplet, and lifted-structure losses, relied on fixed rules for mining positive and negative pairs. These approaches were limited by redundant pair sampling and coarse, often uniform, weighting schemes. They typically exploited only a single “signal” per pair: the raw similarity (“self”), the relative ranking among positives, or the separation from negatives in the batch.

Multi-Similarity Loss (MS Loss), introduced by Wang et al. (2019), addressed these limitations by combining three distinct similarity signals (self, positive-relative, and negative-relative) within a unified, differentiable formulation. This principled weighting allows broader exploitation of batch information and yields larger, more informative gradients per batch step. Subsequent variants, such as Multi-Similarity Contrastive Loss (MSCon) and Symmetric Multi-Similarity Loss (SMS), extended this framework to settings with multiple, possibly uncertain, notions of similarity (Mu et al., 2023; Wang et al., 2024). Such scenarios are prevalent in real-world data, where objects are annotated with multiple categorical or soft affiliations.

2. The General Pair Weighting and Multi-Similarity Loss Formulation

Multi-Similarity Loss is rooted in the General Pair Weighting (GPW) view, where the gradient of any pair-based metric learning loss decomposes into a weighted sum over pairs:

$\frac{\partial \mathcal{L}}{\partial \theta} = \sum_{i,j} \frac{\partial \mathcal{L}}{\partial S_{ij}} \frac{\partial S_{ij}}{\partial \theta}$

where $w_{ij} = \left| \frac{\partial \mathcal{L}}{\partial S_{ij}} \right|$ is the weight of pair $(i, j)$, $S_{ij}$ is the cosine similarity between $f(x_i)$ and $f(x_j)$, and $f$ is a unit-normalizing embedding function. The derivative is negative for positive pairs and positive for negative pairs, so each pair pulls or pushes $S_{ij}$ with strength $w_{ij}$.
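To make the GPW view concrete, the sketch below estimates $w_{ij} = |\partial \mathcal{L} / \partial S_{ij}|$ numerically for a simple contrastive-style loss. The loss form, the helper names `contrastive_loss` and `pair_weights`, and the margin of 0.5 are illustrative assumptions, not taken from Wang et al. (2019).

```python
def contrastive_loss(sims, margin=0.5):
    """Illustrative contrastive-style loss on similarities (an assumption).

    sims maps (i, j) -> (similarity, is_positive).
    Positives are pulled towards 1; negatives are penalised above `margin`.
    """
    total = 0.0
    for s, is_pos in sims.values():
        total += (1.0 - s) if is_pos else max(0.0, s - margin)
    return total


def pair_weights(loss_fn, sims, h=1e-6):
    """Estimate w_ij = |dL/dS_ij| with central finite differences."""
    weights = {}
    for key, (s, is_pos) in sims.items():
        up = {**sims, key: (s + h, is_pos)}
        down = {**sims, key: (s - h, is_pos)}
        weights[key] = abs(loss_fn(up) - loss_fn(down)) / (2 * h)
    return weights


sims = {
    (0, 1): (0.9, True),   # easy positive
    (0, 2): (0.2, True),   # hard positive
    (0, 3): (0.7, False),  # hard negative (above the margin)
    (0, 4): (0.1, False),  # easy negative (below the margin)
}
weights = pair_weights(contrastive_loss, sims)
```

In this toy loss every contributing pair receives the same weight regardless of how hard it is, and the easy negative receives none. This is the uniform weighting that MS Loss is designed to improve on.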

Original Multi-Similarity Loss

Given a batch $\{x_i\}_{i=1}^{m}$ with labels $y_i$, positives $P_i$, and negatives $N_i$, the loss is formulated as:

$\mathcal{L}_{MS} = \frac{1}{m} \sum_{i=1}^{m} \left\{ \frac{1}{\alpha} \log\left[ 1 + \sum_{k \in P_i} e^{-\alpha (S_{ik} - \lambda)} \right] + \frac{1}{\beta} \log\left[ 1 + \sum_{k \in N_i} e^{\beta (S_{ik} - \lambda)} \right] \right\}$

with sharpness parameters $\alpha, \beta$ and margin $\lambda$ (Wang et al., 2019). The loss uses an explicit mining step to focus on “informative” positives/negatives based on relative similarity, followed by a soft weighting based on both $S_{ij}$ and its hardness compared to other pairs.
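A minimal plain-Python sketch of the MS Loss, written with loops for readability rather than speed. The function name `ms_loss`, the toy similarity matrices, and the default values $\alpha = 2$, $\beta = 50$, $\lambda = 0.5$ are assumptions chosen for illustration; the mining step is omitted here, so all positives and negatives of each anchor are used.

```python
import math


def ms_loss(S, labels, alpha=2.0, beta=50.0, lam=0.5):
    """Multi-Similarity loss over a batch, without the mining step.

    S is an m x m list of cosine similarities, labels a list of m class ids.
    Per anchor i the loss is
        (1/alpha) * log(1 + sum_{k in P_i} exp(-alpha * (S[i][k] - lam)))
      + (1/beta)  * log(1 + sum_{k in N_i} exp( beta  * (S[i][k] - lam)))
    and the result is the mean over anchors.
    """
    m = len(labels)
    total = 0.0
    for i in range(m):
        pos = [S[i][k] for k in range(m) if k != i and labels[k] == labels[i]]
        neg = [S[i][k] for k in range(m) if labels[k] != labels[i]]
        pos_term = math.log1p(sum(math.exp(-alpha * (s - lam)) for s in pos)) / alpha
        neg_term = math.log1p(sum(math.exp(beta * (s - lam)) for s in neg)) / beta
        total += pos_term + neg_term
    return total / m


labels = [0, 0, 1, 1]
well_separated = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]
poorly_separated = [
    [1.0, 0.3, 0.7, 0.6],
    [0.3, 1.0, 0.6, 0.7],
    [0.7, 0.6, 1.0, 0.3],
    [0.6, 0.7, 0.3, 1.0],
]
loss_good = ms_loss(well_separated, labels)
loss_bad = ms_loss(poorly_separated, labels)
```

The poorly separated batch yields the larger loss, driven mainly by negatives whose similarity exceeds $\lambda$.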

Mining and Weighting Mechanism

Informative positive set: $P_i^{*} = \{ j \in P_i : S_{ij} < \max_{k \in N_i} S_{ik} + \epsilon \}$
Informative negative set: $N_i^{*} = \{ j \in N_i : S_{ij} > \min_{k \in P_i} S_{ik} - \epsilon \}$
Here $\epsilon$ is the mining margin. Pairs are then exponentially weighted and combined in the loss.

This design ensures that only the most “violating” or “hard” pairs contribute significant gradient signal, improving both retrieval precision and training efficiency.
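The mining rule can be sketched as follows: a positive is kept when it is less similar than the hardest negative plus a margin, and a negative is kept when it is more similar than the hardest positive minus that margin. The function name `mine_informative_pairs`, the default margin `eps=0.1`, and the choice to return empty sets for anchors that lack positives or negatives are assumptions made for this example.

```python
def mine_informative_pairs(S, labels, eps=0.1):
    """Return (pos_sets, neg_sets): dicts mapping anchor -> kept indices."""
    m = len(labels)
    pos_sets, neg_sets = {}, {}
    for i in range(m):
        pos = [k for k in range(m) if k != i and labels[k] == labels[i]]
        neg = [k for k in range(m) if labels[k] != labels[i]]
        if not pos or not neg:
            # Assumption: anchors without both kinds of pairs contribute nothing.
            pos_sets[i], neg_sets[i] = [], []
            continue
        hardest_pos = min(S[i][k] for k in pos)  # least similar positive
        hardest_neg = max(S[i][k] for k in neg)  # most similar negative
        pos_sets[i] = [k for k in pos if S[i][k] < hardest_neg + eps]
        neg_sets[i] = [k for k in neg if S[i][k] > hardest_pos - eps]
    return pos_sets, neg_sets


labels = [0, 0, 0, 1, 1]
S = [
    [1.0, 0.9, 0.4, 0.5, 0.1],
    [0.9, 1.0, 0.8, 0.2, 0.0],
    [0.4, 0.8, 1.0, 0.3, 0.2],
    [0.5, 0.2, 0.3, 1.0, 0.7],
    [0.1, 0.0, 0.2, 0.7, 1.0],
]
pos_sets, neg_sets = mine_informative_pairs(S, labels)
```

For anchor 0 only the hard positive (index 2) and the hard negative (index 3) survive, while the well-separated anchor 1 keeps no pairs at all.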

3. Extensions: Multi-Similarity Contrastive and Symmetric Multi-Similarity Losses

Multi-Similarity Contrastive Loss (MSCon)

When data carries multiple categorical or semantic attributes (e.g., category, closure, gender for images), each attribute induces a distinct similarity relation. MSCon, introduced by Mu et al. (2023), learns one projection head per metric and forms a multi-similarity objective by summing a supervised contrastive (SupCon) loss per metric:

$\mathcal{L}_{MSCon} = \sum_{c=1}^{C} \mathcal{L}_{c}$

where $C$ is the number of similarity metrics and $\mathcal{L}_{c}$ is a SupCon loss over the $c$-th relational head.

Uncertainty-based Task Weighting

MSCon incorporates a learnable task-specific uncertainty $\sigma_c$, yielding the regularized objective:

$\mathcal{L} = \sum_{c=1}^{C} \left( \frac{1}{\sigma_c^{2}} \mathcal{L}_{c} + \log \sigma_c \right)$

with $\mathcal{L}_{c}$ the per-metric SupCon loss.

This weighting down-scales the contribution of “uncertain” or noisy similarity tasks, leading to better out-of-domain (OOD) generalization and more robust multi-attribute representations (Mu et al., 2023).
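A small sketch of uncertainty-based weighting, assuming each metric's loss is scaled by an inverse variance and regularized by a log term. The function name `uncertainty_weighted_loss`, the toy loss values, and the choice to parameterize by $\log \sigma_c$ (which keeps $\sigma_c$ positive) are assumptions for illustration rather than details confirmed by Mu et al. (2023).

```python
import math


def uncertainty_weighted_loss(task_losses, log_sigmas):
    """Combine per-metric losses as sum_c L_c / sigma_c^2 + log(sigma_c).

    log_sigmas holds log(sigma_c) for each metric.
    """
    total = 0.0
    for loss_c, log_sigma in zip(task_losses, log_sigmas):
        inv_var = math.exp(-2.0 * log_sigma)  # 1 / sigma_c^2
        total += inv_var * loss_c + log_sigma
    return total


task_losses = [1.0, 1.0, 4.0]  # third metric is the noisy one
equal = uncertainty_weighted_loss(task_losses, [0.0, 0.0, 0.0])
downweighted = uncertainty_weighted_loss(task_losses, [0.0, 0.0, math.log(2.0)])
```

Raising $\sigma_c$ for the high-loss metric lowers the combined objective, while the $\log \sigma_c$ term stops the optimizer from inflating every uncertainty without bound.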

Symmetric Multi-Similarity Loss

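For cross-modal or soft-label scenarios (e.g., video–text with soft correlation matrices), the Symmetric Multi-Similarity Loss (SMS) employs the difference between soft correlation scores $R = c_{ij} - c_{ik}$ as the margin, enforcing a symmetric ordering via hinge-style triplet loss:

$\mathcal{L}_{SMS} = \sum_{(i,j,k)} \begin{cases} [\gamma R - S_{ij} + S_{ik}]_{+} & R > 0 \\ [-\gamma R + S_{ij} - S_{ik}]_{+} & R < 0 \\ [\,|S_{ij} - S_{ik}| - \tau\,]_{+} & R = 0 \end{cases}$

where $c_{ij}$ is the soft correlation between samples $i$ and $j$, $[\cdot]_{+} = \max(0, \cdot)$, $\gamma$ controls the margin, and $\tau$ is a relaxation factor to prevent degenerate updates when $R = 0$ (Wang et al., 2024).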
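The following sketch illustrates a single hinge-style triplet term whose margin is the soft-label difference scaled by a factor `gamma`, applied symmetrically in both ordering directions. It is not a reproduction of the exact loss in Wang et al. (2024): the function name `sms_triplet_term`, the default values, and in particular the handling of equal soft labels via `tau` are assumptions.

```python
def sms_triplet_term(s_ij, s_ik, c_ij, c_ik, gamma=1.0, tau=0.1):
    """Hinge term for anchor i against candidates j and k.

    s_* are model similarities, c_* are soft correlation labels.
    """
    r = c_ij - c_ik
    if r > 0:
        # j is the better match: require s_ij to exceed s_ik by gamma * r.
        return max(0.0, gamma * r - s_ij + s_ik)
    if r < 0:
        # k is the better match: the mirrored constraint.
        return max(0.0, -gamma * r + s_ij - s_ik)
    # Equal soft labels (assumed treatment): only penalise gaps beyond tau.
    return max(0.0, abs(s_ij - s_ik) - tau)


satisfied = sms_triplet_term(0.8, 0.3, c_ij=1.0, c_ik=0.5, gamma=0.5)
violated = sms_triplet_term(0.3, 0.8, c_ij=1.0, c_ik=0.5, gamma=0.5)
mirrored = sms_triplet_term(0.8, 0.3, c_ij=0.5, c_ik=1.0, gamma=0.5)
```

Swapping the roles of the two candidates leaves the penalty unchanged, which is the symmetry property the loss name refers to.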

4. Algorithmic and Implementation Details

Multi-Similarity Loss and its derivatives are implemented via efficient matrix operations within deep learning frameworks:

Batch construction: Use multiple samples per class to enable informative positive and negative mining.
Pairwise similarity matrix computation: Compute all cosine similarities in the batch ( $S = F F^{\top}$ for the matrix $F$ of unit-normalized embeddings); efficient masking is used to select anchor–positive and anchor–negative pairs.
Mining step: For each anchor, vectorized reduction is used to extract the hardest positives/negatives and construct the informative subsets of $P_i$ and $N_i$.
Weighting step: Exponential (softmax-like) weighting over the mined pairs for greater gradient selectivity.
Stabilization: Care is taken to avoid numerical overflow in exponentials by judicious parameter selection (e.g., $\alpha = 2$, $\beta = 50$).
Batch size: Empirically, robust estimation requires batches large enough to supply several positives and negatives per anchor for effective mining.
Final update: Fully vectorized gradient calculation is supported, with no need for custom backward passes.
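Two of the steps above, the similarity matrix and the stabilization of the exponentials, can be sketched in plain Python. The helper names `cosine_similarity_matrix` and `log1p_sum_exp` are hypothetical, the embeddings are assumed to be non-zero, and a real implementation would use a framework's matrix operations instead of loops.

```python
import math


def cosine_similarity_matrix(embeddings):
    """L2-normalise each embedding, then return all pairwise dot products."""
    unit = []
    for v in embeddings:
        norm = math.sqrt(sum(x * x for x in v))  # assumes a non-zero vector
        unit.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def log1p_sum_exp(xs):
    """Stable log(1 + sum_k exp(x_k)); the implicit 1 is treated as exp(0)."""
    if not xs:
        return 0.0
    shift = max(0.0, max(xs))  # largest exponent, including the implicit 0
    return shift + math.log(math.exp(-shift) + sum(math.exp(x - shift) for x in xs))


S = cosine_similarity_matrix([[3.0, 4.0], [4.0, 3.0], [-3.0, -4.0]])
large = log1p_sum_exp([1000.0, 999.0])  # math.exp(1000.0) alone would overflow
```

Subtracting the largest exponent before exponentiating keeps every intermediate value at or below 1, so large sharpness parameters no longer overflow even when a pair lies far from the margin.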

For multi-task cases (MSCon), each metric’s loss is weighted by the inverse variance $1/\sigma_c^{2}$ and self-regularized by $\log \sigma_c$; gradients are accumulated over all tasks before joint optimization (Mu et al., 2023).

5. Empirical Performance and Ablation Studies

Multi-Similarity Loss and its generalizations report strong results, state of the art at the time of publication, on multiple benchmarks:

Dataset	Loss/Method	Result (%): Recall@1, Top-1, or mAP/nDCG	Key Setting/Attribute
CUB-200	MS Loss	65.7	d=512, vs. 60.6 (ABE)
Cars-196	MS Loss	84.1	vs. 81.4 (HTL)
In-Shop Clothes	MS Loss	89.7	vs. 80.9 (prior)
SOP	MS Loss	78.2	vs. 74.8 (ABE)
Zappos50k	MSCon	97.17/94.37/85.98	Category/Closure/Gender
MEDIC	MSCon	81.00/79.14/81.69/85.15	Multi-attribute, in-domain
EK-100	SMS	57.0/69.2, 62.1/73.0	ViT-B, ViT-L (mAP/nDCG)

Ablation studies reveal that:

Incorporating all three signals (P+S+N) yields stronger performance than using any single mining or weighting component (Wang et al., 2019).
Learned uncertainty weighting (in MSCon) significantly improves out-of-domain accuracy, especially when certain similarity metrics are noisy or intentionally corrupted (Mu et al., 2023).
Introducing the relaxation factor $\tau$ in SMS yields notable boosts in mAP (Wang et al., 2024).
SMS outperforms adaptive MI-MM variants by explicit utilization of soft-label differences and symmetric loss structure (Wang et al., 2024).

6. Comparative Analysis and Practical Implications

Multi-Similarity Loss unifies and extends traditional pair-based and triplet-based losses:

Contrastive loss: Only exploits self-similarity, with all mined pairs weighted equally.
Triplet/Histogram/Lifted structure: Partially exploit positive-relative or negative-relative signals, but lack joint mining and weighting.
MS Loss: Combines strict mining (positive-relative) with soft, differentiable weighting (self and negative-relative), yielding sharper gradient focus and better utilization of informative pairs (Wang et al., 2019).
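The soft weighting of MS Loss can be made explicit by differentiating the per-anchor loss with respect to each similarity. The sketch below computes the resulting pair weights for one anchor; the function name `ms_pair_weights`, the toy similarities, and the defaults $\alpha = 2$, $\beta = 50$, $\lambda = 0.5$ are assumptions for illustration.

```python
import math


def ms_pair_weights(pos_sims, neg_sims, alpha=2.0, beta=50.0, lam=0.5):
    """Pair weights |dL_i/dS_ij| of the per-anchor MS loss.

    Positive: exp(-alpha*(S_ij - lam)) / (1 + sum_k exp(-alpha*(S_ik - lam)))
    Negative: exp( beta*(S_ij - lam)) / (1 + sum_k exp( beta*(S_ik - lam)))
    """
    pos_exp = [math.exp(-alpha * (s - lam)) for s in pos_sims]
    neg_exp = [math.exp(beta * (s - lam)) for s in neg_sims]
    w_pos = [e / (1.0 + sum(pos_exp)) for e in pos_exp]
    w_neg = [e / (1.0 + sum(neg_exp)) for e in neg_exp]
    return w_pos, w_neg


w_pos, w_neg = ms_pair_weights(pos_sims=[0.9, 0.4], neg_sims=[0.7, 0.6, 0.1])
```

Unlike the uniform weights of a contrastive loss, the less similar positive and the most similar negative receive the largest weights, and an easy negative well below the margin receives almost none.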

Extensions such as MSCon and SMS are directly suited to multi-task and soft-label settings:

MSCon dynamically balances contributions from multiple relations by uncertainty-based weights, leading to generalizable and robust embedding models (Mu et al., 2023).
SMS generalizes the batch mining and weighting approach to soft, real-valued label relations, appropriate for complex retrieval benchmarks (Wang et al., 2024).

These methods are suited for retrieval, classification with multiple labels, and scenarios with heterogeneous or noisy supervision.

7. Limitations and Best Practices

Key limitations and recommendations include:

Mining margin $\epsilon$ and weighting sharpness parameters must be selected appropriately to ensure the presence of informative pairs and avoid degenerate gradients.
Batch size must be sufficient to supply positives/negatives per anchor.
Large values of $\alpha$ or $\beta$ may cause numerical overflow in the exponentials; parameter tuning is advised, with extra care under mixed precision, where the representable range is smaller.
In multi-task or attribute-rich settings, uncertainty-based weighting is critical to prevent noisy tasks from degrading overall representations.
For soft-label scenarios, the relaxation term $\tau$ prevents wasted model capacity on near-duplicate label pairs.

Adhering to these guidelines, Multi-Similarity Loss remains a robust and adaptable family for high-performance metric learning across domains (Wang et al., 2019, Mu et al., 2023, Wang et al., 2024).

References (3)

Wang et al. (2019). Multi-Similarity Loss with General Pair Weighting for Deep Metric Learning.

Mu et al. (2023). Multi-Similarity Contrastive Learning.

Wang et al. (2024). Symmetric Multi-Similarity Loss for EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2024.
