
SDL: Similarity-Dissimilarity Loss Overview

Updated 21 May 2026
  • Similarity-Dissimilarity Loss (SDL) is a loss function family that enforces both closeness for similar samples and separation for dissimilar ones in learned embeddings.
  • It is applied in diverse fields such as domain adaptation, multi-label contrastive learning, medical image segmentation, and unsupervised clustering using various mathematical formulations.
  • Effective use of SDL requires careful selection of sample pairs, metric calibration, and hyperparameter tuning to enhance model performance and robustness.

A Similarity-Dissimilarity Loss (SDL) is a broad family of objective functions that impose explicit constraints promoting both similarity between “like” samples or features, and dissimilarity between “unlike” samples or features, typically within a learned embedding or output space. SDL is instantiated variably across deep learning, clustering, contrastive representation learning, multicriteria decision analysis, and medical imaging. This article surveys SDL principles, mathematical formulations, methodologies, and notable applications across research domains.

1. Formal Definitions and General Principles

An SDL function is constructed to simultaneously attract matched or related instances (enforcing similarity) and repel mismatched or unrelated instances (imposing dissimilarity) in a latent, output, or feature space. The two parts—similarity and dissimilarity—are typically summed (or composed), with each part defined via a suitable metric, kernel, or contrastive term. Core design choices include:

  • The selection of pairs or sets for similarity and dissimilarity evaluation.
  • The space (embedded, pixel, or class label) in which distances or overlaps are measured.
  • The mathematical form of the loss (contrastive, min–max, overlap-based, probabilistic, etc.).
  • The weighting or trade-off between the two components, often as hyperparameters in the overall objective function.
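As a schematic way to read these design choices together, a generic two-term objective can be written as below. The symbols are illustrative assumptions rather than notation from any of the cited papers: $\mathcal{P}$ and $\mathcal{N}$ are the chosen sets of similar and dissimilar pairs, $z_i$ are representations in the chosen space, $\ell_s$ and $\ell_d$ are the per-pair similarity and dissimilarity penalties, and $\lambda_s$, $\lambda_d$ are the trade-off weights.

```latex
\mathcal{L}_{\rm SDL}
  = \lambda_s \sum_{(i,j) \in \mathcal{P}} \ell_s(z_i, z_j)
  + \lambda_d \sum_{(i,k) \in \mathcal{N}} \ell_d(z_i, z_k)
```

Each bullet above then corresponds to one ingredient: the pair sets, the space in which $z$ lives, the form of $\ell_s$ and $\ell_d$, and the weights.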

2. Mathematical Realizations in Deep and Statistical Learning

2.1. Contrastive SDL in Domain Adaptation

In SPGAN (Deng et al., 2017) for unsupervised image-image translation, SDL is realized as a sum of contrastive self-similarity and domain-dissimilarity losses:

  • Self-similarity loss:

$$\mathcal{L}_{\rm self} = \mathbb{E}_{x_{\mathcal{S}}} \mathcal{L}_{\rm con}\left(1,\ x_{\mathcal{S}},\ G(x_{\mathcal{S}})\right) + \mathbb{E}_{x_{\mathcal{T}}} \mathcal{L}_{\rm con}\left(1,\ x_{\mathcal{T}},\ F(x_{\mathcal{T}})\right)$$

It ensures that the embedding of an image before and after domain translation remains close.

  • Domain-dissimilarity loss:

$$\mathcal{L}_{\rm dissim} = \mathbb{E}_{x_{\mathcal{S}}, x_{\mathcal{T}}} \mathcal{L}_{\rm con}\left(0,\ G(x_{\mathcal{S}}),\ x_{\mathcal{T}}\right) + \mathbb{E}_{x_{\mathcal{S}}, x_{\mathcal{T}}} \mathcal{L}_{\rm con}\left(0,\ F(x_{\mathcal{T}}),\ x_{\mathcal{S}}\right)$$

It ensures that translated images are embedded away from examples of the “other” domain’s identities.

  • The total SDL term is:

$$\mathcal{L}_{\rm SDL} = \mathcal{L}_{\rm self} + \mathcal{L}_{\rm dissim}$$

The overall SPGAN objective combines SDL with the CycleGAN adversarial, cycle-consistency, and identity losses, each weighted by a fixed trade-off parameter. In unsupervised domain adaptation, SDL serves to preserve discriminative identity cues and to prevent translated images from collapsing onto target-domain classes (Deng et al., 2017).
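The two terms can be sketched for a single pair of samples as follows. This is a minimal illustration, not the SPGAN implementation: it assumes the common contrastive form $\mathcal{L}_{\rm con}(i, x_1, x_2) = i\,d^2 + (1-i)\max(0, m-d)^2$ with $d$ the Euclidean distance between L2-normalized embeddings, it takes precomputed embedding vectors as input (the embedding network is omitted), and the function names and the default margin are placeholders.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a 1-D vector to unit Euclidean norm."""
    return v / max(np.linalg.norm(v), eps)

def contrastive_loss(label, e1, e2, margin=2.0):
    """label=1 pulls the pair together; label=0 pushes it at least `margin` apart."""
    d = np.linalg.norm(l2_normalize(e1) - l2_normalize(e2))
    return label * d**2 + (1 - label) * max(0.0, margin - d) ** 2

def sdl_single_pair(emb_s, emb_Gs, emb_t, emb_Ft, margin=2.0):
    """One-sample estimate of L_self + L_dissim.

    emb_s, emb_t   : embeddings of a source image and a target image
    emb_Gs, emb_Ft : embeddings of their translations G(x_S) and F(x_T)
    """
    l_self = (contrastive_loss(1, emb_s, emb_Gs, margin)
              + contrastive_loss(1, emb_t, emb_Ft, margin))
    l_dissim = (contrastive_loss(0, emb_Gs, emb_t, margin)
                + contrastive_loss(0, emb_Ft, emb_s, margin))
    return l_self + l_dissim
```

In training, the expectations in the formulas would be approximated by averaging this quantity over mini-batches of source and target images.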

2.2. Multi-label Supervised Contrastive Learning

In multi-label settings, such as in (Huang et al., 2024), SDL governs the relative strength of positive sample contributions based on fine-grained label relations. For an anchor sample $i$ and a positive $p$ (sharing at least one label), the sample-to-anchor weight is the product:

$$w_{i,p} = K^s_{i,p} K^d_{i,p}$$

where, writing $S$ and $T$ for the label sets of the two samples (taken here to be those of the anchor $i$ and the positive $p$, respectively):

  • $K^s_{i,p} = \frac{|S \cap T|}{|S|}$ reflects overlap proportion.
  • $K^d_{i,p} = \frac{1}{1+|T \setminus (S \cap T)|}$ discounts for label mismatch.

The loss then re-weights the contrastive similarity (dot product in embedding space) by $w_{i,p}$, with positives and negatives determined by precise label-set relations (disjoint, exact, partial, subset/superset). This yields a discriminative, sample-dependent interpolation between fully-positive and fully-negative effects, resolving label-ambiguity issues in multi-label contrastive objectives (Huang et al., 2024).
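A small sketch of the weight computation may help make the label-set cases concrete. It assumes that $S$ is the anchor's label set and $T$ the positive's, which the text does not state explicitly; the function name `sdl_weight` is hypothetical.

```python
def sdl_weight(anchor_labels, positive_labels):
    """Return w = K^s * K^d for two label sets (assumed: S = anchor, T = positive)."""
    S, T = set(anchor_labels), set(positive_labels)
    if not S:
        raise ValueError("anchor must carry at least one label")
    overlap = S & T
    k_s = len(overlap) / len(S)          # share of the anchor's labels found in T
    k_d = 1.0 / (1 + len(T - overlap))   # discount for labels in T outside the overlap
    return k_s * k_d

# Label-set relations and the weights this sketch assigns to them.
cases = {
    "exact":    sdl_weight({1, 2}, {1, 2}),
    "disjoint": sdl_weight({1, 2}, {3}),
    "subset":   sdl_weight({1, 2}, {1}),
    "superset": sdl_weight({1}, {1, 2}),
    "partial":  sdl_weight({1, 2}, {2, 3}),
}
```

Under these assumptions an exact match receives weight 1, a disjoint pair weight 0, and the subset, superset, and partial-overlap cases fall in between.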

2.3. Dice-based Similarity-Dissimilarity Losses in Segmentation

In medical image segmentation, the “soft Dice” loss (SDL) quantifies soft overlap between a predicted segmentation $x \in [0,1]^p$ and a binary (or soft) mask $y \in \{0,1\}^p$ (or $y \in [0,1]^p$):

$$\mathrm{SDL}(x, y) = 1 - \frac{2\langle x, y\rangle}{\|x\|_1 + \|y\|_1}$$

where $\langle \cdot, \cdot \rangle$ denotes the scalar product and $\|\cdot\|_1$ the $\ell_1$-norm. The loss penalizes both false positives and false negatives, with a structure that balances similarity and dissimilarity signals (Wang et al., 2023).
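As a minimal sketch, the soft Dice loss can be computed from the scalar product and the $\ell_1$-norms, assuming the standard form $1 - 2\langle x, y\rangle / (\|x\|_1 + \|y\|_1)$. The function name and the small `eps` guard against division by zero are illustrative additions.

```python
import numpy as np

def soft_dice_loss(x, y, eps=1e-8):
    """Soft Dice loss between a prediction x in [0,1]^p and a mask y."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    inter = float(np.dot(x, y))                 # scalar product <x, y>
    denom = np.abs(x).sum() + np.abs(y).sum()   # ||x||_1 + ||y||_1
    return 1.0 - 2.0 * inter / (denom + eps)

mask = [1, 0, 1, 0]
perfect = soft_dice_loss([1, 0, 1, 0], mask)    # prediction equals the mask
inverted = soft_dice_loss([0, 1, 0, 1], mask)   # no overlap at all
overseg = soft_dice_loss([1, 1, 1, 1], mask)    # two false positives
```

The perfect prediction gives a loss near 0, the inverted one a loss of 1, and the over-segmented one an intermediate value, which shows how false positives are penalized through the denominator.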

Extensions to proper semimetric Dice losses (DML1, DML2) overcome the bias induced by naive soft-label use, preserving reflexivity and providing unique minimizers at $x = y$ regardless of label ambiguity (Wang et al., 2023).
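The bias can be illustrated numerically on a single pixel with soft label 0.5. The sketch below assumes that DML1 takes the form $1 - (\|x\|_1 + \|y\|_1 - \|x - y\|_1)/(\|x\|_1 + \|y\|_1)$; this expression is not given in the text above and should be checked against Wang et al. (2023) before use. Function names are hypothetical.

```python
import numpy as np

def soft_dice_loss(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 1.0 - 2.0 * np.dot(x, y) / (np.abs(x).sum() + np.abs(y).sum())

def dml1_loss(x, y):
    # Assumed DML1 form: the overlap 2<x, y> is replaced by ||x||_1 + ||y||_1 - ||x - y||_1.
    x, y = np.asarray(x, float), np.asarray(y, float)
    s = np.abs(x).sum() + np.abs(y).sum()
    return 1.0 - (s - np.abs(x - y).sum()) / s

# Scan predictions in [0, 1] for a one-pixel soft label y = 0.5.
y_soft = np.array([0.5])
grid = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
best_sdl = grid[np.argmin([soft_dice_loss(x, y_soft) for x in grid])][0]
best_dml = grid[np.argmin([dml1_loss(x, y_soft) for x in grid])][0]

# With a hard (binary) label the two losses coincide.
x_pred, y_hard = np.array([0.2, 0.9]), np.array([0.0, 1.0])
gap_on_hard_labels = abs(soft_dice_loss(x_pred, y_hard) - dml1_loss(x_pred, y_hard))
```

In this toy case the soft Dice loss is minimized at the binary corner x = 1 rather than at the label, while the assumed DML1 form is minimized at x = 0.5.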

2.4. SDL in Unsupervised Clustering

The unsupervised clustering framework of (Kostadinov et al., 2019) jointly optimizes nonlinear transforms and clustering assignments by means of a compositional min–max SDL built from two measures:

  • The two measures respectively encode directional similarity/dissimilarity and quadratic similarity.
  • The min–max assignment ensures assignment to the “closest” discriminant, maximally separated from “similarity” prototypes.

SDL is integrated into the joint objective, controlling both representation selection and parameter learning (Kostadinov et al., 2019).

3. SDL in Multi-Criteria Decision and Robust Optimization

In the Cat-SD framework for nominal classification, SDL is employed in the assignment of actions to categories on the basis of a hierarchy of criteria via similarity, dissimilarity, and likeness degree computations (Costa et al., 2018). For each action and each reference action of a category:

  • Per-criterion similarity/dissimilarity functions (piecewise, calibrated) are evaluated.
  • These are aggregated via weighted sums and interaction coefficients (synergy, redundancy, antagonism).
  • The aggregation yields a likeness degree for the action-reference pair.
  • Assignment is thresholded at prescribed likeness levels.

SDL, together with SMAA (Stochastic Multicriteria Acceptability Analysis), defines a linear loss function over probabilistic assignments, solved via LP to deliver a robust, optimal deterministic classification aligning with the inferred likelihoods under parameter uncertainty (Costa et al., 2018).
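A toy sketch of the probabilistic step is given below. It is not the formulation of Costa et al. (2018): it assumes each sampled parameter set assigns every action to exactly one category, estimates category acceptability indices as sample frequencies, and uses an expected 0-1 loss. With no constraints coupling different actions, the corresponding LP decouples into a per-action argmin, so no solver is needed here. All names and the sample data are hypothetical.

```python
import numpy as np

def acceptability_indices(sampled_assignments, n_categories):
    """P[a, h] = fraction of parameter samples in which action a lands in category h."""
    A = np.asarray(sampled_assignments)          # shape (n_samples, n_actions)
    P = np.zeros((A.shape[1], n_categories))
    for h in range(n_categories):
        P[:, h] = (A == h).mean(axis=0)
    return P

def robust_assignment(P):
    """Pick, for each action, the category with the smallest expected 0-1 loss."""
    expected_loss = 1.0 - np.asarray(P, dtype=float)
    return expected_loss.argmin(axis=1)

# Four sampled parameter sets, two actions, three categories (made-up data).
samples = [[0, 1], [0, 1], [1, 1], [0, 2]]
P = acceptability_indices(samples, n_categories=3)
chosen = robust_assignment(P)
```

Additional requirements, such as category size limits, would couple the actions and turn this into a genuine LP.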

4. Empirical Performance and Comparisons

SDL-based objectives are reported to outperform simpler, unweighted, or purely similarity-based loss constructions in several tasks:

  • In multi-label contrastive learning (MSCL), re-weighted SDL achieves higher Macro-AUC and Macro-F1 in text (MIMIC-III/IV) and vision (MS-COCO) compared to ALL, ANY, and MulSupCon strategies (Huang et al., 2024).
  • For segmentation with soft labels, DML1 (SDL-extended as a semimetric) improves Dice score (+3.05%), binarized Dice (+4.38%), and reduces ECE over naive SDL and hard-label training in multi-rater QUBIQ data (Wang et al., 2023).
  • Min–max SDL in nonlinear transform clustering achieves state-of-the-art cluster accuracy and NMI scores without labels, outperforming self-expressive and graph-regularized baselines (Kostadinov et al., 2019).
  • In multicriteria assignment, SDL-based LP formulations yield robust, interpretable assignments even under considerable parameter uncertainty, as demonstrated in hierarchical and imprecise preference scenarios (Costa et al., 2018).

| Domain | SDL Instantiation | Key Empirical Gains |
|---|---|---|
| Multi-label CL | Weighted contrastive | +1–5 points Macro/Micro-F1, AUC |
| Segmentation | Soft Dice/DML1 | +2–4% Dice, lower ECE |
| Clustering | Min–max SDL | +1–2% CA/NMI vs. state-of-the-art |
| Decision Analysis | Hierarchical SDL-LP | Robust, constraint-compliant assignment |

5. Implementation and Practical Recommendations

SDL instantiations require careful selection of positive/negative pairs (e.g., in MSCL, based on label-set relations (Huang et al., 2024)), correct weighting of terms (e.g., the trade-off weight on the SDL term in SPGAN (Deng et al., 2017)), and margin or normalization hyperparameters (e.g., the margin in contrastive SDL, normalization in DMLs (Wang et al., 2023)). For medical image segmentation, DML1 is recommended when leveraging soft labels from label smoothing, knowledge distillation, or multi-rater settings, in order to avoid convergence to binary corners and to preserve calibration.

For modularity and extension, SDL is typically differentiable and compatible with gradient-based optimizers. In clustering scenarios, closed-form soft-thresholding solutions enable efficient assignment (Kostadinov et al., 2019). In multicriteria assignment, parameter sampling (e.g., Hit-and-Run for SMAA) and LP solution steps are computationally efficient for moderate action/category sizes (Costa et al., 2018).
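For reference, the generic soft-thresholding operator mentioned for the clustering case can be sketched as below. It is the closed-form minimizer of $\tfrac{1}{2}(u - v)^2 + \tau |u|$ applied elementwise; whether Kostadinov et al. (2019) use exactly this penalty is not stated here, so the sketch only illustrates the operator itself.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise soft-thresholding: shrink magnitudes by tau and zero out the rest."""
    v = np.asarray(v, dtype=float)
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

shrunk = soft_threshold([3.0, -0.5, -2.0], tau=1.0)
```

Entries whose magnitude is below `tau` become zero, which is what makes the resulting representations sparse and the assignment step cheap.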

6. Theoretical Properties and Limitations

SDL variants often preserve important theoretical guarantees:

  • Semimetricity (reflexivity, symmetry, positivity, relaxed triangle inequality) in Dice-based DMLs (Wang et al., 2023).
  • Smooth interpolation and exact zero cost for perfect matches or total mismatches in MSCL (Huang et al., 2024).
  • Robustness to parameter uncertainty via probabilistic assignment in decision analysis (Costa et al., 2018).

A known limitation is the requirement for calibrated similarity/dissimilarity measures; naïve application (as for soft Dice loss with inexact soft labels) can introduce bias and degrade model calibration (Wang et al., 2023). SDL-based clustering requires careful balancing of similarity and dissimilarity priors to avoid trivial solutions or overfitting (Kostadinov et al., 2019).

7. Application Scope and Research Directions

SDL frameworks are now central to the areas surveyed above: domain adaptation, multi-label contrastive learning, medical image segmentation, unsupervised clustering, and multicriteria decision analysis.

Future research is likely to explore adaptive, data-driven selection of similarity/dissimilarity criteria, robust optimization in deeper or noisier label hierarchies, and scalable extensions to large-scale, multimodal settings. SDL’s flexibility and principled construction suggest continued utility across methodological advances in representation learning, semi-supervised inference, and complex decision systems.
