Hellinger Similarity Contrastive Loss

Updated 21 September 2025
  • Hellinger Similarity Contrastive Loss is a contrastive learning approach that quantifies similarity between probabilistic feature distributions using the bounded Hellinger distance.
  • It replaces traditional cosine similarity with a statistically robust metric to manage uncertainty and limited samples, enhancing resistance to adversarial and natural perturbations.
  • Empirical evaluations demonstrate improved classification accuracy and image reconstruction quality, validating its effectiveness in few-shot and variational learning contexts.

The Hellinger Similarity Contrastive Loss Function refers to a class of contrastive learning objectives in which the similarity between feature distributions—specifically, between query samples and class prototypes—is quantified using the Hellinger distance rather than conventional metrics such as cosine similarity. This approach is especially applicable in few-shot learning scenarios, where limited data per class necessitates robust, probabilistic feature aggregation and similarity measurement. By formulating loss in terms of a bounded, symmetric distributional metric, the Hellinger Similarity Contrastive Loss provides advantages in numerical stability, probabilistic modeling, and robustness to adversarial and natural perturbations.

1. Foundations of Hellinger Distance in Contrastive Learning

The Hellinger distance $D_H$ is a statistical measure between two probability distributions $p(z)$ and $q(z)$, defined for continuous distributions as

$$D_H^2(p,q) = 1 - \int \sqrt{p(z)\,q(z)}\,dz.$$

This metric is symmetric and bounded in $[0, 1]$, properties that confer numerical stability and unbiased comparison when used in deep learning contexts. In the Hellinger Similarity Contrastive Loss formulation, class prototypes and query embeddings are treated as Gaussian distributions, with similarity decreasing monotonically in the Hellinger distance: lower $D_H$ indicates stronger similarity.
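
The source does not give the parametric form it uses for these distributions. As a minimal illustration, the following PyTorch sketch computes the squared Hellinger distance in closed form under the assumption that query and prototype embeddings are diagonal Gaussians; the function name and interface are hypothetical:

```python
import torch

def hellinger_sq_diag_gaussians(mu1, var1, mu2, var2, eps=1e-8):
    """Squared Hellinger distance between diagonal Gaussians N(mu1, var1) and N(mu2, var2).

    All inputs have shape (..., D); the distance is taken over the last dimension
    and lies in [0, 1].
    """
    var_sum = var1 + var2 + eps
    # Log Bhattacharyya coefficient for diagonal Gaussians, accumulated over
    # dimensions in log-space for numerical stability.
    log_bc = (
        0.5 * (0.5 * torch.log(4.0 * var1 * var2 + eps) - torch.log(var_sum))
        - 0.25 * (mu1 - mu2).pow(2) / var_sum
    ).sum(dim=-1)
    return 1.0 - torch.exp(log_bc)

# Example: two 3-dimensional Gaussian embeddings.
mu1, var1 = torch.zeros(3), torch.ones(3)
mu2, var2 = torch.tensor([0.5, 0.0, -0.5]), torch.full((3,), 1.5)
print(hellinger_sq_diag_gaussians(mu1, var1, mu2, var2))  # scalar in [0, 1]
```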

2. Formal Definition of the Hellinger Similarity Contrastive Loss

In application to few-shot learning, the Hellinger Similarity Contrastive Loss (denoted $\mathcal{L}_{\text{hesim}}$) replaces cosine similarity with Hellinger-based similarity in a softmax-structured objective. The loss is formalized as:

$$\mathcal{L}_{\text{hesim}} = -\sum_{i=1}^{N_t} y_i \log\big(p(y = j \mid v_{Q,i})\big),$$

where $y_i$ is the ground-truth label, $v_{Q,i}$ is the query embedding, $N_t$ is the number of training samples, and

$$p(y = j \mid v_{Q,i}) = \frac{\exp\!\left(-\{v_{Q,i}, c_{Q,j}\}\right)}{\sum_{j'=1}^{N} \exp\!\left(-\{v_{Q,i}, c_{Q,j'}\}\right)}.$$

Here, $\{v_{Q,i}, c_{Q,j}\}$ represents the Hellinger "distance" between the distributions parameterized by the query and class-prototype embeddings, and $N$ is the number of classes whose prototypes enter the softmax. While the explicit formula for evaluating $\{v_{Q,i}, c_{Q,j}\}$ in neural embedding contexts is not specified in the source, its basis in the Hellinger metric imparts the aforementioned stability and symmetry (Lee et al., 14 Sep 2025).
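
Since the source does not spell the objective out in code, the following minimal sketch shows how the softmax-structured loss could be assembled, reusing the hypothetical `hellinger_sq_diag_gaussians` helper above and assuming each query and prototype is parameterized by a mean and a variance vector:

```python
import torch
import torch.nn.functional as F

def hellinger_contrastive_loss(query_mu, query_var, proto_mu, proto_var, labels):
    """Softmax contrastive loss over negative Hellinger distances.

    query_mu, query_var: (B, D) Gaussian parameters of the B query embeddings.
    proto_mu, proto_var: (N, D) Gaussian parameters of the N class prototypes.
    labels:              (B,)   ground-truth class indices.
    """
    # Pairwise squared Hellinger distances between queries and prototypes, shape (B, N).
    d = hellinger_sq_diag_gaussians(
        query_mu.unsqueeze(1), query_var.unsqueeze(1),   # (B, 1, D)
        proto_mu.unsqueeze(0), proto_var.unsqueeze(0),   # (1, N, D)
    )
    # Smaller distance -> larger logit; cross-entropy over -d reproduces the
    # softmax form of p(y = j | v_{Q,i}) given above.
    return F.cross_entropy(-d, labels)
```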

3. Comparison with Cosine Similarity-Based Contrastive Losses

Traditional contrastive loss functions, such as those used in SimCLR (Chen et al. 2020), employ cosine similarity:

$$\cos(\theta) = \frac{v \cdot c}{\|v\|\,\|c\|}.$$

Applied to deterministic embeddings, cosine similarity lacks sensitivity to distributional uncertainty and can be numerically unstable—particularly in small-sample or adversarial contexts. The Hellinger-based approach instead models each class and query as a distribution, facilitating comparison that smoothly incorporates estimation uncertainty and intra-class variance.

| Property | Cosine Similarity | Hellinger Similarity |
|---|---|---|
| Range | $[-1, 1]$ | $[0, 1]$ |
| Symmetry | Yes | Yes |
| Distributional sensitivity | No | Yes |
| Robustness (adversarial/noise) | Limited | Enhanced |

This approach is particularly advantageous in few-shot regimes, where prototypes must be derived from limited samples, further motivating robust probabilistic modeling.
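
A toy comparison makes the "distributional sensitivity" row of the table above concrete: two embeddings with identical means but different variances are indistinguishable under cosine similarity of the means, whereas the Hellinger distance separates them. The snippet reuses the hypothetical helper from Section 1:

```python
import torch
import torch.nn.functional as F

mu = torch.tensor([1.0, 2.0, 3.0])
var_narrow = torch.full((3,), 0.1)   # confident, low-variance embedding
var_wide = torch.full((3,), 2.0)     # uncertain, high-variance embedding

cos_sim = F.cosine_similarity(mu, mu, dim=0)                          # 1.000: means are identical
hell_sq = hellinger_sq_diag_gaussians(mu, var_narrow, mu, var_wide)   # > 0: variances differ

print(f"cosine similarity of means:  {cos_sim.item():.3f}")
print(f"squared Hellinger distance: {hell_sq.item():.3f}")
```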

4. Robustness Against Adversarial and Natural Perturbations

In the ANROT-HELANet architecture (Lee et al., 14 Sep 2025), the Hellinger Similarity Contrastive Loss is integral to feature aggregation and classification. Empirical results indicate:

  • Resilience to adversarial perturbations of scale $\epsilon = 0.30$ and Gaussian noise of $\sigma = 0.30$.
  • Stability in attention maps under both adversarial and natural corruptions, as visualized via Grad-CAM.
  • Consistent classification accuracy in 1-shot and 5-shot learning scenarios, with observed improvements of 1.20% and 1.40%, respectively, on miniImageNet.

These results highlight the loss function's capacity to maintain reliable gradients and stable feature prototypes, even under substantial perturbations. A plausible implication is that bounded similarity metrics such as the Hellinger distance mitigate the risk of gradient explosion and sensitivity to outlier samples.

5. Impact on Image Reconstruction in Variational Frameworks

Beyond classification, the use of Hellinger distance in variational autoencoder (VAE) objectives improves image reconstruction quality. The ANROT-HELANet reports a Fréchet Inception Distance (FID) of 2.75, outperforming VAE with KL divergence (3.43) and Wasserstein autoencoder variants (3.38). This suggests that the choice of Hellinger distance yields more coherent latent representations in probabilistic models, reflecting its favorable properties for distributional matching.
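
The source does not reproduce the exact variational objective. As a sketch of how a Hellinger term could stand in for the usual KL regularizer, assuming a diagonal Gaussian posterior and a standard normal prior, and reusing the hypothetical helper from Section 1:

```python
import torch

def hellinger_vae_regularizer(mu, logvar):
    """Hellinger-based stand-in for the KL term of a VAE objective.

    mu, logvar: (B, D) parameters of the approximate posterior N(mu, diag(exp(logvar))).
    Returns the mean squared Hellinger distance to the standard normal prior N(0, I).
    """
    var = logvar.exp()
    return hellinger_sq_diag_gaussians(
        mu, var, torch.zeros_like(mu), torch.ones_like(var)
    ).mean()
```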

6. Experimental Evaluation and Benchmarks

Systematic experiments on miniImageNet, tieredImageNet, CIFAR-FS, and FC-100 demonstrate that the Hellinger Similarity Contrastive Loss consistently improves upon alternatives, including KL-based and cosine similarity-based objectives. Ablation studies reveal that its integration with attention mechanisms supports further gains in accuracy, particularly in low-shot learning and under distributional perturbation.

The robustness and accuracy benefits are supported by both classification and reconstruction metrics. The combination of probabilistic embedding, bounded symmetric metric, and contrastive softmax formulation underpins state-of-the-art performance in few-shot settings.

7. Summary and Contextualization

The Hellinger Similarity Contrastive Loss function generalizes traditional similarity metrics by integrating robust statistical distance into contrastive learning frameworks, especially within variational few-shot inference and adversarially robust architectures. Its bounded, symmetric nature enables stable optimization, reliable prototype estimation, and improved resistance to both adversarial and natural perturbations. Evaluations across standard benchmarks and under multiple noise scenarios substantiate its efficacy in both classification and generative modeling tasks. The shift from point-wise deterministic similarity to distributional comparison via the Hellinger metric reflects an important methodological evolution in contrastive metric learning, particularly for domains characterized by small sample sizes and high variability (Lee et al., 14 Sep 2025).
