Centroid Alignment Loss in Deep Learning

Updated 3 July 2026

Centroid alignment loss is a regularization approach that aligns feature centroids to enhance intra-class compactness and inter-class separation.
It employs formulations such as squared Euclidean, cosine alignment, and contrastive losses to optimize discriminative and cross-modal feature embeddings.
Applied in supervised classification, few-shot learning, and domain adaptation, it improves fairness, interpretability, and robustness with minimal computational overhead.

Centroid alignment loss refers to a class of loss functions and regularization terms in machine learning that enforce geometric alignment of feature centroids—typically of classes, modalities, groups, or instances—in an embedding space. The primary aim is to promote discriminability, compactness, and structural correspondence among feature distributions, with application in supervised classification, cross-modal retrieval, fairness, domain adaptation, and weak supervision. Centroid alignment losses can be constructed using prescribed centroids, centroids estimated from data, or dynamic centroids evolving jointly with model updates.

1. Mathematical Definitions: Centroid Alignment Objectives

Centroid alignment losses quantify and reduce misalignment between sets of centroids. Suppose a neural network or projection maps inputs to features $\{\mathbf{z}_i\}$ with labels $y_i$. Centroids $\{\boldsymbol{\mu}_k\}$ for classes, modalities, or groups are computed as group-specific means: $\boldsymbol{\mu}_k = \frac{1}{N_k}\sum_{i:y_i=k} \mathbf{z}_i$, where $N_k$ is the number of samples assigned to group $k$.
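The centroid definition can be illustrated with a minimal NumPy sketch. The function name `class_centroids` and the choice to leave empty classes at zero are assumptions made for illustration, not part of any cited method.

```python
import numpy as np

def class_centroids(z, y, num_classes):
    """Return a (num_classes, d) array whose k-th row is the mean of z[y == k]."""
    mu = np.zeros((num_classes, z.shape[1]))
    for k in range(num_classes):
        members = z[y == k]
        if len(members) > 0:  # assumption: classes with no samples stay at zero
            mu[k] = members.mean(axis=0)
    return mu

# Four 2-D features, two per class.
z = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0], [3.0, 5.0]])
y = np.array([0, 0, 1, 1])
mu = class_centroids(z, y, num_classes=2)
```

Here `mu[0]` is the mean of the first two rows and `mu[1]` the mean of the last two.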

A typical centroid alignment loss penalizes the distance between sample features and their assigned centroids (reducing intra-cluster variance), rewards distance between the centroids of different clusters (increasing inter-cluster separation), or penalizes the distance between centroids of different groups or modalities to enforce alignment. Forms include:

Squared Euclidean/MSE loss: $\|\mathbf{z}_i - \boldsymbol{\mu}_{y_i}\|_2^2$ or its softmaxed, temperature-scaled, or weighted variants.
Cosine alignment: $1 - \cos(\mathbf{z}_i, \boldsymbol{\mu}_{y_i})$ or negative log-softmax over centroid dot products.
Aggregate centroid difference: $\|\bar{\mathbf{v}} - \bar{\mathbf{t}}\|_2$ for modality centroids in vision-language models (Liu et al., 31 Mar 2026).
Centroid "fairness" regression losses: penalizing deviation of groupwise centroid-based scores from a reference target (Conti et al., 27 Apr 2025).
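The first two forms above can be sketched in a few lines of NumPy. This is a minimal illustration that averages the per-sample terms over a batch; the function names, the batch averaging, and the small `eps` stabilizer are assumptions, and the temperature-scaled or weighted variants are not shown.

```python
import numpy as np

def squared_euclidean_alignment(z, mu, y):
    """Batch mean of ||z_i - mu_{y_i}||_2^2."""
    diff = z - mu[y]
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def cosine_alignment(z, mu, y, eps=1e-12):
    """Batch mean of 1 - cos(z_i, mu_{y_i}); eps guards against zero norms."""
    target = mu[y]
    num = np.sum(z * target, axis=1)
    den = np.linalg.norm(z, axis=1) * np.linalg.norm(target, axis=1) + eps
    return float(np.mean(1.0 - num / den))

z = np.array([[1.0, 0.0], [0.0, 2.0]])
mu = np.array([[1.0, 0.0], [1.0, 0.0]])
y = np.array([0, 1])
sq = squared_euclidean_alignment(z, mu, y)
cos = cosine_alignment(z, mu, y)
```

The first sample sits exactly on its centroid and contributes zero to both losses; the second is orthogonal to its centroid, so its cosine term is 1 and its squared distance is 5.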

Some frameworks use pre-defined evenly-distributed centroids (PEDCC), e.g., by maximizing mutual repulsion on a sphere (Zhu et al., 2019). Others compute centroids dynamically per batch or epoch, or maintain them via exponential moving average (EMA) (Zhou et al., 2022).
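One way to picture "mutual repulsion on a sphere" is a Coulomb-like update followed by re-projection onto the unit sphere. The sketch below is a generic illustration of that idea, not the PEDCC generation algorithm of Zhu et al.; the function name, step size, iteration count, and inverse-square force are all assumptions.

```python
import numpy as np

def repel_on_sphere(num_centroids, dim, steps=2000, lr=0.05, seed=0):
    """Spread points on the unit sphere by iterated pairwise repulsion."""
    rng = np.random.default_rng(seed)
    c = rng.normal(size=(num_centroids, dim))
    c /= np.linalg.norm(c, axis=1, keepdims=True)
    for _ in range(steps):
        diff = c[:, None, :] - c[None, :, :]   # diff[i, j] = c_i - c_j
        dist = np.linalg.norm(diff, axis=2)
        np.fill_diagonal(dist, 1.0)            # diff is zero on the diagonal, so this only avoids 0/0
        force = (diff / dist[:, :, None] ** 3).sum(axis=1)  # push each point away from the others
        c = c + lr * force
        c /= np.linalg.norm(c, axis=1, keepdims=True)       # project back onto the sphere
    return c

centroids = repel_on_sphere(num_centroids=4, dim=3)
cosines = centroids @ centroids.T
```

For four points in three dimensions the most spread-out arrangement is a regular tetrahedron, so the off-diagonal entries of `cosines` should end up close to -1/3 if the iteration has converged.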

2. Core Methodologies and Loss Variants

Centroid alignment losses are instantiated in numerous architectures. Canonical settings include:

Supervised classification: Aligning deep features to class-determined centroids to tighten intra-class clusters and maximize inter-class separation (Zhu et al., 2019, Zhou et al., 2022).
Cross-modal alignment: Reducing the centroid gap between modalities (e.g., images $\mathbf{v}_i$ and texts $\mathbf{t}_i$) to promote modality-invariant embeddings, as in the TPC-CMA method, which measures the gap as:

$\mathcal G_C = \|\bar{\mathbf{v}} - \bar{\mathbf{t}}\|_2$

with the centroid-alignment ("negative reweighting") loss formulated as:

$\mathcal L_{\mathrm{rw}} = \mathrm{CE}(\mathbf{M}\odot\mathbf{S}, \mathbf{y}) + \mathrm{CE}((\mathbf{M}\odot\mathbf{S})^\top, \mathbf{y})$

where $\mathbf{S}$ is the matrix of image-text similarity logits and the mask $\mathbf{M}$ downweights negative (non-matching) logits, reducing repulsion so that the modality centroids can drift together (Liu et al., 31 Mar 2026).
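The centroid gap and a reweighted symmetric cross-entropy can be sketched as follows. This is an illustration under stated assumptions rather than the TPC-CMA implementation: the similarity logits are taken to be temperature-scaled dot products, the mask is assumed to be 1 on the diagonal (matching pairs) and a constant `neg_weight` elsewhere, and all names are hypothetical.

```python
import numpy as np

def centroid_gap(v, t):
    """Euclidean distance between the image centroid and the text centroid."""
    return float(np.linalg.norm(v.mean(axis=0) - t.mean(axis=0)))

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed with the usual max-shift for stability."""
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def reweighted_loss(v, t, neg_weight=0.5, temperature=0.1):
    """CE(M * S, y) + CE((M * S)^T, y) with an assumed constant off-diagonal mask."""
    s = (v @ t.T) / temperature
    m = np.full_like(s, neg_weight)   # assumption: one shared weight for all non-matching pairs
    np.fill_diagonal(m, 1.0)          # matching pairs keep their full logit
    labels = np.arange(len(v))
    masked = m * s
    return cross_entropy(masked, labels) + cross_entropy(masked.T, labels)

v = np.array([[1.0, 0.0], [0.6, 0.8]])   # unit-norm "image" embeddings
t = v.copy()                             # identical "text" embeddings
gap = centroid_gap(v, t)
loss_plain = reweighted_loss(v, t, neg_weight=1.0)
loss_rw = reweighted_loss(v, t, neg_weight=0.5)
```

With positive off-diagonal similarities, shrinking the non-matching logits lowers their softmax mass, so `loss_rw` is smaller than `loss_plain` and the gradient pushing non-matching pairs apart is weaker.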

Few-shot and representation learning: Pulling queries toward centroids of support or related base classes using metric-learning based negative log-softmax of squared distances (Afrasiyabi et al., 2019):

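$\mathcal{L}_{\mathrm{align}} = -\log \frac{\exp\left(-\|\mathbf{z}_i - \boldsymbol{\mu}_{y_i}\|_2^2\right)}{\sum_{k} \exp\left(-\|\mathbf{z}_i - \boldsymbol{\mu}_k\|_2^2\right)}$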
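A negative log-softmax over negative squared distances to the centroids can be computed as below. This is a generic sketch; the function name and the batch averaging are assumptions, and details specific to Afrasiyabi et al. (such as how related base classes are selected) are not modeled.

```python
import numpy as np

def centroid_softmax_loss(z, mu, y):
    """Batch mean of -log softmax_k(-||z_i - mu_k||^2) evaluated at k = y_i."""
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, K) squared distances
    logits = -d2
    logits = logits - logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(y)), y].mean())

mu = np.array([[0.0, 0.0], [2.0, 0.0]])
z = np.array([[0.0, 0.0]])                       # query sitting on centroid 0
loss_correct = centroid_softmax_loss(z, mu, np.array([0]))
loss_wrong = centroid_softmax_loss(z, mu, np.array([1]))
```

The loss is small when a query lies near its own centroid and far from the others, and grows with the squared distance to the assigned centroid.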

Domain adaptation and fairness: Aligning group centroids to mitigate feature fragmentation or bias, as in global patient alignment losses (Jeong et al., 28 May 2025) or centroid fairness regression (Conti et al., 27 Apr 2025). Losses here penalize the squared norm between group centroids and a global centroid or target quantiles.
Weakly-supervised segmentation: Incorporating centroid alignment cross-entropy terms over annotated pixels to cluster features per class (Yao et al., 2020).
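For the domain adaptation and fairness setting, a penalty on the squared norm between each group centroid and the global centroid might look like the sketch below. It is a simplified illustration, not the GPAL loss of Jeong et al.; the function name and the unweighted average over groups are assumptions.

```python
import numpy as np

def global_alignment_loss(z, groups):
    """Mean over groups of ||group centroid - global centroid||_2^2."""
    global_mu = z.mean(axis=0)
    penalties = [
        np.sum((z[groups == g].mean(axis=0) - global_mu) ** 2)
        for g in np.unique(groups)
    ]
    return float(np.mean(penalties))

z = np.array([[0.0, 0.0], [2.0, 0.0], [4.0, 0.0], [6.0, 0.0]])
separated = global_alignment_loss(z, np.array([0, 0, 1, 1]))    # groups occupy different regions
interleaved = global_alignment_loss(z, np.array([0, 1, 1, 0]))  # groups share the same centroid
```

The penalty vanishes when every group centroid coincides with the global centroid, which is the sense in which it discourages group-specific fragmentation of the feature space.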

3. Algorithmic Schemes and Optimization

How gradients propagate through a centroid alignment loss depends on whether the centroids are fixed, dynamically recomputed, or learned:

Fixed or prescribed centroids: The weights of the final classification layer are set to predefined centroids, e.g., PEDCC centroids spread on the hypersphere (Zhu et al., 2019). The loss is the sum of a margin-based cross-entropy on angular similarity and an MSE term aligning features to centroids.
Online/EMA-updated centroids: Class centroids are updated via EMA or computed over the current minibatch (Zhou et al., 2022); gradients typically flow only through features, not centroids themselves.
Jointly optimized centroids: Centroids are treated as optimization variables, updated by backpropagation or K-means steps, as in floating centroid methods (Islam et al., 2019).

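Feature Centroid Contrast Learning (FCCL) (Zhou et al., 2022) is an example of the online scheme: class centroids are maintained alongside the network, and a contrastive loss pulls each feature toward its own class centroid and away from the others.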
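The online/EMA pattern can be sketched in NumPy as an EMA centroid update paired with a contrastive loss against the current centroids. This is not the FCCL pseudocode of Zhou et al.; the function names, the momentum value, the cosine-similarity logits, and the temperature are assumptions. In an autodiff framework the centroids would typically be detached so that gradients flow only through the features.

```python
import numpy as np

def ema_update(centroids, z, y, momentum=0.9):
    """Move each centroid seen in the batch toward that class's batch mean."""
    new = centroids.copy()
    for k in np.unique(y):
        batch_mean = z[y == k].mean(axis=0)
        new[k] = momentum * centroids[k] + (1.0 - momentum) * batch_mean
    return new

def centroid_contrast_loss(z, centroids, y, temperature=0.1, eps=1e-12):
    """Cross-entropy over cosine similarities between features and all centroids."""
    zn = z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)
    cn = centroids / (np.linalg.norm(centroids, axis=1, keepdims=True) + eps)
    logits = zn @ cn.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(y)), y].mean())

# One step on a toy batch: evaluate the loss, then refresh the centroids.
centroids0 = np.zeros((2, 2))
z = np.array([[1.0, 1.0], [3.0, 3.0]])
y = np.array([0, 0])
loss = centroid_contrast_loss(z, centroids0, y)
centroids1 = ema_update(centroids0, z, y, momentum=0.9)
```

Classes absent from the batch keep their previous centroid, which is one reason EMA updates are less noisy than recomputing centroids from each minibatch alone.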

4. Theoretical Properties and Relationships

The explicit centroid alignment regime brings tractable geometric and statistical properties:

Closed-form solutions: For linear projection settings (e.g., Supervised Linear Centroid-Encoder, SLCE), the centroid reconstruction loss admits eigendecomposition-based closed forms, with the minimizing mapping given by the leading eigenvectors of a matrix built from the sample and centroid matrices (Ghosh et al., 2023).
Dimensionality limits: In SLCE, this matrix has at most $C-1$ positive eigenvalues, where $C$ is the number of classes, which bounds the effective dimension of the discriminative centroid structure.
Trade-off control: Hyperparameters balancing between centroid tightness and separation, or between cross-entropy and centroid terms, enable navigation of accuracy/compactness versus invariance or fairness (Liu et al., 31 Mar 2026, Zhu et al., 2019, Jeong et al., 28 May 2025).
Relation to other objectives: Centroid alignment is mathematically linked to the minimization of within-class scatter and maximization of between-class scatter, akin to Linear Discriminant Analysis, but often in a metric-learning, kernel, or nonlinear regime.
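The link to scatter matrices can be checked numerically. The sketch below verifies the standard identity that total scatter equals within-class plus between-class scatter, and that the between-class scatter of $C$ classes has rank at most $C-1$. These are general facts about class centroids; the specific SLCE matrix of Ghosh et al. is not reproduced, and the names and random data are illustrative assumptions.

```python
import numpy as np

def scatter_matrices(z, y):
    """Return within-class, between-class, and total scatter matrices."""
    mean = z.mean(axis=0)
    d = z.shape[1]
    s_w = np.zeros((d, d))
    s_b = np.zeros((d, d))
    for k in np.unique(y):
        zk = z[y == k]
        mu_k = zk.mean(axis=0)
        centered = zk - mu_k
        s_w += centered.T @ centered          # spread of samples around their centroid
        gap = (mu_k - mean)[:, None]
        s_b += len(zk) * (gap @ gap.T)        # spread of centroids around the global mean
    s_t = (z - mean).T @ (z - mean)
    return s_w, s_b, s_t

rng = np.random.default_rng(0)
z = rng.normal(size=(60, 5))
y = np.repeat(np.arange(3), 20)               # three classes, 20 samples each
s_w, s_b, s_t = scatter_matrices(z, y)
rank_b = np.linalg.matrix_rank(s_b)
```

Pulling samples toward their centroids shrinks `s_w`, while pushing centroids apart grows `s_b`, which is the same trade-off that Linear Discriminant Analysis optimizes in the linear case.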

5. Empirical Impacts and Applications

Centroid alignment losses have been empirically demonstrated to yield substantial improvements across modalities and domains:

Application Domain	Centroid Alignment Technique	Reported Impact (Source)
Vision-language models	Negative reweighting (TPC-CMA)	66–82% modality gap reduction, clustering ARI +0.20, CIDEr +57% (Liu et al., 31 Mar 2026)
Supervised Dim. Reduction	SLCE centroid-reconstruction	Outperforms classical PCA and LDA (Ghosh et al., 2023)
Neural Classifiers	Floating centroid loss	GA +1–4% over cross-entropy (Islam et al., 2019)
Few-Shot Learning	Centroid softmax alignment with related bases	Accuracy +1–6% absolute in 5-shot (Afrasiyabi et al., 2019)
Fairness in Face Rec.	Centroid regression aligning ROC curves	Bias_FAR reduced 30–50% at constant ROC (Conti et al., 27 Apr 2025)
Domain Adaptation	Contrastive centroid supervision (FCCL)	Cross-domain accuracy improvement without target data (Zhou et al., 2022)
Semantic Segmentation	Centroid alignment cross-entropy	mIoU +10–30 pts under weak supervision (Yao et al., 2020)
Biomedical Clustering	Patient-global centroid alignment (GPAL)	+1% ICBHI Score, better generalization (Jeong et al., 28 May 2025)

These improvements hold for cross-modal correspondence, intra-class compactness, enhanced feature clustering under weak labels, and for algorithmic fairness, with minimal sacrifices in raw predictive accuracy.

6. Schedules, Hyperparameters, and Practical Recommendations

Optimization of centroid alignment objectives typically requires careful regulation of trade-off strengths:

Curriculum schedules: A three-phase curriculum with anchor, ramp-up, and stabilize phases is used to avoid feature collapse or catastrophic forgetting during strong cross-modal alignment, e.g., in the TPC-CMA framework (Liu et al., 31 Mar 2026).
Control parameters: These directly set the weight of alignment versus discriminative (e.g., cross-entropy) objectives, letting the user navigate the invariance-accuracy trade-off.
Batchwise computation: Centroid statistics are usually aggregated per minibatch; larger batches or EMA updates make the centroid estimates more stable and consistent (Zhou et al., 2022, Jeong et al., 28 May 2025).
No intervention at inference: In many frameworks (e.g., PAFA, certain segmentation methods), centroids or alignment modules are used only during training, with test-time inference relying solely on the main prediction head (Jeong et al., 28 May 2025, Yao et al., 2020).
Hyperparameter grids: Empirical selection (grid search) is recommended for alignment weights, centroid updating rates, and temperature/scale factors (Yao et al., 2020, Liu et al., 31 Mar 2026), as performance is sensitive to these choices.
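A three-phase schedule for the alignment weight could be implemented as in the sketch below. The zero weight during the anchor phase, the linear ramp, and all parameter names are assumptions chosen for illustration; the actual TPC-CMA schedule may differ.

```python
def alignment_weight(step, anchor_steps, ramp_steps, max_weight):
    """Weight on the alignment term at a given training step.

    Anchor phase: weight is 0, so only the discriminative loss is trained.
    Ramp-up phase: weight grows linearly from 0 to max_weight.
    Stabilize phase: weight stays at max_weight.
    """
    if step < anchor_steps:
        return 0.0
    if step < anchor_steps + ramp_steps:
        return max_weight * (step - anchor_steps) / ramp_steps
    return max_weight

def total_loss(discriminative_loss, alignment_loss, step):
    """Combine the two objectives using hypothetical schedule settings."""
    w = alignment_weight(step, anchor_steps=100, ramp_steps=200, max_weight=0.5)
    return discriminative_loss + w * alignment_loss

weights = [alignment_weight(s, 100, 200, 0.5) for s in (0, 200, 1000)]
```

The phase lengths and `max_weight` are exactly the kind of settings that would be chosen by grid search.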

7. Theoretical and Practical Significance

Centroid alignment loss represents a broad, flexible paradigm for embedding learning and structured regularization. The approach enables models to:

Enforce geometric invariants or desirable group structure in latent spaces.
Address cross-domain, cross-modal, and fairness challenges without architecture redesign.
Provide clear interpretability through explicit centroid mapping.
Achieve state-of-the-art results in challenging regimes (e.g., few-shot, domain shift, weak labeling).

Its theoretical appeal lies in the direct link between geometric objectives (centroid alignment), closed-form or efficiently optimizable loss structures, and principled trade-offs between discrimination, invariance, and fairness (Liu et al., 31 Mar 2026, Ghosh et al., 2023, Conti et al., 27 Apr 2025).

Ablation studies in the cited works report performance gains and increased robustness when centroid alignment losses are properly configured and scheduled, with minimal computational overhead relative to baseline objectives.

References

"The Geometry of Compromise: Unlocking Generative Capabilities via Controllable Modality Alignment" (Liu et al., 31 Mar 2026)
"Yet Another Algorithm for Supervised Principal Component Analysis: Supervised Linear Centroid-Encoder" (Ghosh et al., 2023)
"Improving Neural Network Classifier using Gradient-based Floating Centroid Method" (Islam et al., 2019)
"Associative Alignment for Few-shot Image Classification" (Afrasiyabi et al., 2019)
"A Weakly-Supervised Semantic Segmentation Approach based on the Centroid Loss: Application to Quality Control and Inspection" (Yao et al., 2020)
"Contrastive Centroid Supervision Alleviates Domain Shift in Medical Image Classification" (Zhou et al., 2022)
"A New Loss Function for CNN Classifier Based on Pre-defined Evenly-Distributed Class Centroids" (Zhu et al., 2019)
"Mitigating Bias in Facial Recognition Systems: Centroid Fairness Loss Optimization" (Conti et al., 27 Apr 2025)
"Patient-Aware Feature Alignment for Robust Lung Sound Classification: Cohesion-Separation and Global Alignment Losses" (Jeong et al., 28 May 2025)
"Generalized Centroid Estimators in Bioinformatics" (Hamada et al., 2013)