Centroid-Level Representations

Updated 26 November 2025
  • Centroid-level representations are aggregate vectors that summarize groups of data points using mean or prototype methods to capture structural, semantic, or statistical properties.
  • They support diverse tasks such as classification, clustering, retrieval, and anomaly detection through domain-specific construction and update rules.
  • These methods offer interpretability, scalability, and robustness, while addressing challenges like intra-class variability and heuristic parameter selection.

Centroid-level representations refer to the use of explicit cluster centroids—mean, prototype, or aggregate vectors—to summarize structural, semantic, or statistical properties of groups of data points at various abstraction layers. These centroids can represent classes, clusters, objects, concepts, or even geometric or probabilistic aggregations across a variety of data modalities (images, text, 3D shapes, uncertain data), supporting tasks such as classification, clustering, retrieval, instance association, and efficient neural inference. The methodologies for computing, updating, and employing centroids differ by application domain, learning paradigm, and theoretical underpinning, including vector spaces, non-Euclidean manifolds, and probabilistic models.

1. Mathematical Formalisms and Centroid Construction

The mathematical definition of centroids varies across contexts:

  • Vector Spaces: In standard settings, the centroid $c_k$ for class $k$ is the arithmetic mean over all sample embeddings,

$$c_k = \frac{1}{N_k} \sum_{i=1}^{N_k} f(x_i),$$

where $f(x_i)$ is the feature embedding and $N_k$ the class cardinality (Wieczorek et al., 2021, Wu et al., 2021).

  • Probabilistic/Uncertain Data: The U-centroid is a random variable induced by the distributions of all objects in the cluster. For cluster $C = \{o_1, \dots, o_{|C|}\}$, with each $o_i$ a random variable, the U-centroid is $X_c = \frac{1}{|C|} \sum_i Y_i$, with $Y_i \sim f_i$ independently (Gullo et al., 2012).
  • Non-Euclidean Manifolds: For hyperbolic embeddings, centroids are not vector means but points constructed via iterated Möbius addition and scaling or constructive algorithms based on geodesic midpoints and recursion (e.g., LFC, LBC, LAC, BTC) (Gerek et al., 2022).
  • Object-centric Representations: In vision, object-centric centroids $c_n$ are 2D positions computed as a weighted pixel average within an object mask, i.e.,

$$c_n = \frac{1}{\sum_p M_n(p)} \sum_p p\, M_n(p),$$

for hard mask $M_n$ (Lei et al., 18 Nov 2025).

  • Structural Shape Averaging: The centroid in elastic mean contour frameworks involves a closed-form for optimal translation, coupling parameterization and centroid determination via double energy minimization (Molnar et al., 2019).

These formalisms underlie practical algorithms for efficient learning and inference across domains.
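
For concreteness, here is a minimal NumPy sketch of the vector-space case above; the function name and array layout are illustrative, not taken from the cited papers.

```python
import numpy as np

def class_centroids(embeddings: np.ndarray, labels: np.ndarray) -> dict:
    """Arithmetic-mean centroid c_k = (1/N_k) * sum_i f(x_i) per class k.

    embeddings: (N, D) array of feature vectors f(x_i).
    labels:     (N,) array of integer class labels.
    Returns a dict mapping class id -> (D,) centroid.
    """
    return {int(k): embeddings[labels == k].mean(axis=0)
            for k in np.unique(labels)}
```

Nearest-centroid classification then reduces to an argmin over $K$ centroid distances rather than a comparison against all $N$ instances.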

2. Centroid-Level Representations in Supervised and Unsupervised Learning

  • Class Prototypes: In supervised settings, centroids serve as class prototypes, supporting non-parametric classification, contrastive learning, and incremental learning by providing class reference points or adaptation anchors (Fukuda et al., 24 Dec 2024, Wieczorek et al., 2021, Tiong et al., 2021).
  • Clustering: Unsupervised algorithms such as Agg-Var clustering in concept learning (Ayub et al., 2019), partitional clustering under uncertainty (Gullo et al., 2012), and centroid attention layers in transformers (Wu et al., 2021) all depend on iterative or online updates of centroids defined by either proximity constraints, distributional aggregation, or differentiable soft assignments.
  • Contrastive Learning: EMA-updated class centroids are crucial in long-tailed recognition, where interpolated embeddings are explicitly trained to retrieve multiple centroids, regularizing both head and tail representations (Tiong et al., 2021); a minimal sketch of the EMA update follows this list.
  • One-Class and Anomaly Detection: Centroid-level representations are employed to define "bona fide" clusters—using an adaptive centroid updated via a running weighted average—which a one-class loss then compacts while forcing outliers away (Kim et al., 24 Jun 2024).
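
The EMA-style update shared by several of these methods is compact in code. Below is a minimal sketch; the momentum value, lazy initialization, and class name are illustrative choices rather than details from the cited papers.

```python
import numpy as np

class EMACentroids:
    """Per-class exponential-moving-average centroids: c_k <- m*c_k + (1-m)*z."""

    def __init__(self, num_classes: int, dim: int, momentum: float = 0.9):
        self.m = momentum
        self.c = np.zeros((num_classes, dim))
        self.initialized = np.zeros(num_classes, dtype=bool)

    def update(self, z: np.ndarray, k: int) -> np.ndarray:
        """Fold embedding z (shape (dim,)) into the centroid of class k."""
        if not self.initialized[k]:
            # Initialize with the first embedding seen for class k.
            self.c[k] = z
            self.initialized[k] = True
        else:
            self.c[k] = self.m * self.c[k] + (1.0 - self.m) * z
        return self.c[k]
```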

3. Object-Centric and Spatial Centroid Representations

In structured perception, centroids are tightly linked to semantic or physical object entities:

  • Token Compression and Object-centric LVLMs: CORE forms object tokens by pooling representations within segmentation masks to their centroids, yielding compact yet semantically aligned visual tokens. Centroid-guided sorting preserves spatial order for efficient LLM fusion (Lei et al., 18 Nov 2025); a schematic sketch of this pooling-and-sorting pattern appears after this list.
  • Human Pose and Instance Segmentation: VISUALCENT uses keypoint-centric and mask-centric offsets to dynamically estimate per-person and per-instance centroids, enabling robust joint segmentation in crowded or occluded scenes (Ahmad et al., 26 Apr 2025).
  • 3D Shape Descriptors: The SGC descriptor encodes local geometry by computing centroids and point densities within voxelized grids referenced by local object-centric frames, yielding robust partial-to-partial shape fingerprints (Tang et al., 2016).
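
The following schematic sketch illustrates mask-based pooling with centroid-guided ordering in the spirit of the object-centric methods above; the pooling rule, raster ordering, and function signature are simplifying assumptions, and the cited methods' exact procedures may differ.

```python
import numpy as np

def object_tokens_raster_sorted(features: np.ndarray, masks: list) -> np.ndarray:
    """Pool per-pixel features into one token per mask, sorted by mask centroid.

    features: (H, W, D) feature map; masks: list of non-empty boolean (H, W) arrays.
    Tokens are ordered by their 2D centroids in raster (row-major) order,
    a simple stand-in for centroid-guided sorting.
    """
    tokens, centroids = [], []
    for m in masks:
        ys, xs = np.nonzero(m)                        # pixels inside the mask
        tokens.append(features[ys, xs].mean(axis=0))  # mask-average pooling
        centroids.append((ys.mean(), xs.mean()))      # 2D centroid c_n
    order = sorted(range(len(masks)), key=lambda i: centroids[i])
    return np.stack([tokens[i] for i in order])
```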

4. Computational Procedures and Algorithmic Variations

Centroid computation is adapted to suit application constraints:

| Method/class | Update rule or computation | Key features |
| --- | --- | --- |
| Mean/EMA (Euclidean) | Simple mean, or $m c_k + (1-m) z_i$ | $O(1)$ per class, for large $K$ |
| Agg-Var clustering | Online thresholded update, create-or-merge | Variable cluster sizes |
| Hyperbolic centroid schemes | Recursive Möbius addition, tree-based aggregation | Non-Euclidean structure |
| Centroid attention (transformer) | Single-step gradient on clustering loss | $O(NM)$ vs. $O(N^2)$, abstraction |
| Mask-based centroiding | Weighted pixel average in object mask | Semantically grounded |
| U-centroid for uncertainty | Induced random variable, closed-form for $J(C)$ | Captures mean & variance |
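
The recursive hyperbolic schemes in the table are built from Möbius operations and geodesic midpoints. Below is a minimal Poincaré-ball (curvature $-1$) sketch of these primitives; the specific schemes (LFC, LBC, LAC, BTC) differ in how midpoints are recursively combined, which is not shown here.

```python
import numpy as np

def mobius_add(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Möbius addition u (+) v in the Poincaré ball (curvature -1)."""
    uv, u2, v2 = u @ v, u @ u, v @ v
    return ((1 + 2 * uv + v2) * u + (1 - u2) * v) / (1 + 2 * uv + u2 * v2)

def mobius_scale(r: float, v: np.ndarray) -> np.ndarray:
    """Möbius scalar multiplication r (x) v; v must lie strictly inside the ball."""
    n = np.linalg.norm(v)
    return v if n == 0 else np.tanh(r * np.arctanh(n)) * v / n

def geodesic_midpoint(u: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Midpoint of the geodesic from u to v: u (+) (0.5 (x) ((-u) (+) v))."""
    return mobius_add(u, mobius_scale(0.5, mobius_add(-u, v)))
```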

Each procedure admits efficient, scalable implementations (e.g., online centroid tracking (Ayub et al., 2019), KNN-masked attention (Wu et al., 2021), adaptive merging in incremental learning (Fukuda et al., 24 Dec 2024), and descriptor graphs for fast partial matching (Tang et al., 2016)).
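
Centroid attention, in particular, can be read as attention pooling with $M$ centroid queries over $N$ inputs, which is where the $O(NM)$ cost comes from. The sketch below shows one soft-assignment update and, as a simplifying assumption, omits the learned query/key/value projections of the actual transformer layer.

```python
import numpy as np

def centroid_attention_step(x: np.ndarray, c: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """One soft-assignment update of M centroids against N inputs, cost O(N*M).

    x: (N, D) input tokens; c: (M, D) current centroids.
    Each centroid attends over all inputs (softmax over N) and is replaced
    by the attention-weighted mean -- effectively a soft k-means step.
    """
    logits = c @ x.T / tau                                  # (M, N) similarities
    a = np.exp(logits - logits.max(axis=1, keepdims=True))  # numerically stable softmax
    a /= a.sum(axis=1, keepdims=True)
    return a @ x                                            # (M, D) updated centroids
```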

5. Theoretical Properties, Advantages, and Limitations

  • Interpretability: Centroid-level representations offer explicitly interpretable geometric or semantic prototypes, enabling direct model inspection, cluster merging, or semantic editing (Ayub et al., 2019, Lei et al., 18 Nov 2025).
  • Scalability and Efficiency: By reducing the number of representative vectors, centroids support efficient storage, retrieval, and inference, critical for large-scale or resource-constrained deployments (Wieczorek et al., 2021, Lei et al., 18 Nov 2025).
  • Robustness: Aggregation mitigates outlier influence, label noise, and internal variance, supporting improved accuracy, especially in challenging scenarios such as low-sample tail classes, major modality shift, or partial data (Tiong et al., 2021, Kim et al., 24 Jun 2024, Tang et al., 2016).
  • Limitations: Single centroids can obscure intra-class multimodality; centroid-based metrics in non-Euclidean or highly non-linear manifolds may misalign with actual task-relevant similarity (Ayub et al., 2019, Gerek et al., 2022). Threshold and parameter selection often remains empirical, and heuristic assignment rules (e.g., voting, reciprocal distance) are not learned end-to-end.

6. Applications and Empirical Impact

Centroid-level representations have demonstrated state-of-the-art (SOTA) or substantially improved empirical results in:

  • Image retrieval and re-identification: Reducing search space and boosting mAP and top-1 accuracy over instance-based approaches (Wieczorek et al., 2021); a minimal retrieval sketch follows this list.
  • Long-tailed and low-shot recognition: Dramatic tail-class accuracy improvements due to centroid-driven contrastive training (Tiong et al., 2021).
  • Class-incremental learning (CIL): Preserving prototype alignment via affine centroid mapping yields both superior accuracy and $O(1)$ inference time, exceeding SOTA methods under scalability constraints (Fukuda et al., 24 Dec 2024).
  • Efficient LVLM inference: CORE, under object-centric token merging with centroid-based sorting, achieves SOTA compression with minimal drop in task performance while maintaining spatial semantics at the token level (Lei et al., 18 Nov 2025).
  • Human analysis, 3D vision, and geometric computing: VISUALCENT and SGC produce robust, efficiently computed centroid-level signatures enabling real-time segmentation and partial shape matching, respectively (Ahmad et al., 26 Apr 2025, Tang et al., 2016).
  • Non-Euclidean text classification: Hyperbolic centroid schemes, especially LBC and LAC, can match or exceed Euclidean baselines in hierarchical datasets, under appropriately constructed embeddings (Gerek et al., 2022).
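
For retrieval in particular, the search-space reduction is easy to see in code: a query is matched against $K$ centroids instead of $N$ instance embeddings. A minimal sketch, with cosine similarity as an illustrative choice of metric:

```python
import numpy as np

def centroid_retrieval(query: np.ndarray, centroids: np.ndarray, top_k: int = 5):
    """Rank K class/identity centroids by cosine similarity to a query.

    query: (D,) embedding; centroids: (K, D) matrix, one row per class.
    Searching K centroids instead of N instances shrinks the search
    space whenever K << N.
    """
    q = query / np.linalg.norm(query)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    sims = c @ q                          # (K,) cosine similarities
    top = np.argsort(-sims)[:top_k]       # indices of the best-matching centroids
    return top, sims[top]
```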

7. Extensions, Open Questions, and Future Perspectives

There is continued development in:

  • Centroid schemes for non-Euclidean/complex manifolds: Advancing Riemannian mean computation (Fréchet/Karcher means), optimal transport barycenters, and centroids for graphs or structured objects (Gerek et al., 2022).
  • Multimodality and hierarchical structures: Multi-centroid or sub-class prototyping for classes with rich internal structure, and centroids for hierarchical aggregation (Wieczorek et al., 2021, Ayub et al., 2019).
  • Adaptive centroid construction: Online, distributed, and adaptive centroid updates for continual and federated learning scenarios (Fukuda et al., 24 Dec 2024).
  • Integrating centroids with deep architecture compression or token pruning: Using centroid-guided abstraction within transformers and LVLMs to simultaneously optimize efficiency and semantic retention (Lei et al., 18 Nov 2025, Wu et al., 2021).
  • Task-specific centroid measures: Aligning centroid distances with non-linear task constraints (e.g., complex geometry, uncertainty) and exploring theory for centroid-induced isoperimetric, generalization, and robustness properties (Besau et al., 2019, Gullo et al., 2012).

The explicit use of centroid-level representations—spanning clustering, prototypical learning, transformer bottlenecking, and geometric aggregation—constitutes a unifying and versatile paradigm with demonstrated impact across contemporary AI research domains.
