
Semi-Centroid Clustering Methods

Updated 8 January 2026
  • Semi-centroid clustering is a hybrid approach that integrates centroid-based and pairwise similarity measures, enabling both hard and fuzzy assignments.
  • It leverages a convex combination of centroid and intra-cluster losses to optimize clustering performance while ensuring fairness and robustness.
  • Algorithmic frameworks like fuzzy K-means and bridged clustering support semi-supervised, multi-modal representation learning with strong theoretical and empirical guarantees.

Semi-centroid clustering refers to a class of clustering and representation learning paradigms that interpolate between centroid-based clustering (where each cluster is summarized by a prototypical centroid) and non-centroid (centerless) clustering (where the organization and evaluation of clusters rely exclusively on intra-cluster relationships, especially pairwise similarities or distances). Unlike classical centroid-based methods such as $k$-means, semi-centroid techniques admit hybrid or entirely centroid-free characterizations, allow flexible loss definitions, and support both hard and fuzzy assignments. They provide substantial robustness, interpretability, and fairness guarantees across diverse scenarios, including unsupervised, semi-supervised, and multi-modal representation learning.

1. Formal Definitions and Paradigms

A semi-centroid clustering over a set of $n$ agents $N$ and candidate centers $M$ divides $N$ into $k$ clusters $C_1,\ldots,C_k$ and selects centers $x_1,\ldots,x_k\in M$. The individual loss for member $i$ in cluster $C_t$ with center $x_t$ is parameterized by a convex combination of centroid and non-centroid terms:

$$\ell_\alpha(i;C,x) = \alpha \cdot d^c(i,x) + (1-\alpha) \cdot \max_{j\in C} d^m(i,j)$$

for $\alpha\in[0,1]$, where $d^c$ and $d^m$ are fixed (pseudo)metrics for the centroid and maximum intra-cluster loss respectively (Cookson et al., 1 Jan 2026). Setting $\alpha=1$ recovers centroid-based clustering (e.g., $k$-means, $k$-medians), and $\alpha=0$ yields non-centroid clustering. Intermediate values of $\alpha$ yield blended loss functions that interpolate between the two regimes.
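As a concrete reading of this loss, the following is a minimal sketch assuming precomputed distance arrays `dc` and `dm`; the function and argument names are illustrative, not from the paper:

```python
def hybrid_loss(i, cluster, center, dc, dm, alpha=0.5):
    """Per-agent semi-centroid loss l_alpha(i; C, x).

    dc : (n, m) array, dc[i, x] = d^c(i, x) to candidate centers
    dm : (n, n) array, dm[i, j] = d^m(i, j) between agents
    """
    centroid_term = dc[i, center]                   # alpha = 1: pure centroid loss
    pairwise_term = max(dm[i, j] for j in cluster)  # alpha = 0: max intra-cluster loss
    return alpha * centroid_term + (1 - alpha) * pairwise_term
```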

Centroid-free fuzzy clustering, as realized by Lu et al. (2024), eliminates the need for explicit centroids by encoding partition structure entirely via a fuzzy assignment matrix $Y\in\mathbb{R}_+^{N\times K}$ and a fixed global distance matrix $D\in\mathbb{R}^{N\times N}$:

$$J(Y) = \mathrm{tr}\left(Y^T D Y P^{-1}\right) + \lambda \|Y\|_F^2$$

subject to $Y\mathbf{1}_K=\mathbf{1}_N$, with $P=\mathrm{diag}(p_{11},\ldots,p_{KK})$ and $p_{\ell\ell}=\sum_{i=1}^N y_{i\ell}$ (Bao et al., 2024). All geometric and cluster structure is transferred from explicit centers to distance-weighted membership statistics.
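A direct transcription of this objective as a sketch: `lam` stands in for $\lambda$, and the column-wise division implements right-multiplication by $P^{-1}$.

```python
import numpy as np

def fkmwc_objective(Y, D, lam):
    # J(Y) = tr(Y^T D Y P^{-1}) + lam * ||Y||_F^2
    p = Y.sum(axis=0)                       # diagonal of P: fuzzy cluster sizes
    return np.trace(Y.T @ D @ Y / p) + lam * np.sum(Y**2)
```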

2. Algorithmic Frameworks for Semi-Centroid Clustering

Centroid-Free Fuzzy K-Means (FKMWC)

Lu et al. introduce a multiplicative update algorithm without explicit centroid maintenance (Bao et al., 2024):

  • Initialization: Row-normalized $Y\in\mathbb{R}_+^{N\times K}$.
  • Main loop:
    • Compute $a_\ell = (Y^T D Y)_{\ell\ell}$.
    • Compute $p_{\ell\ell} = \sum_{i=1}^N y_{i\ell}$.
    • Form $G = (D + D^T)Y P^{-1} + 2\lambda Y$.
    • Update $y_{i\ell} \leftarrow y_{i\ell} \sqrt{a_\ell \, p_{\ell\ell}^{-2} / G_{i\ell}}$.
    • Renormalize rows of $Y$ such that $\sum_{\ell=1}^K y_{i\ell} = 1$.

This approach embeds centroid effects in the trace term $\mathrm{tr}(Y^T D Y P^{-1})$ and outputs only fuzzy memberships.
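A minimal sketch of one pass of this loop, following the update rule verbatim on a dense distance matrix `D`; the small `eps` guard against division by zero is my addition, not part of the stated algorithm:

```python
import numpy as np

def fkmwc_update(Y, D, lam=1e-3, eps=1e-12):
    p = Y.sum(axis=0)                          # p_ll = sum_i y_il
    a = np.einsum('il,ij,jl->l', Y, D, Y)      # a_l = (Y^T D Y)_ll
    G = (D + D.T) @ Y / p + 2 * lam * Y        # G = (D + D^T) Y P^{-1} + 2 lam Y
    Y = Y * np.sqrt((a / p**2) / (G + eps))    # multiplicative update of y_il
    return Y / Y.sum(axis=1, keepdims=True)    # renormalize rows to sum to 1
```

Iterating `fkmwc_update` until `fkmwc_objective` stops decreasing reproduces the loop above.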

Core-Approximate Semi-Centroid Clustering

Cookson, Shah, and Yu (1 Jan 2026) develop a polynomial-time 3-core-approximate algorithm based on:

  • Most-Cohesive Cluster (MCC) Extraction: Iteratively constructing tentative clusters by greedy minimization of the maximal hybrid loss (a schematic sketch follows this list).
  • Selective Switching: For each agent, opportunistic transfer between clusters based on potential reduction in loss, using constructed upper bounds on hybrid losses.
  • Complexity: The algorithm is polynomial in $n$, $k$, and $|M|$, and extensions operate in the dual-metric ($d^c$, $d^m$) regime (Cookson et al., 1 Jan 2026).
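The following schematic illustrates the MCC greedy principle only, under the simplifying assumption that each tentative cluster is grown to $\lceil n/k\rceil$ members by centroid distance; it is an illustration, not the authors' exact procedure, and selective switching is omitted:

```python
import numpy as np

def mcc_sketch(dc, dm, k, alpha=0.5):
    """Greedy Most-Cohesive-Cluster extraction (illustrative only).

    dc : (n, m) centroid distances d^c(i, y); dm : (n, n) pairwise d^m(i, j)
    """
    n, m = dc.shape
    size = -(-n // k)                  # ceil(n/k): the coalition-size threshold
    unassigned = list(range(n))
    clusters, centers = [], []
    for _ in range(k):
        if not unassigned:
            break
        best_loss, best = np.inf, None
        for y in range(m):             # tentative cluster around each candidate center
            idx = sorted(unassigned, key=lambda i: dc[i, y])[:size]
            # maximal hybrid loss over the tentative cluster's members
            loss = max(alpha * dc[i, y] + (1 - alpha) * dm[i, idx].max()
                       for i in idx)
            if loss < best_loss:
                best_loss, best = loss, (y, idx)
        y, idx = best
        centers.append(y)
        clusters.append(idx)
        unassigned = [i for i in unassigned if i not in idx]
    return clusters, centers
```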

Semi-Supervised Sparse Bridged Clustering

Bridged Clustering (Ye et al., 8 Oct 2025) demonstrates a semi-centroid methodology for sparse alignment across domains:

  • Step A: Cluster the input domain $X$ and output domain $Y$ independently, producing centroids $\{c_i^X\}$ and $\{c_j^Y\}$.
  • Step B: Learn a sparse bridge $B\in\mathbb{R}^{C \times C}$ via

$$\min_B \sum_{(x',y')\in S} \|B^T \phi_X(x') - \phi_Y(y')\|_2^2 + \lambda \|B\|_1,$$

given $k$ paired samples $S$ and cluster-indicator maps $\phi$.

  • Step C: Predict via $x\mapsto$ assigned input cluster $i^*$, select output cluster $j^* = \arg\max_j |B_{i^*,j}|$, and output $c_{j^*}^Y$ (Ye et al., 8 Oct 2025).
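A compact end-to-end sketch of these three steps using scikit-learn (assumed available); the cluster count `C`, penalty `lam`, and the Lasso surrogate for the $\ell_1$-penalized bridge objective are my choices for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

def bridged_clustering(X, Y, Xp, Yp, C=5, lam=0.1):
    """Minimal sketch of Bridged Clustering (hypothetical names/parameters).

    X, Y   : large unpaired corpora from the two domains
    Xp, Yp : the k paired samples S (rows aligned)
    """
    km_x = KMeans(n_clusters=C, n_init=10).fit(X)   # Step A: cluster each domain
    km_y = KMeans(n_clusters=C, n_init=10).fit(Y)
    # one-hot cluster indicators phi_X, phi_Y evaluated on the paired samples
    phi_x = np.eye(C)[km_x.predict(Xp)]
    phi_y = np.eye(C)[km_y.predict(Yp)]
    # Step B: sparse bridge via an L1-penalized regression surrogate
    B = Lasso(alpha=lam, fit_intercept=False).fit(phi_x, phi_y).coef_.T
    def predict(x):
        i_star = km_x.predict(x.reshape(1, -1))[0]  # Step C: assigned input cluster
        j_star = np.abs(B[i_star]).argmax()         # bridged output cluster j*
        return km_y.cluster_centers_[j_star]        # output centroid c^Y_{j*}
    return predict
```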

3. Fairness Criteria and Lower Bounds

Proportional fairness in semi-centroid clustering is formalized via the $\alpha$-core and $\alpha$-Fully Justified Representation (FJR):

  • $\alpha$-core: No coalition $S$ with $|S|\ge n/k$ can collectively improve their losses by defecting to a new center $y\in M$, relative to their losses in the current clusters.
  • $\alpha$-FJR: A coalition $S$ with $|S|\ge n/k$ cannot simultaneously achieve strictly better loss than the minimum loss within $S$ in the given clustering.
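For concreteness, one common way to write the approximate core condition, reconstructed from the description above rather than quoted from the paper:

```latex
% rho-approximate alpha-core (hedged reconstruction, not verbatim):
% a clustering (C_1,...,C_k; x_1,...,x_k) is in the rho-approximate alpha-core
% if there exist no coalition S with |S| >= n/k and center y in M such that
\[
  \rho \cdot \ell_\alpha(i;\, S,\, y) \;<\; \ell_\alpha\bigl(i;\, C(i),\, x(i)\bigr)
  \quad \text{for all } i \in S,
\]
% where C(i) is agent i's assigned cluster and x(i) its center.
```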

Cookson et al. establish:

| Loss Function | Existential Bound ($\rho^*$) | Poly-Time Bound ($\rho_\lambda$) | Lower Bound |
|---|---|---|---|
| Dual-metric hybrid | 3 | $3 + 2\sqrt{3}$ | 2 (pure centroid) |
| Weighted single-metric ($\lambda$) | $\min\{2/\lambda, 3\}$ | $\min\{2/\lambda, f_\lambda\}$ | $\max\{g_\lambda, 2(1-\lambda)/(2\lambda+1)\}$ |

No finite simultaneous core-approximation is possible for arbitrary mixing of centroid/non-centroid or dual-metric losses (Cookson et al., 1 Jan 2026).

4. Theoretical and Empirical Guarantees

FKMWC achieves, on diverse real-world datasets (faces, images, texts), robust performance that matches or exceeds traditional baselines in accuracy (ACC), normalized mutual information (NMI), and purity, with limited sensitivity to initialization and regularization (Bao et al., 2024). For example, on the AR face dataset, ACC improved from $\sim$0.25 (K-Means++) to $\sim$0.39; on JAFFE, performance with KNN distance reaches $\sim$0.97.

Bridged Clustering exhibits high label efficiency: one or two paired samples per cluster suffice to map centroids across modalities with exponentially small mis-bridging error. Overall risk decomposes as

$$\mathbb{E}[\|Y-\hat{y}\|^2] \leq D_Y + (\varepsilon_X + \varepsilon_B + \varepsilon_Y)\cdot M$$

where $D_Y$ is the within-cluster variance in $Y$, $M$ is the maximum inter-centroid distance, and the $\varepsilon$ terms reflect mis-clustering and mis-bridging rates with explicit exponential bounds under sub-Gaussianity and separation conditions (Ye et al., 8 Oct 2025).
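Plugging illustrative numbers into the bound makes the trade-off concrete; the values below are invented for the example, not taken from the paper:

```python
# Toy evaluation of the risk bound (all values hypothetical).
D_Y = 0.10                                   # within-cluster variance of Y
M = 4.0                                      # maximum inter-centroid distance
eps_X, eps_B, eps_Y = 0.01, 0.005, 0.01      # mis-clustering / mis-bridging rates
bound = D_Y + (eps_X + eps_B + eps_Y) * M    # 0.10 + 0.025 * 4.0 = 0.20
print(f"risk bound: {bound:.2f}")
```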

5. Structural Properties, Interpretability, and Use Cases

Semi-centroid and centroid-free methods offer several structural and practical advantages:

  • Robustness: By eliminating explicit centroid recomputation, algorithms are less sensitive to noise and initialization (Bao et al., 2024).
  • Flexibility: Choice of distance metric $D$ allows seamless transition to kernel methods, graph-based clustering, and support for non-Euclidean data (Bao et al., 2024).
  • Fairness and representation: Algorithms enforce proportional representation and defend against coalition improvements, which are essential in societal or democratic allocation settings (Cookson et al., 1 Jan 2026).
  • Interpretability: Sparse bridge matrices $B$ and cluster-centric assignments facilitate transparent prediction pipelines, in contrast to dense transport-based approaches (Ye et al., 8 Oct 2025).
  • Applicability in semi-supervision: Techniques such as Bridged Clustering are particularly effective in low-supervision and semi-supervised learning contexts involving unpaired datasets and sparse ground-truth alignments (Ye et al., 8 Oct 2025).

Potential limitations include increased computational and storage costs for fully dense distance matrices ($\mathcal{O}(TKN^2)$ over $T$ iterations), which can be mitigated by sparsification or graph-based approximations (Bao et al., 2024); a common sparsification recipe is sketched below.
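One standard mitigation is a $k$-nearest-neighbour sparsification of $D$. The sketch below still forms the dense matrix once, so an index- or tree-based neighbour search would be needed to avoid even that; this recipe is a common default, not the paper's:

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_sparsify(X, k=10):
    D = cdist(X, X)                           # dense pairwise distances (one-off)
    keep = np.argsort(D, axis=1)[:, :k + 1]   # self plus k nearest neighbours per row
    S = np.zeros_like(D)
    rows = np.arange(len(X))[:, None]
    S[rows, keep] = D[rows, keep]             # zero out non-neighbour entries
    return np.maximum(S, S.T)                 # symmetrize: keep edge if either side kept it
```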

6. Connections and Extensions

Semi-centroid clustering generalizes and bridges classical approaches:

  • In fuzzy clustering, FKMWC extends FCM by encoding cluster prototypes implicitly, showing full equivalence for squared Euclidean distance (Bao et al., 2024).
  • Semi-centroid fairness algorithms synthesize the centroid and non-centroid paradigms, achieving bounded approximation and representation guarantees even under dual metrics (Cookson et al., 1 Jan 2026).
  • Sparse-bridged approaches relate to multi-view and cross-modal representation learning, with interpretability and label efficiency advantages (Ye et al., 8 Oct 2025).

This framework admits further generalization to kernelized, graph-based, and constraint-driven clustering domains, supporting the evolving demands for robust, fair, and interpretable unsupervised and semi-supervised data partitioning.
