Core Stability in Non-Centroid Clustering
- The paper introduces a formal framework defining the α-core using max-loss objectives to quantify cluster robustness in non-centroid settings.
- It demonstrates impossibility results, showing that no k-clustering can achieve the 1-core under certain conditions, and provides tight α-bound analyses.
- The study proposes algorithmic relaxations such as Fully Justified Representation and spectral stability methods to enhance practical robustness in clustering.
Core stability in non-centroid clustering refers to the robustness of cluster assignments against group deviations: a clustering is stable if no sufficiently large coalition of agents can jointly lower the losses of all its members by forming a cluster of its own. In non-centroid clustering, an agent's loss is not determined by its distance to a representative point (centroid), but by some function of pairwise distances or graph interactions among cluster members. Formal definitions, impossibility results, algorithmic frameworks, and empirical findings demonstrate the complexity and limitations of achieving core stability, especially under the max-loss objective, where an agent's loss is its largest distance to another member of its cluster.
1. Formal Framework of Core Stability in Non-Centroid Clustering
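A finite metric space $(N, d)$ consists of a set $N$ of $n$ agents with a symmetric distance function $d : N \times N \to \mathbb{R}_{\ge 0}$. A non-centroid $k$-clustering defines a partition $\mathcal C = (C_1, \dots, C_k)$ of $N$, where $C_s \cap C_t = \emptyset$ for $s \neq t$ and $\bigcup_{t=1}^{k} C_t = N$. The max-loss objective assigns to agent $i$ in cluster $S$ the loss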
$\loss_i(S) = \max_{j \in S} d(i, j).$
For a clustering $\mathcal C$, we write $\loss_i(\mathcal C) = \loss_i(\mathcal C(i))$, where $\mathcal C(i)$ is the cluster containing $i$.
$\alpha$-blocking coalition: A subset $S \subseteq N$ of size $|S| \ge n/k$ $\alpha$-blocks $\mathcal C$ if for every $i \in S$,
$\alpha \cdot \loss_i(S) < \loss_i(\mathcal C).$
$\alpha$-core: $\mathcal C$ is in the $\alpha$-core if there is no $\alpha$-blocking coalition of size at least $n/k$. The $1$-core is referred to as the core (Caragiannis et al., 30 Oct 2024, Bredereck et al., 24 Nov 2025).
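The definitions above can be checked directly on tiny instances. The following is a minimal brute-force sketch, not an algorithm from the cited papers: the function names, the distance-matrix representation `d`, and the four-point instance are illustrative assumptions, and the search is exponential in the number of agents.

```python
from itertools import combinations
from math import ceil


def max_loss(i, cluster, d):
    """Max-loss of agent i in `cluster`: its largest distance to a member."""
    return max(d[i][j] for j in cluster)


def find_blocking_coalition(clustering, d, k, alpha=1.0):
    """Return some alpha-blocking coalition as a set, or None if none exists.

    Brute force over all coalitions of size >= n/k, so only for tiny inputs.
    """
    n = len(d)
    # Loss of every agent in the cluster it is currently assigned to.
    current = {i: max_loss(i, c, d) for c in clustering for i in c}
    for size in range(ceil(n / k), n + 1):
        for S in combinations(range(n), size):
            if all(alpha * max_loss(i, S, d) < current[i] for i in S):
                return set(S)
    return None


# Hypothetical instance: four agents on a line at 0, 1, 10, 11.
positions = [0, 1, 10, 11]
d = [[abs(a - b) for b in positions] for a in positions]

interleaved = [{0, 2}, {1, 3}]   # every agent has loss 10
natural = [{0, 1}, {2, 3}]       # every agent has loss 1

print(find_blocking_coalition(interleaved, d, k=2))  # {0, 1}
print(find_blocking_coalition(natural, d, k=2))      # None
```

In this toy instance the interleaved clustering is blocked by the two leftmost agents, while the natural clustering admits no blocking coalition and therefore lies in the core.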
2. Impossibility Theorems and Core Emptiness
For $k \ge 3$ and $n \ge 9$ divisible by $k$, there exist metric instances such that no $k$-clustering lies in the $\alpha$-core for any $\alpha < 2^{1/5}$. The construction, detailed in (Bredereck et al., 24 Nov 2025), uses specially structured coalitions achieving distinct internal max-losses:
- $\loss_i(S_1) = 2^{1/5}$
- $\loss_i(S_2) = 2^{2/5}$
- $\loss_i(S_3) \in \{1,2\}$
- $\loss_i(S_4) = 2^{4/5}$
- $\loss_i(S_5) = 2^{3/5}$
Any clustering leaves at least one such coalition able to improve strictly by a factor of $\alpha$ for every $\alpha < 2^{1/5}$, rendering the $\alpha$-core empty. The bound is tight for this construction: at $\alpha = 2^{1/5}$, cluster assignments can meet all coalition thresholds exactly.
A computer-assisted construction in two-dimensional Euclidean space yields a lower bound of the same kind, that is, a value of $\alpha$ below which no clustering lies in the $\alpha$-core. This demonstrates that the core ($1$-core) can be empty for non-centroid, max-loss clustering, a question that was previously unresolved (Bredereck et al., 24 Nov 2025).
3. Algorithmic Relaxations: FJR and Approximate Cores
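Given the restrictive nature and possible emptiness of the core, Fully Justified Representation (FJR) provides a relaxation. A clustering $\mathcal C$ satisfies $\alpha$-FJR if no coalition $S$ of size at least $n/k$ can bring the loss of every one of its members below $\min_{j \in S} \loss_j(\mathcal C(j))$, the smallest current loss within the coalition, by a factor of more than $\alpha$. Equivalently, no such $S$ satisfies
$\forall i \in S,\, \alpha \cdot \loss_i(S) < \min_{j \in S} \loss_j(\mathcal C(j)).$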
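To make the difference between the core and FJR concrete, here is a small brute-force sketch under the max-loss objective. The helper names and the six-point line instance are illustrative assumptions and do not come from the cited papers. Because the FJR threshold is the minimum current loss in the coalition, every FJR violation is also a blocking coalition, but not the other way round.

```python
from itertools import combinations
from math import ceil


def max_loss(i, cluster, d):
    """Max-loss of agent i in `cluster`: its largest distance to a member."""
    return max(d[i][j] for j in cluster)


def current_losses(clustering, d):
    """Loss of each agent in the cluster it is assigned to."""
    return {i: max_loss(i, c, d) for c in clustering for i in c}


def find_fjr_violation(clustering, d, k, alpha=1.0):
    """Return a coalition witnessing an alpha-FJR violation, or None."""
    n = len(d)
    current = current_losses(clustering, d)
    for size in range(ceil(n / k), n + 1):
        for S in combinations(range(n), size):
            # FJR compares against the best-off member of the coalition.
            threshold = min(current[j] for j in S)
            if all(alpha * max_loss(i, S, d) < threshold for i in S):
                return set(S)
    return None


# Hypothetical instance: six agents on a line.
positions = [0, 2, 4, 5, 9, 10]
d = [[abs(a - b) for b in positions] for a in positions]
clustering = [{0, 1, 3}, {2, 4, 5}]   # agent indices, not positions

current = current_losses(clustering, d)
S = (0, 1, 2)
blocks_core = all(max_loss(i, S, d) < current[i] for i in S)

print(blocks_core)                               # True
print(find_fjr_violation(clustering, d, k=2))    # None
```

Here the coalition of the three leftmost agents blocks the clustering in the core sense (their losses drop from 5, 3, 6 to 4, 2, 4), yet it is not an FJR violation because agent 0's new loss of 4 is not below the coalition's minimum current loss of 3.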
Algorithmically, the GreedyCohesiveClustering framework, which uses an exact or approximate oracle for selecting the most cohesive cluster, constructs clusterings that satisfy FJR exactly or up to a constant factor. For the efficient GreedyCapture algorithm, the reported results are:
- Guarantees: $2$-core under max-loss and $4$-FJR under average loss, with polynomial running time.
- Experiments: measured core and FJR violations show that GreedyCapture is consistently fairer than $k$-means++ or $k$-medoids, at the cost of only a moderate increase in standard clustering cost (Caragiannis et al., 30 Oct 2024).
| Algorithm | Objective | Core/FJR Guarantee | Runtime |
|---|---|---|---|
| GreedyCohesiveClustering | Arbitrary loss | Exact FJR (factor $1$) | Inefficient |
| GreedyCapture | Average/max-loss | $O(n/k)$-core (average), $2$-core (max-loss) | Polynomial |
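The ball-growing idea behind a greedy capture procedure can be sketched as follows. This is a simplified illustration in the spirit of GreedyCapture, not the algorithm as specified in the cited paper: the function name, the tie-breaking rule, capturing exactly $\lceil n/k \rceil$ agents per ball, and the handling of leftover agents are all assumptions, and no guarantee from the table is claimed for this code.

```python
from math import ceil


def greedy_capture_sketch(d, k):
    """Repeatedly open the tightest ball that holds ceil(n/k) unassigned agents.

    `d` is a symmetric distance matrix (list of lists). Returns a list of
    sets of agent indices with at most k clusters.
    """
    n = len(d)
    size = ceil(n / k)
    unassigned = set(range(n))
    clusters = []
    while len(unassigned) >= size:
        best = None  # (radius, members) of the tightest ball found so far
        for c in sorted(unassigned):
            # The `size` nearest unassigned agents to c, with c itself first.
            nearest = sorted(unassigned, key=lambda j: (d[c][j], j != c, j))
            members = nearest[:size]
            radius = d[c][members[-1]]
            if best is None or radius < best[0]:
                best = (radius, members)
        clusters.append(set(best[1]))
        unassigned -= clusters[-1]
    if unassigned:
        # Assumption: agents left over at the end form a final cluster.
        clusters.append(set(unassigned))
    return clusters


# Hypothetical instance: four agents on a line at 0, 1, 10, 11.
positions = [0, 1, 10, 11]
d = [[abs(a - b) for b in positions] for a in positions]
print(greedy_capture_sketch(d, k=2))  # [{0, 1}, {2, 3}]
```

This naive version re-sorts the remaining agents for every candidate centre, so it is meant for reading rather than for large inputs.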
4. Statistical and Probabilistic Core Notions
Beyond worst-case coalition deviations, statistical core stability quantifies the robustness of cluster membership under resampling of the data. For sample-dependent clustering methods (hierarchical, density-based, spectral), core clusters are the largest subsets in which every pair of points co-occurs in the same cluster with probability at least a prescribed confidence level.
Estimating these co-occurrence probabilities via bootstrapping and finding core clusters reduces to maximum-clique identification in a co-occurrence graph. Empirical results indicate that non-centroid methods (e.g., hierarchical linkage) tend to have smaller and less pure cores than centroid-based methods, emphasizing the instability of assignments near cluster boundaries (Henelius et al., 2016).
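The reduction to a clique problem can be illustrated with a short sketch. It assumes that each bootstrap run is available as a dictionary mapping the points present in that run to cluster labels; the function names, the `threshold` parameter, and the toy labelings are hypothetical and not taken from the cited work. The clique search is a plain branch-and-bound, adequate only for small graphs.

```python
from itertools import combinations


def cooccurrence(labelings, n):
    """Estimate, for each pair, the fraction of runs in which it shares a label.

    A pair is only counted in runs that contain both points.
    """
    together = [[0] * n for _ in range(n)]
    seen = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i, j in combinations(sorted(labels), 2):
            seen[i][j] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
    return {(i, j): together[i][j] / seen[i][j]
            for i, j in combinations(range(n), 2) if seen[i][j]}


def largest_core(probs, n, threshold):
    """Largest set of points whose pairs all co-occur with frequency >= threshold."""
    adj = {i: set() for i in range(n)}
    for (i, j), p in probs.items():
        if p >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    best = set()

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = set(clique)
        for v in sorted(candidates):
            if len(clique) + len(candidates) <= len(best):
                return  # cannot beat the best clique found so far
            expand(clique | {v}, candidates & adj[v])
            candidates = candidates - {v}

    expand(set(), set(range(n)))
    return best


# Hypothetical labelings from four bootstrap runs over five points.
runs = [
    {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"},
    {0: "a", 1: "a", 2: "a", 3: "a", 4: "b"},
    {0: "a", 1: "a", 2: "b", 3: "b", 4: "b"},
    {0: "a", 1: "a", 2: "a", 3: "b", 4: "b"},
]
probs = cooccurrence(runs, n=5)
print(largest_core(probs, n=5, threshold=0.75))  # {0, 1, 2}
print(largest_core(probs, n=5, threshold=0.9))   # {0, 1}
```

Raising the threshold shrinks the core, which mirrors the observation that points near cluster boundaries drop out first.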
5. Stability in Graph-Based Non-Centroid Clustering
Maximum-likelihood mixture models for graphs (e.g., the NL-EM model) enable a node-centric notion of core stability. Stabilizer nodes are those whose connection patterns strictly exclude membership in all but one group for their neighbors, making the classification "crisp." Extracting them involves solving set-cover instances over the neighbors' excluded-connection sets. The abundance and redundancy of stabilizers correspond to the resilience of the classification under structural perturbations and noise. Real-world examples (U.S. Senate co-voting, food webs) identify stabilizers as information-rich backbones that reflect core community structure (0809.1398).
6. Spectral Stability: Core Assessment via Eigenvalue Gaps
For spectral clustering, stability is assessed via the $k$-th spectral gap of the graph Laplacian and the structured distance to ambiguity. The latter is the size of the smallest perturbation of the Laplacian, restricted to admissible changes of the graph, for which the $k$-th gap collapses. A two-level iterative algorithm, combining a constrained gradient flow (inner iteration) and root-finding (outer iteration), computes this distance robustly for sparse graphs. Structured stability indicators can yield different optimal cluster numbers than unstructured spectral gaps, especially for real data with ambiguous community separation (Andreotti et al., 2019).
| Stability Indicator | Definition | Use Case |
|---|---|---|
| Spectral gap | $\lambda_{k+1} - \lambda_k$ of the graph Laplacian | Rapid practical screening |
| Structured distance | Min. Frobenius norm of an admissible perturbation for vanishing gap | Robustness assessment |
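As a small worked illustration of the spectral gap indicator (an example chosen here, not one from the cited work), consider a weighted path on four nodes with edge weights $1, \varepsilon, 1$, i.e., two tightly linked pairs joined by a weak edge. The sketch assumes the unnormalized Laplacian $L = D - W$ with eigenvalues sorted in ascending order and writes $g_k = \lambda_{k+1} - \lambda_k$ for the $k$-th gap.

```latex
L = \begin{pmatrix}
 1 & -1 & 0 & 0 \\
 -1 & 1+\varepsilon & -\varepsilon & 0 \\
 0 & -\varepsilon & 1+\varepsilon & -1 \\
 0 & 0 & -1 & 1
\end{pmatrix},
\qquad
\begin{aligned}
\lambda_1 &= 0, &
\lambda_2 &= 1 + \varepsilon - \sqrt{1+\varepsilon^2}, \\
\lambda_3 &= 2, &
\lambda_4 &= 1 + \varepsilon + \sqrt{1+\varepsilon^2}.
\end{aligned}

% Gaps for a weak link, \varepsilon = 0.1 (so \sqrt{1.01} \approx 1.0050):
g_1 = \lambda_2 - \lambda_1 \approx 0.095, \qquad
g_2 = \lambda_3 - \lambda_2 \approx 1.905, \qquad
g_3 = \lambda_4 - \lambda_3 \approx 0.105.
```

The eigenvalues follow from splitting eigenvectors into symmetric and antisymmetric ones under the reflection of the path, and they sum to the trace $4 + 2\varepsilon$. The dominant gap at $k = 2$ matches the two visible groups; as $\varepsilon$ grows, $g_2 = 1 - \varepsilon + \sqrt{1+\varepsilon^2}$ shrinks, which is the kind of ambiguity the structured distance is meant to quantify.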
7. Research Directions and Open Problems
- The exact $1$-core can fail to exist under max-loss; $2$-core existence is currently the best general guarantee. Structured metric properties or alternative losses (sum-loss, -loss) might admit stronger results, but these questions remain open.
- Efficient auditing procedures enable estimation of FJR violation; analogous constant-factor core audits remain an open technical challenge.
- Statistical core notions allow fine-grained stability assessments, but scale is limited by max-clique complexity and bootstrap instance size.
- Extensions to richer and non-pairwise loss functions, adaptive choice of , and coalition-formation models may yield further insights into tractable core stability for non-centroid clustering (Bredereck et al., 24 Nov 2025, Caragiannis et al., 30 Oct 2024, Henelius et al., 2016, 0809.1398, Andreotti et al., 2019).
A plausible implication is that the conceptual shift from exact core stability to relaxations (FJR, statistical cores, stabilizers, structured spectral stability) provides a necessary framework for achieving practically robust, interpretable, and fair clusterings in complex metric and graph-based domains.