Multi-Granular Competitive Penalization Learning

Updated 30 January 2026
  • The paper introduces MGCPL, a penalty-augmented competitive learning approach that simultaneously discovers coarse and fine-grained clusters.
  • MGCPL adaptively reduces prototype counts stage-wise, effectively capturing the inherent nested structure in complex categorical data.
  • The integrated CAME encoding produces robust multi-granular embeddings with linear scalability, outperforming traditional clustering methods.

Multi-Granular Competitive Penalization Learning (MGCPL) addresses the inherent complexity of clustering categorical data, where features are qualitative and the natural cluster structure is frequently nested and multi-granular. Standard partitional clustering methods (such as k-modes) require the number of clusters to be specified a priori and cannot automatically capture such nesting, while hierarchical methods suffer from computational inefficiency and lack of adaptive metric learning. MGCPL introduces a competitive learning paradigm augmented with penalization mechanisms that allow the simultaneous discovery of coarse- and fine-grained clusterings, automatically identifying a sequence of partitions at progressively reduced cardinalities. In conjunction with MGCPL, the Cluster Aggregation based on MGCPL Encoding (CAME) technique provides final, robust cluster assignments through the construction of a multi-granular embedding. Together, these methods comprise a pipeline—the MGCPL-guided Categorical Data Clustering (MCDC) approach—that delivers data-driven, stage-wise cluster discovery and scalable performance for large categorical datasets (Cai et al., 23 Jan 2026).

1. Multi-Grained Structure of Categorical Data

Categorical datasets commonly exhibit a nested multi-granular clustering effect, whereby compact clusters composed of overlapping objects can themselves be grouped into coarser clusters. This structure arises because categorical values delimit discrete and typically non-Euclidean similarity spaces, where objects may share attribute values with multiple groups, resulting in extensive overlap in feature subspaces. Standard partitional methods, such as k-modes, require explicit specification of k and treat each clustering granularity separately, whereas hierarchical approaches, although capable of expressing nesting, lack adaptive metric learning and are often computationally intractable for large-scale or high-dimensional data (Cai et al., 23 Jan 2026). MGCPL is explicitly designed to model, and efficiently identify, this multi-granular structure without the need for user-specified granularity.

2. MGCPL Algorithm: Formulation and Update Mechanics

MGCPL builds upon the classical competitive learning criterion, maximizing a within-cluster similarity objective formulated as

S(Q,U) = \sum_{l=1}^{k} \sum_{i=1}^{n} u_l\, q_{il}\, s(x_i, C_l),

where $q_{il} \in \{0,1\}$ indicates assignment of object $x_i$ to cluster $C_l$, $u_l \in [0,1]$ reflects cluster "prominence," and $s(x_i, C_l)$ denotes categorical similarity. To mitigate the winner-take-all effect and encourage competition, MGCPL introduces a rival-penalization term, leading to the objective

\mathcal{L}(Q, U) = \sum_{i=1}^{n} \left[ u_{v(i)}\, s(x_i, C_{v(i)}) - \lambda\, u_{h(i)}\, s(x_i, C_{h(i)}) \right],

where for each $x_i$, the winner $v(i)$ and nearest rival $h(i)$ are determined by a similarity-weighted selection, penalized by normalized win counts $\rho_l$ and controlled by the parameter $\lambda$. Prototype updates are implemented through energy variables $\delta_l$ and prominence weights $u_l = \sigma(\delta_l)$ (a sigmoid re-parameterization), while cluster prototypes $C_l$ are represented implicitly by feature-value frequency tables. At each step, both the winner and the rival prototype are updated, $\delta_v \leftarrow \delta_v + \eta$ and $\delta_h \leftarrow \delta_h - \eta\, s(x_i, C_h)$, and the cluster assignment statistics are adjusted accordingly (Cai et al., 23 Jan 2026).
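The following Python sketch illustrates one update step under this formulation. It is a minimal illustration rather than the authors' implementation: the concrete similarity function, the exact form of the $\rho$-penalized winner/rival selection, and all names (`mgcpl_step`, `freq_tables`, `delta`, `wins`, `eta`, `lam`) are assumptions filled in from the description above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mgcpl_step(x, freq_tables, counts, delta, wins, eta=0.1, lam=0.1):
    """One penalized competitive-learning update for a single object x.

    x           : 1-D integer-coded categorical attribute vector (length d)
    freq_tables : per-cluster (d, n_values) arrays of feature-value counts,
                  implicitly representing the prototypes C_l
    counts      : number of objects currently assigned to each cluster
    delta       : energy variables; prominence is u_l = sigmoid(delta_l)
    wins        : per-cluster win counts used for the penalty rho_l
    """
    k, d = len(freq_tables), len(x)
    # Categorical similarity s(x, C_l): mean relative frequency of x's
    # attribute values in cluster l (one plausible choice of s).
    s = np.array([
        np.mean([freq_tables[l][j, x[j]] / max(counts[l], 1) for j in range(d)])
        for l in range(k)
    ])
    u, rho = sigmoid(delta), wins / max(wins.sum(), 1)
    # Similarity-weighted selection, penalized by normalized win counts
    # (an assumed instantiation of the rule described in the text).
    score = u * s * (1.0 - lam * rho)
    v = int(np.argmax(score))            # winner v(i)
    h = int(np.argsort(score)[-2])       # nearest rival h(i)
    delta[v] += eta                      # reinforce the winner
    delta[h] -= eta * s[h]               # penalize the rival
    wins[v] += 1
    for j in range(d):                   # move x into the winner's statistics
        freq_tables[v][j, x[j]] += 1
    counts[v] += 1
    return v, h
```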

3. Stage-Wise Reduction and Discovery of Granularity

The MGCPL algorithm operates in an epochal, stage-wise manner. At each epoch $j$, clustering proceeds with $k^{(j-1)}$ prototypes (starting from an initial $k_0 \gg k^*$). Upon assignment convergence (no change in $q_{il}$), prototypes with zero winners ("dead" prototypes) are pruned, yielding $k^{(j)} < k^{(j-1)}$. State variables ($g_l$, $\delta_l$, $u_l$) are reinitialized, and the process restarts with the survivors. This continues until $k^{(j)} = k^{(j-1)}$, producing a sequence of cluster cardinalities $\kappa = \{k_1, k_2, \ldots, k_\sigma\}$, where each stage's partition $Y_j$ forms part of the multi-granular label set $\Gamma = \{Y_1, \ldots, Y_\sigma\}$. The result is an ordered sequence of data-driven partitions at distinct levels of granularity, naturally converging to a cluster number $k^*$ without prior specification (Cai et al., 23 Jan 2026). A sketch of this outer loop follows.
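This is a minimal sketch of the stage-wise loop, assuming a hypothetical `run_epoch` routine that reinitializes the state variables and runs competitive updates to assignment convergence; the pruning rule follows the description above, while the scaffolding around it is an assumption.

```python
def mgcpl_stagewise(X, k0, run_epoch):
    """Stage-wise prototype reduction (outer loop of MGCPL).

    X         : (n, d) integer-coded categorical data
    k0        : initial prototype count, chosen much larger than the
                expected number of clusters k*
    run_epoch : callable(X, k) -> (labels, wins) that reinitializes the
                state variables (g, delta, u) and runs competitive
                updates until the assignments q_il stop changing
    """
    k_prev = k0
    partitions, cardinalities = [], []   # Gamma = {Y_1..Y_sigma}, kappa
    while True:
        labels, wins = run_epoch(X, k_prev)
        k_new = int((wins > 0).sum())    # prune "dead" (zero-win) prototypes
        partitions.append(labels)
        cardinalities.append(k_new)
        if k_new == k_prev:              # cardinality stabilized: stop
            break
        k_prev = k_new
    return partitions, cardinalities
```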

4. Cluster Aggregation via MGCPL Encoding (CAME)

CAME constructs a categorical embedding for each object $i$ by concatenating its label assignments across stages, $\Gamma_i = [y_{1i}, y_{2i}, \ldots, y_{\sigma i}]$. Final cluster partitioning operates on these embeddings with a weighted k-modes objective:

P(Q,\Theta) = \sum_{l=1}^{k} \sum_{i=1}^{n} \sum_{r=1}^{\sigma} q_{il}\, \theta_r\, d(\Gamma_{ir}, Z_{lr}),

where $\Theta = [\theta_1, \ldots, \theta_\sigma]$ is the vector of feature weights (with $\sum_r \theta_r = 1$), $Z_{lr}$ is the mode of the $r$-th label in cluster $l$, and $d(\cdot, \cdot)$ is the 0–1 distance. Optimization alternates between object assignment, which minimizes the weighted Hamming distance for fixed $\Theta$, and a feature-weight update based on intra-cluster similarity, ensuring convergence in a finite number of steps. CAME thus integrates the multi-granular information derived from MGCPL into a low-dimensional, robust categorical representation for final clustering (Cai et al., 23 Jan 2026).
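The alternating minimization can be sketched as follows, with the multi-granular labels stacked into an $(n, \sigma)$ integer matrix. The weight update shown, stage weights proportional to each stage's agreement with its cluster modes and then normalized, is one plausible reading of the "feature weight update using intra-cluster similarity," not necessarily the paper's exact formula; `came_cluster` and its parameters are illustrative names.

```python
import numpy as np

def came_cluster(Gamma, k, n_iter=20, seed=0):
    """Weighted k-modes on the multi-granular embedding Gamma of shape
    (n, sigma): alternate between assigning objects to the cluster with
    the smallest theta-weighted 0-1 distance to the cluster modes, and
    re-estimating the modes Z and stage weights theta.
    """
    n, sigma = Gamma.shape
    theta = np.full(sigma, 1.0 / sigma)              # sum(theta) = 1
    labels = np.random.default_rng(seed).integers(k, size=n)
    for _ in range(n_iter):
        # Z[l, r]: most frequent stage-r label among cluster l's members.
        Z = np.zeros((k, sigma), dtype=Gamma.dtype)
        for l in range(k):
            members = Gamma[labels == l]
            if members.shape[0] == 0:
                continue                             # empty cluster: keep zeros
            for r in range(sigma):
                vals, cnts = np.unique(members[:, r], return_counts=True)
                Z[l, r] = vals[np.argmax(cnts)]
        # Assignment step: theta-weighted Hamming distance to every cluster.
        D = np.stack([((Gamma != Z[l]) * theta).sum(axis=1) for l in range(k)],
                     axis=1)
        new_labels = D.argmin(axis=1)
        # Weight update (assumed form): stages whose labels agree more with
        # their cluster's mode receive proportionally larger theta_r.
        agree = np.array([(Gamma[:, r] == Z[new_labels, r]).mean()
                          for r in range(sigma)])
        theta = agree / agree.sum()
        if np.array_equal(new_labels, labels):       # assignments stabilized
            break
        labels = new_labels
    return labels, theta
```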

5. Computational Complexity and Scalability

The full MCDC workflow, combining MGCPL and CAME, maintains linear time complexity in the key dataset parameters. A single MGCPL epoch costs $O(dnk)$ per iteration, and over $\sigma$ stages of $I$ iterations each, the overall complexity is $O(\sigma I d n k_0)$, with $\sigma, I \ll n, d, k_0$ in practice. CAME's alternating minimization incurs $O(\sigma n k)$ per assignment-and-weight-update step, so across $T$ alternations the cost is $O(T \sigma n k) \approx O(nk)$. This scalability facilitates the application of MGCPL-based clustering to large and high-dimensional categorical datasets and supports pre-partitioning for distributed computing (Cai et al., 23 Jan 2026).
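As a worked illustration with purely hypothetical constants (none are reported above), take $\sigma = 5$ stages, $I = 20$ iterations per stage, $k_0 = 50$ initial prototypes, and $d = 100$ attributes; the MGCPL phase then performs on the order of

\sigma I d k_0 = 5 \times 20 \times 100 \times 50 = 5 \times 10^5

operations per object, so the total cost scales as $5 \times 10^5 \cdot n$: the multiplier stays fixed while runtime grows linearly in $n$.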

6. Theoretical Properties and Empirical Performance

MGCPL is a form of penalized competitive learning that can be interpreted as a gradient-descent process on $\mathcal{L}(Q, U)$, guaranteeing stabilization of the label assignments. CAME, as a weighted k-modes variant, converges to a local minimum via alternating minimization. Empirical evaluation across eight real categorical benchmark datasets (UCI Car, Congress, Chess, Mushroom, Tic-Tac-Toe, Vote, Balance, and Nursery) and large synthetic sets (e.g., $n = 200{,}000$, $d = 1000$) demonstrates that MCDC achieves superior clustering accuracy (ACC), Adjusted Rand Index (ARI), Adjusted Mutual Information (AMI), and Fowlkes–Mallows (FM) scores relative to state-of-the-art methods such as k-modes, ROCK, WOCIL, FKMAWCW, GUDMM, and ADC. The hybrid variant MCDC+FKMAWCW achieves the highest overall performance. Wilcoxon tests at 90% confidence confirm significant improvements in ACC, ARI, and FM over the baselines. Ablation studies indicate that removing CAME, its feature weighting, or the penalization in MGCPL each degrades performance. Analysis of the $k_j$ sequence shows that MGCPL recovers the true cluster number at the coarsest granularity. Runtime studies corroborate the method's linear scaling (Cai et al., 23 Jan 2026).

7. Implications and Applications

MGCPL, by combining competitive learning with rival penalization and an adaptive, stage-wise reduction of prototypes, provides a fully data-driven approach for uncovering the multi-granular cluster structure characteristic of categorical datasets. The associated CAME aggregation strategy enables these structures to be effectively encoded for downstream clustering tasks, conferring robustness to hyperparameter choices and improved recovery of true clusterings, even in the presence of overlapping subspaces or high-dimensional features. The linear complexity and empirical scalability make the MGCPL–CAME pipeline suitable for distributed data partitioning and large-scale data processing. A plausible implication is that this approach provides an effective alternative to both partitional and traditional hierarchical clustering in settings where the underlying granular structure is unknown or heterogeneous (Cai et al., 23 Jan 2026).
