Prototype-Oriented Clustering with Distillation (PCD)
- The paper introduces PCD, which replaces single mean prototypes with multi-prototypes via conditional hierarchical agglomerative clustering to capture intra-class diversity in non-IID federated settings.
- The methodology integrates prototype alignment, self-knowledge distillation, and an attractive–repulsive LEMGP loss to ensure global-local consistency and robust model training.
- Experimental results show that PCD achieves higher accuracy and lower RMSE/MAE than state-of-the-art baselines in AI-RAN enabled MEC environments.
Prototype-Oriented Clustering with Distillation (PCD) is a federated optimization methodology designed to address label skew and data heterogeneity in federated learning (FL), particularly within AI-native Radio Access Network (AI-RAN) enabled Multi-Access Edge Computing (MEC) systems. PCD, as realized in the Multi-Prototype-Guided Federated Knowledge Distillation (MP-FedKD) framework, combines conditional hierarchical agglomerative clustering (CHAC), multi-prototype representations, and multi-component prototype alignment losses to enable robust, communication-efficient distributed model training on non-independent and identically distributed (non-IID) data (Zou et al., 10 Mar 2026).
1. Motivation and Federated Context
In AI-RAN-enabled MEC, edge devices jointly train a global model without sharing raw data, making FL an attractive paradigm. Traditional FL approaches relying on single mean prototypes per class fail to adequately capture intra-class variability and can lead to information loss, especially under non-IID data distributions. PCD proposes to replace the single mean-prototype per class with a small set of cluster centroids. These multi-prototypes, obtained via CHAC, enhance the fidelity of local representations and their alignment with global prototypes, enabling superior mitigation of heterogeneity-induced performance degradation (Zou et al., 10 Mar 2026).
2. Conditional Hierarchical Agglomerative Clustering (CHAC)
For each client and each class label, the representation network maps the client's input data of that class to a set of local class-specific embeddings.
To construct a set of representative prototypes per class, the client performs CHAC, initialized with each embedding as its own cluster. Using Ward’s linkage criterion, the algorithm iteratively merges the pair of clusters whose merger gives the smallest increase in within-cluster sum of squares (SSQ): $\Delta(A, B) = \frac{|A|\,|B|}{|A| + |B|}\,\lVert \mu_A - \mu_B \rVert_2^2$, where $|A|$ and $|B|$ are the cluster sizes and $\mu_A$, $\mu_B$ are the cluster means, each taken along dimension 0 of the cluster's stacked embeddings. Merging continues until the target number of clusters remains, after which the cluster centroids are extracted as the local class prototypes (Zou et al., 10 Mar 2026).
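The Ward merging step can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the names `ward_cost`, `ward_prototypes`, and `max_clusters` are hypothetical, the search over cluster pairs is brute force, and the rule that makes the clustering "conditional" is not modeled because it is not specified here.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(points: Sequence[Sequence[float]]) -> Vector:
    """Mean of the points along dimension 0."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def ward_cost(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Increase in within-cluster sum of squares if clusters a and b are merged."""
    mu_a, mu_b = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(mu_a, mu_b))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_prototypes(embeddings: Sequence[Sequence[float]], max_clusters: int) -> List[Vector]:
    """Merge clusters with Ward's criterion until at most max_clusters remain.

    Returns the centroids of the remaining clusters. max_clusters is an
    assumed name for the per-class cluster budget.
    """
    if max_clusters < 1:
        raise ValueError("max_clusters must be at least 1")
    # Start with every embedding as its own cluster.
    clusters = [[list(e)] for e in embeddings]
    while len(clusters) > max_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i], clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [centroid(c) for c in clusters]


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(ward_prototypes(points, max_clusters=2))
```

With two well-separated groups of points, the two returned centroids are the means of each group. A production version would more likely rely on an optimized library routine for Ward linkage than on this cubic-time loop.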
3. Prototype Alignment and Aggregation
Each client communicates its prototypes to the central server, which aggregates them into global class prototypes using a weighted average (eq. 4 in the source). Alignment of local and global prototypes is enforced by minimizing the Euclidean distance between local embeddings (evaluated using the previous-round representation) and the global class prototypes, as codified by the Prototype Alignment (PA) loss (Zou et al., 10 Mar 2026).
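As a rough illustration of the aggregation and alignment steps, the sketch below averages client prototypes for one class and computes a quadratic alignment penalty. The weighting scheme is an assumption (any positive per-prototype weight, for example a sample count), as are the function names `aggregate_prototypes` and `pa_loss` and the choice of a mean of squared distances; the exact forms are defined in the source paper.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def aggregate_prototypes(client_protos: Sequence[Tuple[Vector, float]]) -> Vector:
    """Weighted average of (prototype, weight) pairs for a single class.

    The weights are an assumption of this sketch, e.g. local sample counts.
    """
    total = sum(weight for _, weight in client_protos)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(client_protos[0][0])
    return [sum(p[d] * w for p, w in client_protos) / total for d in range(dim)]


def pa_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    global_protos: Dict[int, Vector],
) -> float:
    """Mean squared Euclidean distance from each embedding to its class prototype."""
    total = 0.0
    for z, y in zip(embeddings, labels):
        g = global_protos[y]
        total += sum((a - b) ** 2 for a, b in zip(z, g))
    return total / len(embeddings)


if __name__ == "__main__":
    g0 = aggregate_prototypes([([0.0, 0.0], 1.0), ([4.0, 8.0], 3.0)])
    print(g0)
    print(pa_loss([[3.0, 6.0], [4.0, 6.0]], [0, 0], {0: g0}))
```

The penalty is zero only when every embedding coincides with the global prototype of its class, which is the consistency the PA loss is meant to encourage.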
4. Knowledge Distillation and Local Objective Design
PCD integrates multiple loss components in the federated optimization loop. The total local objective for each client at each round is a weighted combination of the following components:
- Cross-Entropy (CE) Loss: Standard supervised criterion.
- Self-Knowledge-Distillation (SKD) Loss: KL divergence between current and previous local model logits, softened by a distillation temperature.
- Prototype Alignment (PA) Loss: Quadratic penalty enforcing local embedding proximity to global prototypes.
- LEMGP Loss: An "attractive–repulsive" objective comprising an intraclass attraction term and an interclass repulsion term, promoting compactness and discriminability (Zou et al., 10 Mar 2026).
The weights that control the loss mixture, and the reduced objective used during the initial training rounds, are specified in the source (Zou et al., 10 Mar 2026).
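The cross-entropy and self-distillation terms, and the way the four losses might be mixed, can be sketched as follows. Several details are assumptions of this sketch rather than facts from the source: the KL direction (previous-round model treated as the teacher), the absence of any extra temperature scaling factor, the additive form of the mixture, and the weight names `w_skd`, `w_pa`, and `w_lemgp`. The PA and LEMGP values are passed in as precomputed numbers.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable log-softmax of logits divided by the temperature."""
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(v - m) for v in scaled))
    return [v - log_z for v in scaled]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Standard supervised cross-entropy for a single example."""
    return -log_softmax(logits)[label]


def skd_loss(
    current_logits: Sequence[float],
    previous_logits: Sequence[float],
    temperature: float,
) -> float:
    """KL(previous || current) on temperature-softened distributions.

    Treating the previous-round model as the teacher is an assumption.
    """
    log_p = log_softmax(previous_logits, temperature)
    log_q = log_softmax(current_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def local_objective(
    ce: float, skd: float, pa: float, lemgp: float,
    w_skd: float, w_pa: float, w_lemgp: float,
) -> float:
    """Assumed additive mixture of the four loss terms."""
    return ce + w_skd * skd + w_pa * pa + w_lemgp * lemgp


if __name__ == "__main__":
    ce = cross_entropy([2.0, 0.5, -1.0], label=0)
    skd = skd_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.5], temperature=2.0)
    print(local_objective(ce, skd, pa=0.3, lemgp=0.1, w_skd=1.0, w_pa=1.0, w_lemgp=1.0))
```

When the current and previous logits agree, the SKD term vanishes, so it only penalizes drift of the local model away from its own previous-round predictions.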
5. End-to-End Workflow and Implementation Pseudocode
The federated procedure is organized as an alternating sequence of server and client operations, summarized below:
| Step | Description | Reference |
|---|---|---|
| Initialization | Global model and global prototypes | |
| Server broadcast | Global model and global prototypes sent to selected clients | |
| Client local update | Computation of the local loss terms (CE, SKD, PA, LEMGP) and CHAC clustering | MP-FedKD workflow; eqs. (5, 9, 10–12) |
| Server aggregation | FedAvg for model parameters; prototype update via weighted sum | eq. 4 |
The full procedural logic, including conditional CHAC, is specified in the pseudocode block in the source (Zou et al., 10 Mar 2026).
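One communication round of this workflow can be outlined in a few lines. This is a simplified sketch under stated assumptions, not the source pseudocode: `run_round`, `weighted_mean`, and the `local_update` callback are hypothetical names, model parameters are flattened to a list of floats, and both the model and the prototypes are averaged with each client's sample count as the weight.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
# (updated parameters, prototypes per class label, number of local samples)
ClientResult = Tuple[Vector, Dict[int, List[Vector]], int]


def weighted_mean(items: Sequence[Tuple[Vector, float]]) -> Vector:
    """Weighted average of (vector, weight) pairs."""
    total = sum(weight for _, weight in items)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(items[0][0])
    return [sum(v[d] * w for v, w in items) / total for d in range(dim)]


def run_round(
    global_params: Vector,
    global_protos: Dict[int, Vector],
    clients: Sequence[object],
    local_update: Callable[[Vector, Dict[int, Vector], object], ClientResult],
) -> Tuple[Vector, Dict[int, Vector]]:
    """Broadcast, run local updates, then aggregate model and prototypes."""
    # Server broadcast and client local update (training details are delegated).
    results = [local_update(global_params, global_protos, c) for c in clients]

    # FedAvg-style aggregation of model parameters.
    new_params = weighted_mean([(params, n) for params, _, n in results])

    # Prototype update: weighted average over all prototypes sent for a class.
    new_protos: Dict[int, Vector] = {}
    for label in sorted({c for _, protos, _ in results for c in protos}):
        weighted = [(p, n) for _, protos, n in results for p in protos.get(label, [])]
        new_protos[label] = weighted_mean(weighted)
    return new_params, new_protos
```

Passing the client-side training as a callback keeps the outline independent of any particular model or loss implementation; a class that no selected client holds in a round simply receives no prototype update in this sketch.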
6. Hyperparameters and Design Insights
Key hyperparameters include:
- Maximum number of clusters per class (used by CHAC).
- Distillation temperature (used by the SKD loss).
- LEMGP attractive/repulsive weighting.
- Training configuration: learning rate, batch size 32, 5 local epochs, and 50 communication rounds.
By design, PCD enables modeling intra-class diversity through multi-prototypes, stabilizes local updates on non-IID data by self-distillation, and ensures global-local consistency with prototype alignment and attraction–repulsion mechanisms. This approach provides robustness to data heterogeneity while maintaining communication efficiency, as only prototype vectors and model parameters are transmitted (Zou et al., 10 Mar 2026).
7. Performance Implications and Practical Significance
Experimental results on benchmark datasets under various non-IID conditions show that the MP-FedKD framework incorporating PCD consistently achieves higher accuracy and average accuracy, and lower RMSE and MAE, than state-of-the-art baselines. CHAC-based multi-prototype construction mitigates the information loss characteristic of mean-prototype strategies and preserves intra-class manifold structure. The combination of self-knowledge distillation, PA, and LEMGP losses makes optimization less susceptible to data skew and local overfitting, suggesting broad utility for federated deployments in real-world AI-RAN and MEC scenarios (Zou et al., 10 Mar 2026).