Prototype-Oriented Clustering with Distillation (PCD)

Updated 5 April 2026
  • The paper introduces PCD, which replaces single mean prototypes with multi-prototypes via conditional hierarchical agglomerative clustering to capture intra-class diversity in non-IID federated settings.
  • The methodology integrates prototype alignment, self-knowledge distillation, and an attractive–repulsive LEMGP loss to ensure global-local consistency and robust model training.
  • Experimental results demonstrate that PCD significantly enhances accuracy and reduces error metrics compared to state-of-the-art baselines in AI-RAN enabled MEC environments.

Prototype-Oriented Clustering with Distillation (PCD) is a federated optimization methodology designed to address label skew and data heterogeneity in federated learning (FL), particularly within AI-native Radio Access Network (AI-RAN) enabled Multi-Access Edge Computing (MEC) systems. As realized in the Multi-Prototype-Guided Federated Knowledge Distillation (MP-FedKD) framework, PCD combines conditional hierarchical agglomerative clustering (CHAC), multi-prototype representations, and multi-component prototype alignment losses to support robust, communication-efficient, high-performance distributed training on non-independent and identically distributed (non-IID) data (Zou et al., 10 Mar 2026).

1. Motivation and Federated Context

In AI-RAN-enabled MEC, edge devices jointly train a global model without sharing raw data, which makes FL an attractive paradigm. Conventional prototype-based FL approaches that represent each class by a single mean prototype fail to capture intra-class variability and lose information, especially under non-IID data distributions. PCD instead replaces the single mean prototype per class with a small set of cluster centroids. These multi-prototypes, obtained via CHAC, represent local data more faithfully and align better with the global prototypes, which helps mitigate the performance degradation caused by data heterogeneity (Zou et al., 10 Mar 2026).

2. Conditional Hierarchical Agglomerative Clustering (CHAC)

For each client $m$ and class label $c$, the representation network $\Upsilon(\cdot)$ maps input data $x_m^n$ to local class-specific embeddings

$$\varrho_{m,c}^n = \Upsilon(x_m^n;\, y_m^n = c).$$

To construct a set of $\zeta_m^c$ representative prototypes per class, the client performs CHAC, initialized with each embedding as its own cluster. Using Ward’s linkage criterion, the algorithm iteratively merges the pair of clusters $(B_1, B_2)$ whose merge causes the smallest increase in the within-cluster sum of squares (SSQ):

$$\Delta SSQ_{B_1,B_2} = \frac{v_1 v_2}{v_1 + v_2} \sum_{q=1}^{Q} \left(\bar E_{B_1}^q - \bar E_{B_2}^q\right)^2,$$

where $v_1, v_2$ are the cluster sizes and $\bar E_{B_z}^q$ is the mean of cluster $B_z$ along embedding dimension $q$. Merging continues until $\zeta_m^c$ clusters remain, after which the cluster centroids are extracted as the local class prototypes (Zou et al., 10 Mar 2026).
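A minimal pure-Python sketch of the Ward-linkage merging loop described above, applied to the embeddings of one class on one client. The function names `ward_cost` and `chac_prototypes` are hypothetical, the naive pair search (cubic in the number of embeddings) is chosen for clarity rather than speed, and the sketch does not model whatever condition makes the paper's variant "conditional". If a class has no more embeddings than the target count, the sketch simply returns them unchanged, which is an assumption.

```python
from typing import List, Sequence

Vector = List[float]


def ward_cost(mean_a: Sequence[float], size_a: int,
              mean_b: Sequence[float], size_b: int) -> float:
    """Increase in within-cluster SSQ if clusters A and B were merged (Ward's criterion)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    return size_a * size_b / (size_a + size_b) * sq_dist


def chac_prototypes(embeddings: Sequence[Sequence[float]], num_prototypes: int) -> List[Vector]:
    """Merge one class's embeddings until `num_prototypes` clusters remain and
    return the cluster centroids as that class's local prototypes."""
    if num_prototypes < 1:
        raise ValueError("num_prototypes must be >= 1")
    # Every embedding starts as its own singleton cluster: (centroid, size).
    clusters = [(list(map(float, e)), 1) for e in embeddings]
    while len(clusters) > num_prototypes:
        # Find the pair whose merge adds the least within-cluster SSQ.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost(clusters[i][0], clusters[i][1],
                                 clusters[j][0], clusters[j][1])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (mean_i, n_i), (mean_j, n_j) = clusters[i], clusters[j]
        n = n_i + n_j
        # Centroid of the merged cluster is the size-weighted mean of both centroids.
        merged = [(n_i * a + n_j * b) / n for a, b in zip(mean_i, mean_j)]
        del clusters[j]  # j > i, so index i stays valid
        clusters[i] = (merged, n)
    return [centroid for centroid, _ in clusters]


if __name__ == "__main__":
    # Six 2-D embeddings of one class, forming two well-separated modes.
    class_embeddings = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
                        [5.0, 5.0], [5.2, 5.0], [5.0, 5.2]]
    protos = chac_prototypes(class_embeddings, num_prototypes=2)
    print([[round(v, 3) for v in p] for p in protos])  # [[0.067, 0.067], [5.067, 5.067]]
```

In this example the two centroids sit inside each mode, whereas a single mean prototype for the same six points would be about [2.57, 2.57], a location between the modes that no sample occupies. This is the information loss that multi-prototypes are meant to avoid.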

3. Prototype Alignment and Aggregation

The local prototypes of each client are sent to the central server, which aggregates them into global class prototypes using a weighted average (eq. 4). Alignment between local and global prototypes is enforced by the Prototype Alignment (PA) loss, which minimizes the Euclidean distance between local embeddings (computed with the previous-round representation) and the corresponding global class prototypes (Zou et al., 10 Mar 2026).
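The server-side aggregation and the PA penalty can be sketched as follows. This is an illustration under stated assumptions, not the paper's equation: each client sends (prototype, weight) pairs per class, each weight is assumed to be the number of samples behind that prototype, and the server is assumed to collapse all prototypes of a class into one global vector. The names `aggregate_prototypes` and `pa_loss` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def aggregate_prototypes(
    client_protos: Sequence[Dict[int, List[Tuple[Vector, int]]]],
) -> Dict[int, Vector]:
    """Weighted average of all prototypes received for each class.

    client_protos[k][c] lists (prototype, weight) pairs sent by client k for class c.
    ASSUMPTION: each weight is the number of samples behind that prototype.
    """
    sums: Dict[int, Vector] = {}
    totals: Dict[int, float] = {}
    for protos_by_class in client_protos:
        for c, pairs in protos_by_class.items():
            for proto, weight in pairs:
                acc = sums.setdefault(c, [0.0] * len(proto))
                for q, value in enumerate(proto):
                    acc[q] += weight * value
                totals[c] = totals.get(c, 0.0) + weight
    return {c: [v / totals[c] for v in acc] for c, acc in sums.items()}


def pa_loss(embeddings: Sequence[Vector], labels: Sequence[int],
            global_protos: Dict[int, Vector]) -> float:
    """Mean squared Euclidean distance from each embedding to its class's global prototype."""
    if not embeddings:
        return 0.0
    total = 0.0
    for emb, c in zip(embeddings, labels):
        total += sum((e - g) ** 2 for e, g in zip(emb, global_protos[c]))
    return total / len(embeddings)
```

Weighting by sample count lets clients with more data of a class pull the global prototype further; other weightings would only change the `weight` values passed in.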

4. Knowledge Distillation and Local Objective Design

PCD integrates multiple loss components in the federated optimization loop. The total local objective for client $m$ in each communication round is a weighted combination of the following components:

  • Cross-entropy (CE) loss: the standard supervised criterion.
  • Self-knowledge-distillation (SKD) loss: the KL divergence between the temperature-softened output distributions of the current and previous-round local models.
  • Prototype alignment (PA) loss: a quadratic penalty that keeps local embeddings close to the global prototypes.
  • LEMGP loss: an "attractive–repulsive" objective comprising an intraclass attraction term and an interclass repulsion term, which promotes compact and discriminable class clusters (Zou et al., 10 Mar 2026).
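The SKD and LEMGP terms can be sketched as follows. `skd_loss` is the standard temperature-softened KL divergence with the previous-round model acting as teacher; the direction KL(previous || current) and the omission of the T² scaling sometimes used in distillation are assumptions. The exact LEMGP formula is not given here, so `attract_repel_loss` is a hypothetical stand-in that only illustrates the attractive–repulsive idea: a squared-distance pull toward the sample's own global prototype plus a squared-hinge push away from other classes' prototypes, with hypothetical `margin` and `repel_weight` parameters.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def log_softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable log-softmax of the logits divided by the temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def skd_loss(current_logits: Sequence[float], previous_logits: Sequence[float],
             temperature: float) -> float:
    """KL(p_prev || p_cur) on temperature-softened outputs (previous model = teacher)."""
    log_p_prev = log_softmax(previous_logits, temperature)
    log_p_cur = log_softmax(current_logits, temperature)
    return sum(math.exp(lp) * (lp - lc) for lp, lc in zip(log_p_prev, log_p_cur))


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def attract_repel_loss(embedding: Sequence[float], label: int,
                       global_protos: Dict[int, Vector],
                       margin: float, repel_weight: float) -> float:
    """HYPOTHETICAL stand-in for LEMGP, not the paper's formula.

    Attraction: squared distance to the sample's own global prototype.
    Repulsion: mean squared hinge on the distance to each other class's
    prototype, active only when that distance is below `margin`.
    """
    attract = _sq_dist(embedding, global_protos[label])
    others = [p for c, p in global_protos.items() if c != label]
    repel = 0.0
    for proto in others:
        repel += max(0.0, margin - math.sqrt(_sq_dist(embedding, proto))) ** 2
    if others:
        repel /= len(others)
    return attract + repel_weight * repel
```

The SKD term is zero when the current and previous models agree and grows as they diverge, which is what discourages abrupt drift during local training on skewed data.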

The weights that control the loss mixture are fixed hyperparameters, and the initial training rounds use only a subset of these loss terms.
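A sketch of how the four terms might be combined in a given round. The additive form, the weight names (`skd`, `pa`, `lemgp`), and the assumption that warm-up rounds use only the CE loss (plausible because no previous-round model or global prototypes exist yet) are all hypothetical; no specific mixing values are implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """HYPOTHETICAL mixing weights for the local objective."""
    skd: float
    pa: float
    lemgp: float
    warmup_rounds: int  # ASSUMPTION: rounds with index < warmup_rounds use CE only


def local_objective(ce: float, skd: float, pa: float, lemgp: float,
                    round_idx: int, w: LossWeights) -> float:
    """Combine per-batch loss values into one scalar (hypothetical additive form)."""
    if round_idx < w.warmup_rounds:
        # ASSUMPTION: early rounds rely on the supervised term alone.
        return ce
    return ce + w.skd * skd + w.pa * pa + w.lemgp * lemgp
```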

5. End-to-End Workflow and Implementation Pseudocode

The federated procedure is organized as an alternating sequence of server and client operations, summarized below:

| Step | Description | Reference |
|---|---|---|
| Initialization | Initialize the global model and the global prototypes | |
| Server broadcast | Send the global model and global prototypes to the selected clients | |
| Client local update | Compute the local losses (CE, SKD, PA, LEMGP) and run CHAC clustering on class embeddings | MP-FedKD workflow; eqs. (5, 9, 10–12) |
| Server aggregation | FedAvg for the model parameters; global prototype update via weighted sum | eq. (4) |
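The server-aggregation step uses FedAvg, which averages client parameters weighted by each client's share of the training samples. A minimal sketch, assuming parameters are stored as flat lists of floats keyed by hypothetical layer names; a real system would operate on framework tensors.

```python
from typing import Dict, List, Sequence

Params = Dict[str, List[float]]  # hypothetical layout: layer name -> flat parameter list


def fedavg(client_params: Sequence[Params], num_samples: Sequence[int]) -> Params:
    """Sample-weighted average of client parameters (FedAvg aggregation step)."""
    if not client_params or len(client_params) != len(num_samples):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Params = {}
    for name, first in client_params[0].items():
        acc = [0.0] * len(first)
        for params, n in zip(client_params, num_samples):
            weight = n / total  # this client's share of all training samples
            for i, value in enumerate(params[name]):
                acc[i] += weight * value
        averaged[name] = acc
    return averaged
```

For two clients holding 1 and 3 samples, the second client's parameters receive weight 0.75, so the aggregate leans toward the client with more data.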

The full procedural logic, including conditional CHAC, is specified in the pseudocode block in the source (Zou et al., 10 Mar 2026).

6. Hyperparameters and Design Insights

Key hyperparameters include:

  • The maximum number of clusters (prototypes) per class.
  • The distillation temperature used in the SKD loss.
  • The LEMGP weighting between the attractive and repulsive terms.
  • Optimization settings: learning rate, batch size 32, 5 local epochs, and 50 communication rounds.
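The settings above can be gathered into a small configuration object. Only the batch size, local epochs, and rounds are filled in from the list; the field names are hypothetical, and the remaining fields have no defaults, so callers must supply them. The values in the example instance are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PCDConfig:
    """Hypothetical container for PCD training settings (field names are illustrative)."""
    # No defaults: values must be supplied by the caller.
    max_clusters_per_class: int
    distill_temperature: float
    attract_repel_weight: float
    learning_rate: float
    # Settings listed in the text.
    batch_size: int = 32
    local_epochs: int = 5
    rounds: int = 50

    def __post_init__(self) -> None:
        if self.max_clusters_per_class < 1:
            raise ValueError("max_clusters_per_class must be >= 1")
        if self.distill_temperature <= 0 or self.learning_rate <= 0:
            raise ValueError("temperature and learning rate must be positive")


# Illustrative values only; they are not the settings used in the paper.
example = PCDConfig(max_clusters_per_class=3, distill_temperature=2.0,
                    attract_repel_weight=0.5, learning_rate=0.01)
```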

By design, PCD enables modeling intra-class diversity through multi-prototypes, stabilizes local updates on non-IID data by self-distillation, and ensures global-local consistency with prototype alignment and attraction–repulsion mechanisms. This approach provides robustness to data heterogeneity while maintaining communication efficiency, as only prototype vectors and model parameters are transmitted (Zou et al., 10 Mar 2026).

7. Performance Implications and Practical Significance

Experimental results on benchmark datasets under various non-IID conditions demonstrate that the MP-FedKD framework incorporating PCD consistently achieves higher accuracy and average accuracy, and lower RMSE/MAE, than state-of-the-art baselines. CHAC-based multi-prototype construction mitigates the information loss characteristic of mean-prototype strategies and preserves intra-class manifold structure. The combination of self-knowledge distillation, PA, and LEMGP losses fosters an optimization dynamic that is less susceptible to the adverse effects of data skew and local overfitting, suggesting broad utility for federated deployments in real-world AI-RAN and MEC scenarios (Zou et al., 10 Mar 2026).
