Prototype-Oriented Clustering with Distillation (PCD)

Updated 5 April 2026

The paper introduces PCD, which replaces single mean prototypes with multi-prototypes via conditional hierarchical agglomerative clustering to capture intra-class diversity in non-IID federated settings.
The methodology integrates prototype alignment, self-knowledge distillation, and an attractive–repulsive LEMGP loss to ensure global-local consistency and robust model training.
Experimental results demonstrate that PCD significantly enhances accuracy and reduces error metrics compared to state-of-the-art baselines in AI-RAN enabled MEC environments.

Prototype-Oriented Clustering with Distillation (PCD) is a federated optimization methodology designed to address label skew and data heterogeneity in federated learning (FL), particularly within AI-native Radio Access Network (AI-RAN) enabled Multi-Access Edge Computing (MEC) systems. As realized in the Multi-Prototype-Guided Federated Knowledge Distillation (MP-FedKD) framework, PCD combines conditional hierarchical agglomerative clustering (CHAC), multi-prototype representations, and multi-component prototype alignment losses to enable robust, communication-efficient distributed model training on non-independent and identically distributed (non-IID) data (Zou et al., 10 Mar 2026).

1. Motivation and Federated Context

In AI-RAN-enabled MEC, edge devices jointly train a global model without sharing raw data, making FL an attractive paradigm. Traditional FL approaches relying on single mean prototypes per class fail to adequately capture intra-class variability and can lead to information loss, especially under non-IID data distributions. PCD proposes to replace the single mean-prototype per class with a small set of cluster centroids. These multi-prototypes, obtained via CHAC, enhance the fidelity of local representations and their alignment with global prototypes, enabling superior mitigation of heterogeneity-induced performance degradation (Zou et al., 10 Mar 2026).

2. Conditional Hierarchical Agglomerative Clustering (CHAC)

For each client $m$ and class label $c$, the representation network $\Upsilon(\cdot)$ maps input data $x_{m}^{n}$ to local class-specific embeddings

$\varrho_{m,c}^n = \Upsilon(x_m^n; y_m^n=c).$

To construct a set of $\zeta_m^c$ representative prototypes per class, the client performs CHAC, initialized with each embedding as its own cluster. Using Ward's linkage criterion, the algorithm iteratively merges the pair of clusters $(B_1, B_2)$ with the minimal increase in within-cluster sum of squares (SSQ):

$\Delta SSQ_{B_1,B_2} = \frac{v_1 v_2}{v_1 + v_2} \sum_{q=1}^Q (\bar E_{B_1}^q - \bar E_{B_2}^q)^2,$

where $v_1, v_2$ are the cluster sizes and $\bar E_{B_z}^q$ is the mean of cluster $B_z$ along embedding dimension $q$. Merging continues until $\zeta_m^c$ clusters remain, after which the cluster centroids are extracted as the local class prototypes (Zou et al., 10 Mar 2026).
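The Ward merging step can be illustrated with a minimal pure-Python sketch. It implements only plain Ward agglomeration using the SSQ-increase formula above; the "conditional" part of CHAC is not specified here and is not modeled. The names `ward_cost` and `ward_prototypes` are hypothetical, and embeddings are assumed to be plain lists of floats for a single client and class.

```python
from itertools import combinations


def ward_cost(size_a, mean_a, size_b, mean_b):
    """Increase in within-cluster SSQ if clusters a and b were merged."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(mean_a, mean_b))
    return size_a * size_b / (size_a + size_b) * sq_dist


def ward_prototypes(embeddings, num_prototypes):
    """Merge singleton clusters with Ward's criterion.

    Returns (centroids, sizes) once num_prototypes clusters remain.
    Hypothetical helper; plain Ward linkage only, no CHAC condition.
    """
    if not 1 <= num_prototypes <= len(embeddings):
        raise ValueError("num_prototypes must be in [1, len(embeddings)]")
    # Each cluster is tracked as [size, centroid]; start with one per embedding.
    clusters = [[1, [float(x) for x in e]] for e in embeddings]
    while len(clusters) > num_prototypes:
        # Pick the pair (i < j) whose merge increases the SSQ the least.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost(*clusters[ij[0]], *clusters[ij[1]]),
        )
        (va, ma), (vb, mb) = clusters[i], clusters[j]
        merged = [(va * p + vb * q) / (va + vb) for p, q in zip(ma, mb)]
        del clusters[j]  # j > i, so index i stays valid
        clusters[i] = [va + vb, merged]
    centroids = [mean for _, mean in clusters]
    sizes = [size for size, _ in clusters]
    return centroids, sizes


centroids, sizes = ward_prototypes([[0, 0], [0, 1], [10, 10], [10, 11]], 2)
```

In this toy input the two tight pairs are merged first, so the two returned centroids sit at the midpoints of each pair. The exhaustive pair search is quadratic per merge, which is acceptable for a sketch but not for large per-class sample counts.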

3. Prototype Alignment and Aggregation

The local prototypes from each client are communicated to the central server, which aggregates them into global class prototypes using a weighted average (eq. 4 in the source). Alignment of local and global prototypes is enforced by minimizing the Euclidean distance between local embeddings (evaluated using the previous-round representation) and global class prototypes, as codified by the Prototype Alignment (PA) loss (Zou et al., 10 Mar 2026).
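As a rough illustration of server-side aggregation and the PA penalty, the sketch below makes several assumptions that are not confirmed by the source: each class ends up with a single global prototype, the averaging weights are the sample counts behind each local prototype, and the PA loss is the mean squared Euclidean distance. The function names are hypothetical.

```python
def aggregate_prototypes(client_prototypes):
    """Weighted average of prototype vectors for one class.

    client_prototypes: list of (weight, vector) pairs. Using the number of
    local samples behind each prototype as its weight is an assumption.
    """
    total = sum(w for w, _ in client_prototypes)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(client_prototypes[0][1])
    return [
        sum(w * v[d] for w, v in client_prototypes) / total
        for d in range(dim)
    ]


def prototype_alignment_loss(embeddings, labels, global_prototypes):
    """Mean squared Euclidean distance from each embedding to the global
    prototype of its class (an assumed form of the quadratic PA penalty)."""
    total = 0.0
    for emb, label in zip(embeddings, labels):
        proto = global_prototypes[label]
        total += sum((e - p) ** 2 for e, p in zip(emb, proto))
    return total / len(embeddings)


# Two clients report one prototype each for class 0, backed by 1 and 3 samples.
global_protos = {0: aggregate_prototypes([(1, [0.0, 0.0]), (3, [4.0, 8.0])])}
loss = prototype_alignment_loss([[3.0, 6.0], [0.0, 0.0]], [0, 0], global_protos)
```

Weighting by sample count keeps a prototype built from a handful of samples from dominating one built from many, which is the usual motivation for a weighted rather than a plain average.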

4. Knowledge Distillation and Local Objective Design

PCD integrates multiple loss components in the federated optimization loop. In each communication round, the total local objective for client $m$ is a weighted combination of the following components:

Cross-Entropy Loss: Standard supervised criterion.
Self-Knowledge-Distillation (SKD) Loss: KL divergence between current and previous local model logits, smoothed by a temperature parameter.
Prototype Alignment (PA) Loss: Quadratic penalty enforcing local embedding proximity to global prototypes.
LEMGP Loss: An "attractive–repulsive" objective composed of an intraclass attraction term and an interclass repulsion term, promoting compactness and discriminability (Zou et al., 10 Mar 2026).

Weighting hyperparameters control the loss mixture, and the initial training rounds use only a subset of these loss terms.
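The self-distillation term can be sketched as a temperature-softened KL divergence. This is a minimal illustration under assumptions: the previous-round model is treated as the teacher, the divergence direction is KL(previous || current), and the default temperature of 2.0 is a placeholder rather than the source's setting. The function names are hypothetical.

```python
import math


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_norm for s in scaled]


def skd_loss(current_logits, previous_logits, temperature=2.0):
    """KL(previous || current) between temperature-softened distributions.

    The previous local model acts as the teacher (an assumption); the
    temperature default is a placeholder value.
    """
    log_teacher = log_softmax(previous_logits, temperature)
    log_student = log_softmax(current_logits, temperature)
    return sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_teacher, log_student)
    )


unchanged = skd_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])  # identical logits
drifted = skd_loss([0.0, 3.0, -1.0], [2.0, 0.5, -1.0])    # model has drifted
```

The loss is zero when the current model reproduces the previous model's predictions and grows as the two diverge, which is how the term damps abrupt local updates. A higher temperature flattens both distributions, so the penalty pays more attention to the relative ordering of non-target classes.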

5. End-to-End Workflow and Implementation Pseudocode

The federated procedure is organized as an alternating sequence of server and client operations, summarized below:

Step	Description	Reference
Initialization	Global model parameters and global prototypes
Server broadcast	Global model parameters and global prototypes sent to selected clients
Client local update	Local loss computation (cross-entropy, SKD, PA, LEMGP) and CHAC clustering	MP-FedKD workflow; eqs. (5, 9, 10–12)
Server aggregation	FedAvg for model parameters; prototype update via weighted sum	eq. 4

The full procedural logic, including conditional CHAC, is specified in the pseudocode block in the source (Zou et al., 10 Mar 2026).
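For the server aggregation step, the standard FedAvg rule averages client parameters weighted by local sample counts. The sketch below shows only that rule on flat parameter lists; the function name `fedavg` and the flat-list representation are illustrative assumptions, and it does not reproduce the source's pseudocode.

```python
def fedavg(client_params, client_sizes):
    """Sample-count-weighted average of client parameter vectors.

    client_params: list of equal-length lists of floats (one per client).
    client_sizes: number of local training samples per client.
    """
    if len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    return [
        sum(n * p[d] for n, p in zip(client_sizes, client_params)) / total
        for d in range(dim)
    ]


# Client B holds three times as much data as client A, so it dominates.
new_global = fedavg([[1.0, 2.0], [3.0, 6.0]], [1, 3])
```

In a real deployment the same weighted average would be applied tensor by tensor to the model state rather than to a single flat list.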

6. Hyperparameters and Design Insights

Key hyperparameters include:

Maximum number of clusters per class.
Distillation temperature for the SKD loss.
LEMGP attractive/repulsive weighting.
Learning rate, together with batch size 32, 5 local epochs, and 50 communication rounds.

By design, PCD enables modeling intra-class diversity through multi-prototypes, stabilizes local updates on non-IID data by self-distillation, and ensures global-local consistency with prototype alignment and attraction–repulsion mechanisms. This approach provides robustness to data heterogeneity while maintaining communication efficiency, as only prototype vectors and model parameters are transmitted (Zou et al., 10 Mar 2026).

7. Performance Implications and Practical Significance

Experimental results on benchmark datasets under various non-IID conditions demonstrate that the MP-FedKD framework incorporating PCD consistently achieves higher accuracy, average accuracy, and lower RMSE/MAE compared to state-of-the-art baselines. Utilizing CHAC-based multi-prototype construction mitigates information loss characteristic of mean-prototype strategies and preserves intra-class manifold structure. The combination of self-knowledge distillation, PA, and LEMGP losses fosters an optimization dynamic that is less susceptible to the adverse effects of data skew and local overfitting, suggesting broad utility for federated deployments in real-world AI-RAN and MEC scenarios (Zou et al., 10 Mar 2026).


References

Zou et al. (2026). A Multi-Prototype-Guided Federated Knowledge Distillation Approach in AI-RAN Enabled Multi-Access Edge Computing System.
