
One-shot Hierarchical Federated Clustering

Updated 17 January 2026
  • The paper presents a one-shot hierarchical FC framework that fuses local fine-partitioning with server-side multi-granular aggregation to overcome federated clustering challenges.
  • It employs Competitive Penalized Learning (FCPL) at clients to discover refined local clusterlets which are aggregated non-iteratively to form coherent global clusters.
  • The framework achieves state-of-the-art performance in purity, ARI, NMI, and ACC while ensuring communication efficiency and robust privacy protection.

A one-shot hierarchical federated clustering (FC) framework addresses the unsupervised extraction of global multi-granular clustering structure from decentralized, privacy-preserving clients, with all communication restricted to a single round. This paradigm is driven by practical demands in large-scale decentralized applications, where global clusters may be split and distributed in fragmented, locally biased, or hierarchically nested ways across heterogeneous clients. By fusing local fine-partitioning with server-side multi-granular aggregation, these frameworks efficiently overcome the computational, statistical, and privacy challenges that undermine conventional hierarchical clustering in federated settings (Cai et al., 10 Jan 2026).

1. Federated Clustering Setting and One-Shot Communication

In the one-shot hierarchical FC setting, $L$ clients hold private, typically non-IID datasets $X^{(l)} \in \mathbb{R}^{n^{(l)} \times d}$, and a central server orchestrates a single round of prototype-level communication. Each client summarizes its local distribution (rather than raw samples or gradients) by producing a set of $k^{(l)}$ clusterlets, each represented as a $d$-dimensional centroid $c_j^{(l)}$. These centroids are transmitted once to the server, minimizing privacy exposure and communication load. No subsequent interaction is required, and all federated learning proceeds through these prototype exchanges.

The communication protocol is characterized by:

  • Each client uploading $O(k^{(l)} d)$ floats, where $k^{(l)} \ll n^{(l)}$.
  • No transmission of raw data or gradients.
  • The possibility of privacy-preserving enhancements such as homomorphic encryption or applying differential privacy to the sent centroids.
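As a rough illustration of the payload, the sketch below compresses a client's $n^{(l)} \times d$ private samples into a $k^{(l)} \times d$ centroid matrix, the only object ever uploaded. A few Lloyd-style k-means iterations stand in for FCPL here, and the function name `client_upload` is illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def client_upload(X, k):
    """Summarize a client's private data X of shape (n, d) as k centroids.

    A stand-in for FCPL: a few plain Lloyd (k-means) iterations.
    Only the (k, d) centroid matrix ever leaves the client.
    """
    n, _ = X.shape
    centroids = X[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(10):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

X_local = rng.normal(size=(500, 8))    # n^(l) = 500 samples, d = 8 features
C_local = client_upload(X_local, k=5)  # one-shot payload: 5 * 8 = 40 floats
```

The raw data (4000 floats here) stays on the client; only the 40-float prototype matrix is communicated.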

2. Fine Partition and Local Clusterlet Discovery (FCPL at Clients)

Local fine-grained distribution exploration is performed at each client via Competitive Penalized Learning (FCPL):

  • Clients initialize $k_0$ candidate clusterlets $\{C_j^{(l)}\}_{j=1}^{k_0}$ with random centroids.
  • Similarity between a data vector $x_i^{(l)}$ and clusterlet $C_j^{(l)}$ is measured via a weighted Euclidean distance that incorporates feature-importance vectors $m_j$:

$$s(x_i^{(l)}, C_j^{(l)}) = \| m_j \odot (x_i^{(l)} - c_j^{(l)}) \|_2$$

  • Clusterlet weights $w_j$ are dynamically updated through a penalized, competitive scheme tracked by proxies $\mathcal{W}_j$ and relative winning possibilities $\gamma_j$.
  • Objects are iteratively assigned to their most competitive clusterlet, with losing clusterlets penalized and redundant ones pruned.
  • The FCPL process converges once object-cluster affiliations stabilize, yielding a reduced set of clusterlets ($k^{(l)} \ll k_0$) and their centroids.

This mechanism enables adaptive, fine-partitioned discovery of potentially fragmented, incomplete, or overlapping local clusters while preventing overfitting or redundancy in local representations.
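The competitive-penalization loop can be sketched with a rival-penalized update rule. This is a minimal illustration only: it omits FCPL's feature-importance vectors $m_j$, weight proxies $\mathcal{W}_j$, and winning possibilities $\gamma_j$, and the constants (learning rate, rival penalty, pruning threshold) are illustrative, not the paper's:

```python
import numpy as np

def fcpl_sketch(X, k0=10, eta=0.05, iters=20, prune_frac=0.02, seed=0):
    """Rival-penalized competitive-learning sketch of client-side FCPL.

    The winning centroid moves toward each sample, the runner-up
    ("rival") is pushed away, and rarely winning clusterlets are
    pruned, so the surviving count is typically much smaller than k0.
    (Feature weights m_j and proxies W_j are omitted here.)
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    C = X[rng.choice(n, size=k0, replace=False)].copy()
    for _ in range(iters):
        wins = np.zeros(len(C))
        for x in X:
            order = np.argsort(np.linalg.norm(C - x, axis=1))
            w, r = order[0], order[1]        # winner and rival
            C[w] += eta * (x - C[w])         # attract the winner
            C[r] -= 0.1 * eta * (x - C[r])   # penalize the rival
            wins[w] += 1
        keep = wins / n >= prune_frac        # prune redundant clusterlets
        if keep.sum() >= 2:
            C = C[keep]
    return C
```

Starting from a deliberately over-provisioned $k_0$ lets the pruning step, rather than a user guess, determine the effective local granularity.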

3. Server-Side Multi-Granular Learning and Hierarchical Aggregation (MCPL)

The server receives all client centroids, stacks them into a global prototype matrix $C \in \mathbb{R}^{n^{(s)} \times d}$, and performs recursive multi-granular competitive penalized learning (MCPL):

  • At each hierarchical level $\delta$, FCPL is applied to $C$ to partition it into $k_\delta$ clusters, generating new centroids and cluster affiliations $Q_\delta$.
  • Centroids are reinitialized at each level to encourage structural diversity and reduce entrapment in local minima.
  • The hierarchy stops growing when $k_{\delta+1} = k_\delta$.
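The recursion above can be sketched as follows, again with plain k-means standing in for an FCPL pass and an explicit granularity schedule standing in for the paper's adaptive choice of $k_\delta$:

```python
import numpy as np

def kmeans(X, k, iters=15, seed=0):
    """Plain k-means, used here as a stand-in for one FCPL pass."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C, labels

def mcpl_sketch(C_global, k_schedule):
    """Recursive multi-granular aggregation over stacked client centroids.

    Clusters the prototype matrix at successively coarser
    granularities and stops once the effective cluster count stops
    changing, returning one affiliation vector per hierarchy level.
    """
    levels, prev_k = [], None
    for k in k_schedule:
        _, labels = kmeans(C_global, k)
        k_eff = len(np.unique(labels))
        levels.append(labels)
        if k_eff == prev_k:  # hierarchy stops growing
            break
        prev_k = k_eff
    return levels
```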

To align inconsistent local granularities, hierarchical encoding is performed:

  • For each centroid $i$, a $\Delta$-dimensional enhanced representation is constructed, with level-$\delta$ entry $x_{i\delta}^{(s)}$ given by:

$$x_{i\delta}^{(s)} = \sum_{j=1}^{k_\delta} j \cdot q_{ij}^{(\delta)}$$

  • The holistic feature matrix $X^{(s)} \in \mathbb{R}^{n^{(s)} \times \Delta}$ then encodes multi-granular membership information for each centroid across the hierarchy.
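Because the affiliations $q_{ij}^{(\delta)}$ are one-hot, the sum $\sum_j j \cdot q_{ij}^{(\delta)}$ collapses to the (1-based) index of the cluster containing centroid $i$ at level $\delta$. A minimal sketch of building $X^{(s)}$ from per-level label vectors:

```python
import numpy as np

def encode_hierarchy(level_labels):
    """Build the holistic feature matrix X^(s) from per-level affiliations.

    With one-hot memberships q_ij, the sum over j of j * q_ij reduces
    to the 1-based cluster index of centroid i at level delta, so each
    row stacks one cluster index per granularity level.
    """
    # level_labels: list of Δ arrays, each of length n^(s), 0-based labels
    return np.stack([labels + 1 for labels in level_labels], axis=1)

labels_lvl1 = np.array([0, 0, 1, 2])  # fine level: 3 clusters
labels_lvl2 = np.array([0, 0, 0, 1])  # coarse level: 2 clusters
X_s = encode_hierarchy([labels_lvl1, labels_lvl2])
# row i of X_s = (cluster index at level 1, cluster index at level 2)
```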

A final feature-weighted clustering is performed on $X^{(s)}$ via an alternating maximization of cluster assignments $Q^{(s)}$ and a feature-cluster weight matrix $U \in \mathbb{R}^{k^* \times \Delta}$. Feature weights are derived from inter-cluster separability (via Hellinger distance) and intra-cluster cohesion.
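For reference, the Hellinger distance used for inter-cluster separability has a simple closed form on discrete distributions; the mapping from per-feature distributions to the weight matrix $U$ follows the paper and is not reproduced here:

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions p and q.

    Both inputs are probability vectors (non-negative, summing to 1);
    the result lies in [0, 1], with 0 for identical distributions and
    1 for distributions with disjoint support.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(0.5) * np.linalg.norm(np.sqrt(p) - np.sqrt(q))
```

Features whose per-cluster value distributions are far apart under this distance separate clusters well and receive larger weights.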

4. Algorithmic Properties and Theoretical Guarantees

The framework exhibits rigorously established properties:

  • Time Complexity: $O(M \cdot d \cdot k_0 \cdot N)$, where $N$ is the total sample size across clients and $M$ the maximum number of FCPL iterations.
  • Space Complexity: $O((N + n^{(s)})(d + k_0))$.
  • Convergence: FCPL at each client and MCPL at the server both converge when object-affiliation matrices stabilize.
  • Monotonic Improvement: Increasing hierarchical granularity levels in MCPL consistently improves key clustering metrics.
  • Privacy: Prototype-only, one-shot communication minimizes information leakage compared to iterative protocols.

5. Empirical Evaluation and Comparative Performance

Extensive benchmarking on ten public tabular datasets, under simulated fragmentation and random distribution of clusterlets across clients, demonstrates:

  • State-of-the-art clustering scores in Purity, Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), and Clustering Accuracy (ACC).
  • Stable performance and scalability as the number of clients increases from 100 to 1000.
  • Statistically significant improvements (Wilcoxon signed-rank test, 95% confidence) over prior approaches including kFed, FFCM, OSFSC, FedSC, NN-FC, and AFCL.
  • Architectural ablation shows the combination of FCPL (fine partition mechanism) and MCPL (multi-granular learning) is necessary for top performance; disabling either component degrades results.

A summary of performance results is provided below.

| Method   | Avg. Purity Rank | Avg. ARI Rank | Avg. NMI Rank | Avg. ACC Rank |
|----------|------------------|---------------|---------------|---------------|
| Fed-HIRE | 1.3              | 1.3           | 1.3           | 1.8           |

Data from (Cai et al., 10 Jan 2026), Table 5.

6. Practical Applications, Limitations, and Extensions

The one-shot hierarchical FC framework is especially suited for:

  • Cross-platform personalized recommendation, where user interest clusters are fragmented and distributed.
  • Cross-device user profiling with privacy constraints and incomplete data sharing.
  • Distributed content categorization in decentralized settings (e.g., news, IoT sensors).
  • Federated market segmentation across business silos.

Current limitations include:

  • Design tailored primarily for tabular data. Extensions to vision or multi-modal domains require new similarity measures and possibly communication protocols.
  • Sensitivity to the choice of the initial clusterlet count $k_0$ and learning rate $\eta$ at extreme values, though the method is robust within practical ranges.

7. Significance and Future Directions

The one-shot hierarchical FC paradigm presents a compelling solution to the federated unsupervised learning problem, offering bandwidth efficiency, scalability, and privacy by design, while capturing multi-scale cluster structures. By disentangling local fragment discovery (client-side) from global multi-granular alignment (server-side), and leveraging prototype-based, non-iterative communication, these frameworks reconcile heterogeneity and lack of supervision in modern decentralized analytics (Cai et al., 10 Jan 2026). Future progress will likely extend the technique to high-dimensional, non-tabular, and streaming data, and further enhance privacy guarantees.

