Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
133 tokens/sec
GPT-4o
7 tokens/sec
Gemini 2.5 Pro Pro
46 tokens/sec
o3 Pro
4 tokens/sec
GPT-4.1 Pro
38 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

A Unified Framework for Center-based Clustering of Distributed Data (2402.01302v2)

Published 2 Feb 2024 in cs.LG, cs.DC, and cs.MA

Abstract: We develop a family of distributed center-based clustering algorithms that work over networks of users. In the proposed scenario, users contain a local dataset and communicate only with their immediate neighbours, with the aim of finding a clustering of the full, joint data. The proposed family, termed Distributed Gradient Clustering (DGC-$\mathcal{F}\rho$), is parametrized by $\rho \geq 1$, controling the proximity of users' center estimates, with $\mathcal{F}$ determining the clustering loss. Our framework allows for a broad class of smooth convex loss functions, including popular clustering losses like $K$-means and Huber loss. Specialized to popular clustering losses like $K$-means and Huber loss, DGC-$\mathcal{F}\rho$ gives rise to novel distributed clustering algorithms DGC-KM$\rho$ and DGC-HL$\rho$, while novel clustering losses based on Logistic and Fair functions lead to DGC-LL$\rho$ and DGC-FL$\rho$. We provide a unified analysis and establish several strong results, under mild assumptions. First, we show that the sequence of centers generated by the methods converges to a well-defined notion of fixed point, under any center initialization and value of $\rho$. Second, we prove that, as $\rho$ increases, the family of fixed points produced by DGC-$\mathcal{F}\rho$ converges to a notion of consensus fixed points. We show that consensus fixed points of DGC-$\mathcal{F}{\rho}$ are equivalent to fixed points of gradient clustering over the full data, guaranteeing a clustering of the full data is produced. For the special case of Bregman losses, we show that our fixed points converge to the set of Lloyd points. Extensive numerical experiments on synthetic and real data confirm our theoretical findings, show strong performance of our methods and demonstrate the usefulness and wide range of potential applications of our general framework, such as outlier detection.

Summary

We haven't generated a summary for this paper yet.

X Twitter Logo Streamline Icon: https://streamlinehq.com