Machine Unlearning of Federated Clusters (2210.16424v2)
Abstract: Federated clustering (FC) is an unsupervised learning problem that arises in a number of practical applications, including personalized recommender and healthcare systems. With the adoption of recent laws ensuring the "right to be forgotten", the problem of machine unlearning for FC methods has become of significant importance. We introduce, for the first time, the problem of machine unlearning for FC, and propose an efficient unlearning mechanism for a customized secure FC framework. Our FC framework utilizes special initialization procedures that we show are well-suited for unlearning. To protect client data privacy, we develop the secure compressed multiset aggregation (SCMA) framework that addresses sparse secure federated learning (FL) problems encountered during clustering as well as more general problems. To simultaneously facilitate low communication complexity and secret sharing protocols, we integrate Reed-Solomon encoding with special evaluation points into our SCMA pipeline, and prove that the client communication cost is logarithmic in the vector dimension. Additionally, to demonstrate the benefits of our unlearning mechanism over complete retraining, we provide a theoretical analysis for the unlearning performance of our approach. Simulation results show that the new FC framework exhibits superior clustering performance compared to previously reported FC baselines when the cluster sizes are highly imbalanced. Compared to completely retraining K-means++ locally and globally for each removal request, our unlearning procedure offers an average speed-up of roughly 84x across seven datasets. Our implementation for the proposed method is available at https://github.com/thupchnsky/mufc.