Towards Sybil Resilience in Decentralized Learning (2306.15044v1)
Abstract: Federated learning is a privacy-enforcing machine learning technology but suffers from limited scalability. This limitation stems mostly from the network bandwidth and memory capacity of the central parameter server, and from the complexity of the model aggregation function. Decentralized learning has recently emerged as a promising alternative to federated learning. This novel technology eliminates the need for a central parameter server by distributing the model aggregation across all participating nodes. Numerous studies have been conducted on improving the resilience of federated learning against poisoning and Sybil attacks, whereas the resilience of decentralized learning remains largely unstudied. This research gap motivates the present study, whose objective is to improve the Sybil poisoning resilience of decentralized learning. We present SybilWall, an innovative algorithm focused on increasing the resilience of decentralized learning against targeted Sybil poisoning attacks. By combining a Sybil-resistant aggregation function based on similarity between Sybils with a novel probabilistic gossiping mechanism, we establish a new benchmark for scalable, Sybil-resilient decentralized learning. A comprehensive empirical evaluation demonstrates that SybilWall outperforms existing state-of-the-art solutions designed for federated learning scenarios and is the only algorithm to obtain consistent accuracy over a range of adversarial attack scenarios. We also find that SybilWall diminishes the utility of creating many Sybils, as our evaluations show a higher attack success rate for adversaries employing fewer Sybils. Finally, we suggest a number of possible improvements to SybilWall and highlight promising future research directions.
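To make the similarity-based component of the abstract concrete, the sketch below illustrates the general idea of a Sybil-resistant aggregation that down-weights mutually similar updates (in the spirit of FoolsGold-style re-weighting), since Sybils controlled by one adversary tend to submit near-identical gradients. This is a minimal illustration under that assumption, not SybilWall's actual aggregation rule; the function name `similarity_weighted_aggregate` and the use of NumPy are hypothetical choices for exposition.

```python
import numpy as np

def similarity_weighted_aggregate(updates):
    """Down-weight mutually similar updates before averaging.

    `updates` is a list of flattened gradient (or model-delta) vectors, one
    per neighbouring node. Pairwise cosine similarity is used as a proxy for
    Sybil behaviour: the more a node's update resembles another node's, the
    smaller its aggregation weight. (Illustrative sketch only.)
    """
    n = len(updates)
    U = np.stack(updates)                                # shape (n, d)
    norms = np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    cos = (U / norms) @ (U / norms).T                    # pairwise cosine similarity
    np.fill_diagonal(cos, 0.0)                           # ignore self-similarity
    suspicion = cos.max(axis=1).clip(0.0, 1.0)           # max similarity to any peer
    weights = 1.0 - suspicion                            # similar nodes get low weight
    if weights.sum() == 0.0:                             # degenerate case: all identical
        weights = np.ones(n)
    weights = weights / weights.sum()
    return (weights[:, None] * U).sum(axis=0)            # weighted average update
```

A node would apply such a rule to the updates received from its neighbours each round before taking a local gradient step; SybilWall additionally couples the aggregation with a probabilistic gossiping mechanism, which is not captured in this sketch.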