Consensus learning: A novel decentralised ensemble learning paradigm (2402.16157v1)
Abstract: The widespread adoption of large-scale machine learning models in recent years highlights the need for distributed computing for efficiency and scalability. This work introduces a novel distributed machine learning paradigm -- *consensus learning* -- which combines classical ensemble methods with consensus protocols deployed in peer-to-peer systems. These algorithms consist of two phases: first, participants develop their models and submit predictions for any new data inputs; second, the individual predictions are used as inputs for a communication phase, which is governed by a consensus protocol. Consensus learning preserves user data privacy while inheriting the underlying consensus mechanism's resilience against Byzantine attacks. We provide a detailed theoretical analysis for a particular consensus protocol and compare the performance of the consensus learning ensemble with centralised ensemble learning algorithms. The discussion is supplemented by various numerical simulations, which illustrate the robustness of the algorithms against Byzantine participants.
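The two-phase structure described in the abstract is straightforward to prototype. Below is a minimal Python sketch of a single consensus-learning query, assuming a simplified repeated-poll majority protocol (in the spirit of Snowflake/Avalanche-style sampling) for the communication phase. The function name `consensus_learning_round` and the parameters `k` (peer sample size) and `alpha` (quorum threshold) are illustrative assumptions, not the paper's exact algorithm, and the adversarial nodes here merely start from wrong labels rather than deviating arbitrarily from the protocol.

```python
import random
from collections import Counter

def consensus_learning_round(predictions, k=5, alpha=0.6, max_steps=100, seed=0):
    """Phase 2 of a consensus-learning query (illustrative sketch): every node
    starts from its own model's prediction (the phase 1 output) and repeatedly
    polls k random peers, adopting a label whenever it reaches the quorum
    threshold alpha * k."""
    rng = random.Random(seed)
    opinions = list(predictions)
    n = len(opinions)
    for _ in range(max_steps):
        new_opinions = opinions[:]
        for i in range(n):
            # Poll k random peers (excluding the node itself).
            peers = rng.sample([j for j in range(n) if j != i], k)
            label, count = Counter(opinions[j] for j in peers).most_common(1)[0]
            if count >= alpha * k:  # quorum reached: adopt the sampled majority
                new_opinions[i] = label
        opinions = new_opinions
        if len(set(opinions)) == 1:  # global agreement reached
            break
    return Counter(opinions).most_common(1)[0][0]

# Phase 1, simulated: ten participants each classify a new input;
# seven honest nodes predict 1, three adversarial nodes report 0.
print(consensus_learning_round([1] * 7 + [0] * 3))  # 1 with high probability
```

With an honest supermajority, the polling dynamics drive the network toward the correct label almost surely, which mirrors the ensemble-accuracy and Byzantine-robustness trade-off the paper analyses theoretically.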