Consensus learning: A novel decentralised ensemble learning paradigm (2402.16157v1)

Published 25 Feb 2024 in cs.LG and cs.DC

Abstract: The widespread adoption of large-scale machine learning models in recent years highlights the need for distributed computing for efficiency and scalability. This work introduces a novel distributed machine learning paradigm -- consensus learning -- which combines classical ensemble methods with consensus protocols deployed in peer-to-peer systems. These algorithms consist of two phases: first, participants develop their models and submit predictions for any new data inputs; second, the individual predictions are used as inputs for a communication phase, which is governed by a consensus protocol. Consensus learning ensures user data privacy, while also inheriting the safety measures against Byzantine attacks from the underlying consensus mechanism. We provide a detailed theoretical analysis for a particular consensus protocol and compare the performance of the consensus learning ensemble with centralised ensemble learning algorithms. The discussion is supplemented by various numerical simulations, which describe the robustness of the algorithms against Byzantine participants.

Summary

  • The paper introduces consensus learning by integrating decentralized ensemble techniques with consensus protocols like Slush to counter Byzantine faults.
  • The paper outlines a two-phase algorithm separating independent model training from consensus-driven prediction aggregation to ensure data privacy and scalability.
  • Theoretical analyses and simulations demonstrate that consensus learning maintains robust accuracy in the presence of a limited number of Byzantine adversaries, compared with centralised ensemble baselines.

Introducing Consensus Learning: A Novel Paradigm for Decentralized Ensemble Learning

Overview

Recent developments in machine learning have highlighted the benefits of, and need for, distributed computing paradigms, particularly for processing vast data volumes across decentralized architectures. This need is driven by foundation models whose scale demands substantial computational resources, underscoring the importance of scalability and efficiency in model training. Among distributed methodologies, Federated Learning (FL) has emerged as a principal approach, allowing collaborative model training while preserving data privacy.

However, the susceptibility of FL and other distributed algorithms to Byzantine faults—malicious or faulty behavior by participants—poses significant challenges. Despite advancements in aggregating local updates in a manner robust against such adversaries, ensuring privacy and resisting Byzantine attacks in a decentralized environment without a central server remains an ongoing concern.

Against this backdrop, the paradigm of consensus learning emerges. This approach marries classical ensemble methods with consensus protocols from peer-to-peer systems, offering a promising avenue for enhancing user data privacy, algorithm scalability, and Byzantine robustness. By focusing on binary classification tasks and leveraging the Slush consensus protocol, the paper presents theoretical analyses and numerical simulations highlighting the robustness and efficiency of consensus learning.

Consensus Learning Paradigm

Consensus learning distinguishes itself by its two-phase algorithm: the individual learning phase and the communication phase. The former allows participants to independently develop models on their own data without sharing sensitive information. The subsequent phase involves participants sharing their model predictions on new data inputs, followed by a consensus-driven process to reach a collective decision. Crucially, this methodology inherits the privacy protections and Byzantine resilience of the underlying consensus protocol, addressing significant issues in current distributed ML approaches.
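The sketch below is a minimal illustration of this two-phase flow, assuming trivial threshold classifiers on synthetic data; the local model, data, and names are hypothetical stand-ins rather than the paper's exact constructions. Participants first fit models privately, then reveal only their binary predictions for a new input, which would seed the subsequent consensus round.

```python
import random

def make_local_data(n, noise=0.2, seed=None):
    """Synthetic binary data: the true label is 1 iff x > 0.5, with label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

class ThresholdClassifier:
    """Trivial local model: a single decision threshold fitted to private data."""
    def fit(self, data):
        best_t, best_err = 0.5, float("inf")
        for t in (i / 20 for i in range(21)):          # coarse grid search
            err = sum((1 if x > t else 0) != y for x, y in data)
            if err < best_err:
                best_t, best_err = t, err
        self.t = best_t
        return self

    def predict(self, x):
        return 1 if x > self.t else 0

# Phase 1 (individual learning): each participant trains privately; raw data
# never leaves its owner.
participants = [ThresholdClassifier().fit(make_local_data(200, seed=i)) for i in range(11)]

# Phase 2 (communication): for a new input, only the binary predictions are
# shared; they seed a consensus protocol (a Slush-style round is sketched
# further below) that drives the network to a single collective label.
x_new = 0.62
initial_predictions = [p.predict(x_new) for p in participants]
print("initial predictions:", initial_predictions)
```

What replaces a central vote in the second phase is the consensus-driven aggregation itself; a sketch of one such protocol is given in the next section.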

Theoretical Insights and Practical Implications

The theoretical foundation of consensus learning is laid out through a conceptual model tailored to binary classification tasks. The deployment of the Slush protocol provides an instructive case study, yielding lower bounds on the ensemble's accuracy and identifying scenarios in which the algorithm outperforms traditional ensemble methods. The analyses also underscore the paradigm's potential for diverse applications, from regression problems to unsupervised learning tasks, indicating its adaptability beyond binary classification.
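For reference, the centralised baseline such comparisons are usually stated against is a plain majority vote over n independent binary classifiers, each correct with probability p. Its accuracy is the standard Condorcet-style expression below; this is the textbook formula, not necessarily the exact bound derived in the paper.

```latex
% Accuracy of a simple majority vote over n independent binary classifiers,
% each correct with probability p (n odd). By the Condorcet Jury Theorem,
% this tends to 1 as n grows whenever p > 1/2.
P_{\mathrm{maj}}(n, p) \;=\; \sum_{k=\lceil n/2 \rceil}^{n} \binom{n}{k}\, p^{k} (1 - p)^{n-k}
```

The paper's bounds relate the accuracy achieved after the Slush communication phase to centralised aggregation of this kind.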

Furthermore, the consideration of Byzantine behavior within the consensus learning framework illuminates the resilience of this approach. The protocol demonstrates robustness against a limited number of Byzantine participants, highlighting an essential advantage over centralized methods, particularly in environments where such risks cannot be entirely mitigated.
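To make the Byzantine setting concrete, the following is a minimal, self-contained simulation of a Slush-style communication phase in which a minority of Byzantine participants always report the wrong label. The sample size, adoption threshold, and round count are illustrative placeholders, not the values analysed in the paper, and the synchronous update is a simplification of the protocol.

```python
import random

N_HONEST = 40          # honest participants
N_BYZ = 8              # Byzantine participants (always report label 0)
N_TOTAL = N_HONEST + N_BYZ
K = 5                  # peers sampled per query
ALPHA = 4              # adopt a label if at least ALPHA of the K replies carry it
ROUNDS = 15
rng = random.Random(0)

# Initial predictions from the individual learning phase: a noisy honest
# majority prefers the correct label 1, while Byzantine nodes push label 0.
honest = [1 if rng.random() < 0.7 else 0 for _ in range(N_HONEST)]

def reported_label(idx):
    """Label reported by participant idx when queried (Byzantine nodes lie)."""
    return honest[idx] if idx < N_HONEST else 0

for _ in range(ROUNDS):
    updated = honest[:]
    for i in range(N_HONEST):
        peers = rng.sample([j for j in range(N_TOTAL) if j != i], K)
        replies = [reported_label(j) for j in peers]
        for label in (0, 1):
            if replies.count(label) >= ALPHA:
                updated[i] = label       # switch to (or keep) the well-supported label
    honest = updated

print("honest nodes voting for the correct label:", sum(honest), "/", N_HONEST)
```

With these (arbitrary) parameters the honest nodes tend to converge on the correct label despite the Byzantine minority; the paper's analysis characterises how large an adversarial coalition the protocol tolerates before such convergence fails.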

Future Directions

The exploration of consensus learning opens several avenues for future research and development. Expanding the paradigm to a wider range of ML tasks, including regression and unsupervised learning, is an intriguing prospect. Moreover, the introduction of more sophisticated local aggregation rules and consensus protocols could further enhance the robustness and efficiency of the learning process.

A promising direction involves integrating consensus learning with blockchain technologies, leveraging immutable records for participant performance and deploying incentive mechanisms to promote honesty. This integration not only promises to heighten the security and efficiency of the consensus learning approach but also aligns with the broader trend towards decentralized computing solutions across various domains.

Conclusion

Consensus learning offers a groundbreaking approach to distributed machine learning, effectively addressing key challenges associated with data privacy, Byzantine faults, and scalability. By combining the strengths of ensemble learning with the robustness of consensus protocols, this paradigm presents a viable path forward for collaborative model training in a decentralized context. As research and experimentation in this field progress, consensus learning is poised to make significant contributions to the evolution of distributed machine learning methodologies.