Federated Bayesian Deep Learning: The Application of Statistical Aggregation Methods to Bayesian Models (2403.15263v2)
Abstract: Federated learning (FL) is an approach to training machine learning models that takes advantage of multiple distributed datasets while maintaining data privacy and reducing the communication costs associated with sharing local datasets. Aggregation strategies have been developed to pool or fuse the weights and biases of distributed deterministic models; however, modern deterministic deep learning (DL) models are often poorly calibrated and cannot communicate a measure of epistemic uncertainty in their predictions, which is desirable for remote sensing platforms and safety-critical applications. Conversely, Bayesian DL models are often well calibrated and capable of quantifying and communicating epistemic uncertainty alongside competitive predictive accuracy. Unfortunately, because the weights and biases in Bayesian DL models are defined by probability distributions, naively applying the aggregation methods used in FL schemes for deterministic models is either impossible or yields sub-optimal performance. In this work, we use independent and identically distributed (IID) and non-IID partitions of the CIFAR-10 dataset and a fully variational ResNet-20 architecture to analyze six different aggregation strategies for Bayesian DL models. Additionally, we analyze traditional federated averaging applied to an approximate Bayesian Monte Carlo dropout model as a lightweight alternative to more complex variational inference methods in FL. We show that the aggregation strategy is a key hyperparameter in the design of a Bayesian FL system, with downstream effects on accuracy, calibration, uncertainty quantification, training stability, and client compute requirements.
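The core obstacle the abstract describes is that a variational Bayesian layer stores a distribution over each weight (e.g., a per-parameter mean and variance) rather than a point estimate, so the server must decide how to combine client posteriors rather than simply averaging tensors. The abstract does not enumerate the six strategies studied, so the sketch below is illustrative only, assuming each client's posterior is a diagonal Gaussian: it contrasts naive federated averaging of Gaussian parameters with inverse-variance (precision-weighted) fusion, two generic ways to pool such posteriors. The function names and the diagonal-Gaussian setup are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def fedavg_gaussian(means, variances, client_weights=None):
    """Naive FedAvg applied to diagonal-Gaussian posteriors: average the
    per-parameter means and variances across clients. Simple, but it treats
    the variance like any other weight, which can miscalibrate the result."""
    n = len(means)
    w = np.full(n, 1.0 / n) if client_weights is None else np.asarray(client_weights, dtype=float)
    mu = sum(wi * m for wi, m in zip(w, means))
    var = sum(wi * v for wi, v in zip(w, variances))
    return mu, var

def precision_weighted_fusion(means, variances):
    """Inverse-variance (precision-weighted) fusion: weight each client's mean
    by its precision 1/sigma^2, so clients that are more certain about a given
    parameter contribute more; the fused precision is the sum of precisions."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    mu = sum(p * m for p, m in zip(precisions, means)) / total
    return mu, 1.0 / total

if __name__ == "__main__":
    # Two hypothetical clients, each holding a posterior over the same 3 weights.
    means = [np.array([0.1, -0.5, 0.9]), np.array([0.3, -0.4, 0.7])]
    variances = [np.array([0.01, 0.04, 0.09]), np.array([0.04, 0.01, 0.01])]
    print(fedavg_gaussian(means, variances))
    print(precision_weighted_fusion(means, variances))
```

The contrast motivates the abstract's claim that aggregation is a real hyperparameter: the two rules above produce the same fused mean only when all clients report identical variances, and they disagree increasingly as client uncertainty becomes heterogeneous, which is exactly the non-IID regime the paper studies.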