
Algorithms for Collaborative Machine Learning under Statistical Heterogeneity (2408.00050v1)

Published 31 Jul 2024 in stat.ML, cs.DC, and cs.LG

Abstract: Learning from distributed data without accessing them is undoubtedly a challenging and non-trivial task. Nevertheless, the necessity for distributed training of a statistical model has been increasing, due to the privacy concerns of local data owners and the cost of centralizing the massively distributed data. Federated learning (FL) is currently the de facto standard for training a machine learning model across heterogeneous data owners while keeping the raw data within local silos. Several challenges must still be addressed for FL to be more practical in reality. Among these challenges, the statistical heterogeneity problem is the most significant and requires immediate attention. From the main objective of FL, three major factors can be considered as starting points: parameters, mixing coefficients, and local data distributions. In alignment with these components, this dissertation is organized into three parts. In Chapter II, a novel personalization method, SuPerFed, inspired by mode connectivity, is introduced. In Chapter III, an adaptive decision-making algorithm, AAggFF, is introduced for inducing uniform performance distributions across participating clients, realized via an online convex optimization framework. Finally, in Chapter IV, a collaborative synthetic data generation method, FedEvg, is introduced, leveraging the flexibility and compositionality of an energy-based modeling approach. Taken together, these approaches provide practical solutions to mitigate the statistical heterogeneity problem in data-decentralized settings, paving the way for distributed systems and applications using collaborative machine learning methods.


Summary

  • The paper introduces SuPerFed, AAggFF, and FedEvg to tackle statistical heterogeneity by enhancing model personalization, adaptive aggregation, and synthetic data generation.
  • It employs orthogonality regularization, online convex optimization, and energy-based models for improved calibration, client fairness, and efficient communication.
  • Empirical results demonstrate enhanced accuracy, sublinear regret bounds, and superior synthetic data quality in non-IID federated learning environments.

Overview of "Algorithms for Collaborative Machine Learning under Statistical Heterogeneity"

Introduction

The paper "Algorithms for Collaborative Machine Learning under Statistical Heterogeneity" by Seok-Ju Hahn focuses on perspectives for improving performance in federated learning (FL) under the constraint of data heterogeneity. The primary objective of FL is to enable collaborative training of a machine learning model across multiple clients without sharing raw data, thus preserving privacy. Despite its advantages, FL encounters significant challenges due to the inherent statistical heterogeneity across clients. This paper investigates three perspectives—model parameters, mixing coefficients, and local data distributions—for addressing statistical heterogeneity.

Parameter Perspective: SuPerFed

Chapter II introduces SuPerFed, which aims to mitigate statistical heterogeneity through model mixture-based personalization. By leveraging mode connectivity, SuPerFed establishes an explicit synergy between global and local models to enhance personalization performance while maintaining good model calibration and robustness to label noise.
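To make the mechanism concrete, below is a minimal PyTorch-style sketch (not the author's reference implementation) of the two ingredients described here and in the contributions that follow: each layer keeps paired federated (global) and personalized (local) weights, the forward pass evaluates a random convex combination of the two (a point on the connected low-loss subspace), and an orthogonality penalty discourages the two weight sets from encoding redundant knowledge. The class name MixedLinear and the penalty weight 0.1 are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedLinear(nn.Module):
    """Linear layer holding both a federated (global) and a personalized (local)
    weight; the forward pass uses a convex combination of the two, so sampling
    lambda traverses the line segment connecting the two models in weight space."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_global = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.w_local = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x, lam):
        w = (1.0 - lam) * self.w_global + lam * self.w_local
        return F.linear(x, w, self.bias)

def orthogonality_penalty(layer):
    # Penalize cosine similarity between global and local weights so the two
    # endpoints of the connected subspace capture complementary knowledge.
    g, l = layer.w_global.flatten(), layer.w_local.flatten()
    return F.cosine_similarity(g, l, dim=0) ** 2

# One schematic local step: sample lambda, compute the task loss at the mixed
# point, and add the orthogonality regularizer.
layer = MixedLinear(16, 4)
x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
lam = torch.rand(()).item()
loss = F.cross_entropy(layer(x, lam), y) + 0.1 * orthogonality_penalty(layer)
loss.backward()
```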

Key Contributions:

  • SuPerFed employs orthogonality regularization to diversify the knowledge captured by local and federated models.
  • The method yields notable improvements in personalization performance across various datasets and non-IID settings.
  • One significant aspect of SuPerFed is its robustness to label noise and enhanced calibration performance.

Results:

SuPerFed demonstrates superior accuracy in various statistical heterogeneity scenarios and ensures consistent performance regardless of the degree of heterogeneity.

Mixing Coefficient Perspective: AAggFF

Chapter III proposes AAggFF, an adaptive aggregation framework for FL designed to achieve client-level fairness by updating mixing coefficients dynamically. This framework unifies existing fair FL strategies under an online convex optimization (OCO) framework, addressing the problem of sample deficiency in the central server's decision-making process.
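As a rough illustration of the server-side decision problem (not the exact AAggFF update rule), the sketch below maintains mixing coefficients on the probability simplex and re-weights them each round from observed client losses using an exponentiated-gradient style step, one standard OCO update; AAggFF-S and AAggFF-D (described below) instead rely on Online Newton Step and FTRL-type updates. The step size and loss centering are illustrative assumptions.

```python
import numpy as np

def update_mixing_coefficients(coeffs, client_losses, step_size=0.5):
    """Schematic mixing-coefficient update on the probability simplex:
    clients with larger observed losses receive larger weights in the next
    aggregation round (an exponentiated-gradient / mirror-descent step,
    standing in for the FTRL-type updates used by AAggFF)."""
    losses = np.asarray(client_losses, dtype=float)
    logits = np.log(coeffs) + step_size * (losses - losses.mean())
    new = np.exp(logits - logits.max())   # max-shift for numerical stability
    return new / new.sum()

# Server-side usage over rounds: start uniform, then re-weight from feedback.
coeffs = np.full(4, 0.25)
coeffs = update_mixing_coefficients(coeffs, client_losses=[0.9, 0.4, 1.3, 0.6])
# global_update = sum(c_i * delta_i for c_i, delta_i in zip(coeffs, client_deltas))
```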

Key Contributions:

  • AAggFF-S and AAggFF-D are tailored for cross-silo and cross-device FL settings, respectively.
  • AAggFF-S uses the Online Newton Step algorithm, achieving optimal regret bounds with logarithmic dependence on the number of rounds.
  • AAggFF-D employs a linear-runtime follow-the-regularized-leader (FTRL) algorithm, which is computationally efficient for large-scale FL settings.
  • The theoretical analysis guarantees sublinear regret bounds.

Results:

AAggFF consistently improves worst-case client performance and reduces performance disparity across clients, thus enhancing client-level fairness.

Local Data Distribution Perspective: FedEvg

Chapter IV presents FedEvg, a method for federated synthetic data generation that leverages energy-based models (EBMs). FedEvg synthesizes data by aggregating signals from clients and refining the synthetic data iteratively, improving training efficiency and reducing communication overhead.
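A toy sketch of this loop is given below, under simplifying assumptions (a kernel-based stand-in for each client's energy signal rather than FedEvg's actual energy-based model): clients only return gradient-like signals evaluated on the shared synthetic samples, and the server averages them and applies a stochastic gradient Langevin dynamics (SGLD) style update with injected noise. Function names and hyperparameters are hypothetical.

```python
import numpy as np

def client_signal(synthetic_x, local_data, bandwidth=1.0):
    """Hypothetical client-side signal: the gradient of a simple kernel-based
    energy that pulls synthetic samples toward the client's local data
    (a stand-in for the client-side energy computation in FedEvg)."""
    diffs = local_data[None, :, :] - synthetic_x[:, None, :]          # (S, N, D)
    weights = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * bandwidth))  # (S, N)
    weights /= weights.sum(axis=1, keepdims=True) + 1e-12
    return np.einsum('sn,snd->sd', weights, diffs)                    # (S, D)

def server_sgld_round(synthetic_x, client_datasets, step=0.1, noise=0.01):
    """Server-side refinement: average the clients' signals and take one
    SGLD-style step on the synthetic data (no model parameters exchanged)."""
    grad = np.mean([client_signal(synthetic_x, d) for d in client_datasets], axis=0)
    return synthetic_x + step * grad + noise * np.random.randn(*synthetic_x.shape)

# Toy usage: two clients with different means; synthetic points drift toward the mixture.
rng = np.random.default_rng(0)
clients = [rng.normal(loc=m, size=(64, 2)) for m in ([-2.0, 0.0], [2.0, 0.0])]
synth = rng.normal(size=(32, 2))
for _ in range(50):
    synth = server_sgld_round(synth, clients)
```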

Key Contributions:

  • FedEvg initializes synthetic data on the server and refines it with client signals based on EBMs.
  • The method avoids the need for explicit model parameter exchange, lowering communication costs.
  • By utilizing server-side stochastic gradient Langevin dynamics (SGLD) steps, FedEvg ensures that the synthetic data approximate the underlying local data distributions.

Results:

FedEvg produces high-quality synthetic data that serve as a proxy for local data distributions, evidenced by improved Fréchet Inception Distance (FID) scores and by the discriminative performance of classifiers trained on the synthetic data.

Discussion

The paper highlights several promising directions for future work:

  • Extending SuPerFed to cross-device settings and combining online convex optimization with stochastic optimization.
  • Enhancing the stability of EBM training in FedEvg with advanced techniques like MCMC teaching and score-based modeling.
  • Empirical evaluation of the proposed methods on text and tabular data, alongside explicit privacy-preserving mechanisms such as differential privacy.

Conclusion

The dissertation presents innovative approaches to address statistical heterogeneity in FL from three different angles. By improving model personalization, adaptive aggregation, and synthetic data generation, it paves the way for more practical and scalable FL systems. These contributions are expected to significantly enhance the effectiveness of collaborative machine learning across data-decentralized environments.