
Federated Prediction-Powered Inference from Decentralized Data (2409.01730v1)

Published 3 Sep 2024 in cs.LG

Abstract: In various domains, the increasing application of machine learning allows researchers to access inexpensive predictive data, which can be utilized as auxiliary data for statistical inference. Although such data are often unreliable compared to gold-standard datasets, Prediction-Powered Inference (PPI) has been proposed to ensure statistical validity despite the unreliability. However, the challenge of "data silos" arises when the private gold-standard datasets are non-shareable for model training, leading to less accurate predictive models and invalid inferences. In this paper, we introduce the Federated Prediction-Powered Inference (Fed-PPI) framework, which addresses this challenge by enabling decentralized experimental data to contribute to statistically valid conclusions without sharing private information. The Fed-PPI framework involves training local models on private data, aggregating them through Federated Learning (FL), and deriving confidence intervals using PPI computation. The proposed framework is evaluated through experiments, demonstrating its effectiveness in producing valid confidence intervals.
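The pipeline the abstract describes (local training, FL aggregation, then a PPI confidence interval) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, aggregation is plain FedAvg-style weighted averaging of linear-model weights, and the PPI step uses the standard prediction-powered interval for a mean from Angelopoulos et al. (2023).

```python
import numpy as np

def fedavg(local_weights, client_sizes):
    """FedAvg-style aggregation: average client model weights,
    weighted by each client's local dataset size."""
    return np.average(np.stack(local_weights), axis=0,
                      weights=np.asarray(client_sizes, dtype=float))

def ppi_mean_ci(y_labeled, yhat_labeled, yhat_unlabeled):
    """95% prediction-powered confidence interval for a mean.

    y_labeled      -- gold-standard labels on the small labeled set
    yhat_labeled   -- model predictions on that same labeled set
    yhat_unlabeled -- model predictions on the large unlabeled set
    """
    n, N = len(y_labeled), len(yhat_unlabeled)
    # Rectifier: measured prediction bias on the labeled data.
    rectifier = np.asarray(y_labeled) - np.asarray(yhat_labeled)
    theta = np.mean(yhat_unlabeled) + rectifier.mean()
    # Variance combines uncertainty from both datasets.
    se = np.sqrt(np.var(yhat_unlabeled, ddof=1) / N
                 + np.var(rectifier, ddof=1) / n)
    z = 1.96  # standard normal quantile for a 95% interval
    return theta - z * se, theta + z * se
```

Because the rectifier is estimated from gold-standard labels, the interval stays valid even when the aggregated model's predictions are biased; a better federated model only tightens the interval, it never invalidates it.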

