Incentives in Private Collaborative Machine Learning (2404.01676v1)

Published 2 Apr 2024 in cs.LG

Abstract: Collaborative machine learning involves training models on data from multiple parties but must incentivize their participation. Existing data valuation methods fairly value and reward each party based on shared data or model parameters but neglect the privacy risks involved. To address this, we introduce differential privacy (DP) as an incentive. Each party can select its required DP guarantee and perturb its sufficient statistic (SS) accordingly. The mediator values the perturbed SS by the Bayesian surprise it elicits about the model parameters. As our valuation function enforces a privacy-valuation trade-off, parties are deterred from selecting excessive DP guarantees that reduce the utility of the grand coalition's model. Finally, the mediator rewards each party with different posterior samples of the model parameters. Such rewards still satisfy existing incentives like fairness but additionally preserve DP and a high similarity to the grand coalition's posterior. We empirically demonstrate the effectiveness and practicality of our approach on synthetic and real-world datasets.
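The pipeline in the abstract can be sketched for a toy Beta-Bernoulli model: a party's sufficient statistic (SS) is its count of successes, which it perturbs with the classical Gaussian mechanism at its chosen (ε, δ) guarantee; the mediator then values the contribution by Bayesian surprise, i.e. the KL divergence from the prior to the posterior. This is an illustrative simplification only — the paper's mediator performs noise-aware Bayesian inference over the perturbed SS rather than conditioning on it as if it were exact, and all function names below are hypothetical.

```python
import math
import numpy as np
from scipy.special import gammaln, digamma

def gaussian_mechanism(stat, sensitivity, eps, delta, rng):
    # Classical Gaussian mechanism calibration (valid for eps <= 1):
    # sigma = sensitivity * sqrt(2 ln(1.25/delta)) / eps
    sigma = sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / eps
    return stat + rng.normal(0.0, sigma)

def beta_kl(a1, b1, a0, b0):
    # KL( Beta(a1, b1) || Beta(a0, b0) ) in closed form,
    # using ln B(a, b) = gammaln(a) + gammaln(b) - gammaln(a + b).
    return (gammaln(a0) + gammaln(b0) - gammaln(a0 + b0)
            - (gammaln(a1) + gammaln(b1) - gammaln(a1 + b1))
            + (a1 - a0) * digamma(a1)
            + (b1 - b0) * digamma(b1)
            + (a0 - a1 + b0 - b1) * digamma(a1 + b1))

rng = np.random.default_rng(0)
n, true_p = 200, 0.7
data = rng.random(n) < true_p
ss = float(data.sum())            # sufficient statistic: number of successes

# The party perturbs its SS under its self-selected (eps, delta) DP guarantee.
noisy_ss = gaussian_mechanism(ss, sensitivity=1.0, eps=0.5, delta=1e-5, rng=rng)
noisy_ss = min(max(noisy_ss, 0.0), float(n))   # clamp to the valid range

# Mediator's valuation: Bayesian surprise = KL(posterior || prior).
a0, b0 = 1.0, 1.0                 # uniform Beta prior
surprise = beta_kl(a0 + noisy_ss, b0 + n - noisy_ss, a0, b0)
```

A stronger DP guarantee (smaller ε) injects more noise into the SS, which on average moves the posterior less and thus lowers the Bayesian surprise — the privacy-valuation trade-off the abstract describes.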
