Differentially Private Reinforcement Learning with Self-Play

Published 11 Apr 2024 in cs.LG, cs.AI, cs.CR, cs.MA, and stat.ML | (2404.07559v1)

Abstract: We study multi-agent reinforcement learning (multi-agent RL) under differential privacy (DP) constraints. The problem is motivated by real-world applications involving sensitive data, where protecting users' private information is critical. We first extend the definitions of Joint DP (JDP) and Local DP (LDP) to two-player zero-sum episodic Markov Games, where both definitions ensure trajectory-wise privacy protection. We then design a provably efficient algorithm based on optimistic Nash value iteration and privatization of Bernstein-type bonuses. The algorithm satisfies the JDP and LDP requirements when instantiated with appropriate privacy mechanisms. Furthermore, for both notions of DP, our regret bound generalizes the best known result for single-agent RL, and it reduces to the best known result for multi-agent RL when there are no privacy constraints. To the best of our knowledge, these are the first results toward understanding trajectory-wise privacy protection in multi-agent RL.
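
To make the idea of privatizing exploration bonuses more concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a generic one-shot Laplace mechanism applied to a full table of visit counts indexed by (step, state, max-player action, min-player action), and a simplified count-based bonus in place of the Bernstein-type bonus named in the abstract. The function names, the `delta` parameter, and the clipping at 1 are all illustrative assumptions.

```python
import itertools
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_counts(counts, epsilon, horizon, rng=None):
    """Add Laplace noise to every entry of a visit-count table.

    `counts` must contain a key for every (h, s, a, b) tuple, including
    unvisited ones, so that the set of released keys reveals nothing.
    Assumption: replacing one trajectory of length `horizon` changes at
    most two entries per step by 1, so the L1 sensitivity is 2 * horizon.
    """
    rng = rng or random.Random()
    scale = 2.0 * horizon / epsilon
    # Clipping at 1 is post-processing and keeps the bonus well defined.
    return {k: max(n + laplace_noise(scale, rng), 1.0) for k, n in counts.items()}


def count_bonus(private_count, horizon, delta=0.05):
    # Simplified Hoeffding-style bonus (an assumption, not the paper's
    # Bernstein-type bonus): shrinks as the private count grows.
    return horizon * math.sqrt(math.log(1.0 / delta) / private_count)


# Hypothetical tiny game: 2 steps, 3 states, 2 actions per player.
H, S, A, B = 2, 3, 2, 2
counts = {k: 0 for k in itertools.product(range(H), range(S), range(A), range(B))}
counts[(0, 0, 1, 0)] = 50

private = privatize_counts(counts, epsilon=1.0, horizon=H, rng=random.Random(0))
bonuses = {k: count_bonus(n, H) for k, n in private.items()}
```

This sketch releases the counts once. A learner that updates its policy every episode would need a mechanism for continual release, and the noise would also have to be accounted for in the bonus to preserve optimism; neither is attempted here.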

Authors (2)