
Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients (2012.15019v3)

Published 30 Dec 2020 in cs.LG and cs.CR

Abstract: As reinforcement learning techniques are increasingly applied to real-world decision problems, attention has turned to how these algorithms use potentially sensitive information. We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions. We give examples of how this setting covers real-world problems in privacy for sequential decision-making. We solve this problem in the policy gradients framework by introducing a regularizer based on the mutual information (MI) between the sensitive state and the actions. We develop a model-based stochastic gradient estimator for optimization of privacy-constrained policies. We also discuss an alternative MI regularizer that serves as an upper bound to our main MI regularizer and can be optimized in a model-free setting, and a powerful direct estimator that can be used in an environment with differentiable dynamics. We contrast previous work in differentially-private RL to our mutual-information formulation of information disclosure. Experimental results show that our training method results in policies that hide the sensitive state, even in challenging high-dimensional tasks.
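To make the MI-regularized objective concrete, below is a minimal sketch of one way such a penalty can be attached to a REINFORCE-style surrogate in JAX. The per-sample term log pi(a|s) - log q(a|s_pub) is a variational upper bound on the conditional mutual information between the sensitive state components and the action, valid for any auxiliary model q that conditions only on the public state. The linear-Gaussian policy, the batch layout, the names (q_params, beta), and the simplified gradient treatment are illustrative assumptions; this is not the paper's model-based stochastic gradient estimator.

```python
# Minimal sketch (assumptions noted below), not the paper's estimator:
# a REINFORCE-style surrogate plus a variational upper bound on the
# conditional MI I(sensitive state; action | public state) as a penalty.

import jax
import jax.numpy as jnp


def gaussian_logpdf(x, mean, log_std):
    # Log-density of a diagonal Gaussian, summed over action dimensions.
    var = jnp.exp(2.0 * log_std)
    return jnp.sum(
        -0.5 * ((x - mean) ** 2 / var + 2.0 * log_std + jnp.log(2.0 * jnp.pi)),
        axis=-1,
    )


def loss(pi_params, q_params, batch, beta):
    # Policy pi(a | full state) and auxiliary model q(a | public state only).
    # For any q, E[log pi(a|s) - log q(a|s_pub)] upper-bounds the conditional MI
    # between the hidden (sensitive) state components and the action.
    s, s_pub, a, ret = batch["state"], batch["public"], batch["action"], batch["return"]
    log_pi = gaussian_logpdf(a, s @ pi_params["W"] + pi_params["b"], pi_params["log_std"])
    log_q = gaussian_logpdf(a, s_pub @ q_params["W"] + q_params["b"], q_params["log_std"])
    pg_surrogate = -jnp.mean(jax.lax.stop_gradient(ret) * log_pi)  # REINFORCE term
    mi_bound = jnp.mean(log_pi - log_q)                            # sampled MI upper bound
    # Note: this differentiates only through log_pi at the sampled (s, a) pairs and
    # ignores how the policy shifts the visitation distribution, a simplification
    # relative to the model-based estimator described in the paper.
    return pg_surrogate + beta * mi_bound


# Toy usage on random data: 4-dim state, first 2 dims public, 1-dim action.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
pi_params = {"W": 0.1 * jax.random.normal(k1, (4, 1)), "b": jnp.zeros(1), "log_std": jnp.zeros(1)}
q_params = {"W": 0.1 * jax.random.normal(k2, (2, 1)), "b": jnp.zeros(1), "log_std": jnp.zeros(1)}
states = jax.random.normal(k3, (32, 4))
batch = {
    "state": states,
    "public": states[:, :2],
    "action": jax.random.normal(k1, (32, 1)),
    "return": jnp.ones(32),
}
grads = jax.grad(loss)(pi_params, q_params, batch, beta=0.1)
```

In practice the auxiliary model q would be fit to the observed actions given only the public state (for example by maximum likelihood on rollouts), and the coefficient beta trades off expected reward against disclosure of the sensitive state.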

