Fixed-Budget Differentially Private Best Arm Identification (2401.09073v1)

Published 17 Jan 2024 in cs.LG, cs.AI, cs.IT, math.IT, math.ST, stat.ML, and stat.TH

Abstract: We study best arm identification (BAI) in linear bandits in the fixed-budget regime under differential privacy constraints, when the arm rewards are supported on the unit interval. Given a finite budget $T$ and a privacy parameter $\varepsilon>0$, the goal is to minimise the error probability in finding the arm with the largest mean after $T$ sampling rounds, subject to the constraint that the policy of the decision maker satisfies a certain $\varepsilon$-differential privacy ($\varepsilon$-DP) constraint. We construct a policy satisfying the $\varepsilon$-DP constraint (called DP-BAI) by proposing the principle of maximum absolute determinants, and derive an upper bound on its error probability. Furthermore, we derive a minimax lower bound on the error probability, and demonstrate that the lower and the upper bounds decay exponentially in $T$, with exponents in the two bounds matching order-wise in (a) the sub-optimality gaps of the arms, (b) $\varepsilon$, and (c) the problem complexity that is expressible as the sum of two terms, one characterising the complexity of standard fixed-budget BAI (without privacy constraints), and the other accounting for the $\varepsilon$-DP constraint. Additionally, we present some auxiliary results that contribute to the derivation of the lower bound on the error probability. These results, we posit, may be of independent interest and could prove instrumental in proving lower bounds on error probabilities in several other bandit problems. Whereas prior works provide results for BAI in the fixed-budget regime without privacy constraints or in the fixed-confidence regime with privacy constraints, our work fills the gap in the literature by providing the results for BAI in the fixed-budget regime under the $\varepsilon$-DP constraint.
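
To make the $\varepsilon$-DP constraint on rewards in the unit interval concrete, the following is a minimal sketch of the standard Laplace mechanism applied to an empirical mean. It is not the DP-BAI policy from the paper, and it does not implement the maximum absolute determinants principle; the function name `private_mean` and its parameters are hypothetical and chosen only for illustration. The sketch assumes that changing one reward in $[0,1]$ moves the mean of $n$ rewards by at most $1/n$, so Laplace noise of scale $1/(n\varepsilon)$ suffices for $\varepsilon$-DP of that single released statistic.

```python
import random


def private_mean(rewards, epsilon, rng):
    """Illustrative epsilon-DP estimate of the mean of rewards in [0, 1].

    Hypothetical helper, not taken from the paper: it applies the
    standard Laplace mechanism to a single empirical mean.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")

    # Clip to the unit interval so the sensitivity bound 1/n holds.
    clipped = [min(1.0, max(0.0, r)) for r in rewards]
    n = len(clipped)

    # Laplace(0, b) with b = 1 / (n * epsilon), sampled as the difference
    # of two independent exponentials with rate 1 / b = n * epsilon.
    rate = n * epsilon
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return sum(clipped) / n + noise


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
rewards = [0.2, 0.9, 0.4, 0.5]
estimate = private_mean(rewards, epsilon=1.0, rng=rng)
```

A smaller $\varepsilon$ gives a larger noise scale, which is one intuitive reason a privacy term can appear alongside the usual fixed-budget complexity in the error exponent.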

