Structured Reinforcement Learning for Incentivized Stochastic Covert Optimization (2405.07415v1)
Abstract: This paper studies how a stochastic gradient algorithm (SG) can be controlled to hide the estimate of the local stationary point from an eavesdropper. Such problems are of significant interest in distributed optimization settings like federated learning and inventory management. A learner queries a stochastic oracle and incentivizes the oracle to obtain noisy gradient measurements and perform SG. The oracle probabilistically returns either a noisy gradient of the function or a non-informative measurement, depending on the oracle state and incentive. The learner's query and incentive are visible to an eavesdropper who wishes to estimate the stationary point. This paper formulates the problem of the learner performing covert optimization by dynamically incentivizing the stochastic oracle and obfuscating the eavesdropper as a finite-horizon Markov decision process (MDP). Using conditions for interval-dominance on the cost and transition probability structure, we show that the optimal policy for the MDP has a monotone threshold structure. We propose searching for the optimal stationary policy with the threshold structure using a stochastic approximation algorithm and a multi-armed bandit approach. The effectiveness of our methods is numerically demonstrated on a covert federated learning hate-speech classification task.
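As a rough illustration of the setup described in the abstract, the following Python sketch simulates a learner that follows a threshold policy while running SG against a probabilistic oracle. It is a toy under stated assumptions, not the paper's algorithm: the objective f(x) = 0.5 x^2, the names `threshold_policy`, `oracle`, and `run_covert_sg`, the per-state success probabilities, the uniformly random oracle state, and the direction of the threshold (learn when the number of remaining updates is at or above a per-state threshold) are all hypothetical choices made for illustration. Incentive costs and the eavesdropper's estimate are not modeled.

```python
import random


def threshold_policy(remaining, oracle_state, thresholds):
    """Hypothetical monotone threshold rule: learn when the number of
    SG updates still needed is at or above the threshold for this state."""
    return "learn" if remaining >= thresholds[oracle_state] else "obfuscate"


def oracle(x, action, success_prob, rng):
    """Toy oracle for f(x) = 0.5 * x**2 (an assumed objective).
    Returns a noisy gradient with probability success_prob when the
    learner issues a learning query, and None (non-informative) otherwise."""
    if action == "learn" and rng.random() < success_prob:
        return x + rng.gauss(0.0, 0.1)  # true gradient x plus Gaussian noise
    return None


def run_covert_sg(x0, horizon, needed, thresholds, success_probs,
                  step=0.1, seed=0):
    """Run at most `horizon` queries, stopping once `needed` informative
    gradients have been applied. Returns (x, remaining, log)."""
    rng = random.Random(seed)
    x, remaining, log = x0, needed, []
    for _ in range(horizon):
        if remaining == 0:
            break
        # Assumption: the oracle state is drawn uniformly at each step.
        oracle_state = rng.randrange(len(thresholds))
        action = threshold_policy(remaining, oracle_state, thresholds)
        grad = oracle(x, action, success_probs[oracle_state], rng)
        if grad is not None:
            x -= step * grad  # one SG update
            remaining -= 1
        log.append((oracle_state, action, grad is not None))
    return x, remaining, log


if __name__ == "__main__":
    x, remaining, log = run_covert_sg(
        x0=1.0, horizon=50, needed=10,
        thresholds=[4, 1], success_probs=[0.3, 0.9],
    )
    print(x, remaining, len(log))
```

In this sketch the policy is fully described by one threshold per oracle state, which is why a structural result of this kind reduces the policy search to a small number of parameters.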