Minimax Least-Square Policy Iteration for Cost-Aware Defense of Traffic Routing against Unknown Threats (2404.05008v1)
Abstract: Dynamic routing is a representative control scheme in transportation, production lines, and data transmission. In the modern context of connectivity and autonomy, routing decisions are potentially vulnerable to malicious attacks. In this paper, we consider the dynamic routing problem over parallel traffic links in the face of such threats. An attacker is capable of increasing or destabilizing traffic queues by strategically manipulating the nominally optimal routing decisions; a defender is capable of securing the correct routing decision. Both attacking and defensive actions induce technological costs. The defender has no prior information about the attacker's strategy. We develop a least-squares policy iteration algorithm for the defender to compute a cost-aware, threat-adaptive defensive strategy. The policy evaluation step computes a weight vector that minimizes the sampled temporal-difference error. We derive a concrete theoretical upper bound on the evaluation error based on the theory of value function approximation. The policy improvement step solves a minimax problem and thus iteratively computes the Markov perfect equilibrium of the security game. We also discuss the training error of the entire policy iteration process.
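To make the two steps concrete, below is a minimal Python sketch of one minimax-LSPI iteration under illustrative assumptions: a linear feature map phi(s, a_d, a_a), finite defender and attacker action sets, a batch of sampled transitions, and an LP solver for the per-state matrix game. All names here (phi, lstdq, solve_matrix_game, N_DEFEND, N_ATTACK) are hypothetical placeholders, not the authors' implementation.

```python
# Illustrative sketch of minimax least-squares policy iteration (LSPI)
# for a two-player zero-sum Markov game. All names are assumptions,
# not the paper's implementation.
import numpy as np
from scipy.optimize import linprog

N_DEFEND, N_ATTACK = 2, 2  # assumed finite action-set sizes

def lstdq(samples, phi, k, gamma, policy_d, policy_a):
    """Policy evaluation: least-squares fit of the sampled TD fixed point.
    samples: iterable of (s, a_d, a_a, cost, s_next);
    phi: feature map of dimension k; policy_d/policy_a: mixed strategies."""
    A = 1e-6 * np.eye(k)  # small ridge term for numerical stability
    b = np.zeros(k)
    for s, a_d, a_a, cost, s_next in samples:
        f = phi(s, a_d, a_a)
        # Expected next-step features under the current joint mixed policy.
        f_next = sum(policy_d(s_next)[i] * policy_a(s_next)[j]
                     * phi(s_next, i, j)
                     for i in range(N_DEFEND) for j in range(N_ATTACK))
        A += np.outer(f, f - gamma * f_next)
        b += cost * f
    return np.linalg.solve(A, b)  # weight vector minimizing sampled TD error

def solve_matrix_game(Q):
    """Policy improvement at one state: the defender minimizes the value of
    the cost matrix Q (rows: defender, cols: attacker) over mixed
    strategies. Standard LP: min v s.t. Q^T p <= v*1, sum(p) = 1, p >= 0."""
    n_d, n_a = Q.shape
    c = np.r_[1.0, np.zeros(n_d)]                # decision variables: (v, p)
    A_ub = np.hstack([-np.ones((n_a, 1)), Q.T])  # encodes Q^T p - v <= 0
    A_eq = np.r_[0.0, np.ones(n_d)].reshape(1, -1)
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n_a), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(None, None)] + [(0.0, 1.0)] * n_d)
    return res.x[1:], res.x[0]  # defender's mixed strategy, game value
```

In a full run, the improvement step would assemble the stage-game matrix Q[i, j] = phi(s, i, j) @ w at each state, pass it to solve_matrix_game, and alternate with lstdq; iterating these two steps is the policy iteration loop whose evaluation and training errors the abstract bounds.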