Approximate Information States for Worst-Case Control and Learning in Uncertain Systems (2301.05089v2)

Published 12 Jan 2023 in eess.SY, cs.AI, cs.SY, and math.OC

Abstract: In this paper, we investigate discrete-time decision-making problems in uncertain systems with partially observed states. We consider a non-stochastic model, where uncontrolled disturbances acting on the system take values in bounded sets with unknown distributions. We present a general framework for decision-making in such problems by using the notion of the information state and approximate information state, and introduce conditions to identify an uncertain variable that can be used to compute an optimal strategy through a dynamic program (DP). Next, we relax these conditions and define approximate information states that can be learned from output data without knowledge of system dynamics. We use approximate information states to formulate a DP that yields a strategy with a bounded performance loss. Finally, we illustrate the application of our results in control and reinforcement learning using numerical examples.
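
The worst-case criterion described in the abstract replaces the expectation used in stochastic dynamic programming with a maximum over the bounded disturbance set at each stage. The sketch below illustrates this kind of minimax dynamic program over finite state, action, and disturbance sets; it is a minimal illustration under stated assumptions, and the names `worst_case_dp`, `f`, `cost`, `terminal_cost`, and the toy integrator example are hypothetical placeholders rather than the paper's notation or algorithm.

```python
# Minimal sketch of a finite-horizon worst-case (minimax) dynamic program.
# Assumes fully enumerable state, action, and disturbance sets; the cost
# criterion takes a maximum over the bounded disturbance set instead of an
# expectation, mirroring the non-stochastic model in the abstract.

def worst_case_dp(states, actions, disturbances, f, cost, terminal_cost, horizon):
    """Return value functions V[t][x] and a minimax policy pi[t][x]."""
    V = [dict() for _ in range(horizon + 1)]
    pi = [dict() for _ in range(horizon)]
    for x in states:
        V[horizon][x] = terminal_cost(x)
    for t in range(horizon - 1, -1, -1):
        for x in states:
            best_u, best_val = None, float("inf")
            for u in actions:
                # Worst case over the bounded disturbance set.
                worst = max(cost(x, u, w) + V[t + 1][f(x, u, w)] for w in disturbances)
                if worst < best_val:
                    best_u, best_val = u, worst
            V[t][x], pi[t][x] = best_val, best_u
    return V, pi


# Toy usage: scalar integrator x_{t+1} = x_t + u_t + w_t with |w_t| <= 1,
# quadratic stage cost, states clipped to a small grid for illustration.
states = list(range(-3, 4))
actions = [-1, 0, 1]
disturbances = [-1, 0, 1]
f = lambda x, u, w: max(-3, min(3, x + u + w))
cost = lambda x, u, w: x * x + u * u
V, pi = worst_case_dp(states, actions, disturbances, f, cost, lambda x: x * x, horizon=5)
```

In the partially observed setting the paper addresses, the argument of the value function would be an information state (or a learned approximate information state) rather than the state `x` itself; the recursion above only conveys the minimax structure of the dynamic program.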
