Linear programming-based solution methods for constrained partially observable Markov decision processes (2206.14081v2)
Abstract: Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena. However, they are notoriously difficult to solve to optimality, and there exist only a few approximation methods for obtaining high-quality solutions. In this study, grid-based approximations are used in combination with linear programming (LP) models to generate approximate policies for CPOMDPs. A detailed numerical study is conducted with six CPOMDP problem instances considering both their finite and infinite horizon formulations. The quality of approximation algorithms for solving unconstrained POMDP problems is established through a comparative analysis with exact solution methods. Then, the performance of the LP-based CPOMDP solution approaches for varying budget levels is evaluated. Finally, the flexibility of LP-based approaches is demonstrated by applying deterministic policy constraints, and a detailed investigation into their impact on rewards and CPU run time is provided. For most of the finite horizon problems, deterministic policy constraints are found to have little impact on expected reward, but they introduce a significant increase to CPU run time. For infinite horizon problems, the reverse is observed: deterministic policies tend to yield lower expected total rewards than their stochastic counterparts, but the impact of deterministic constraints on CPU run time is negligible in this case. Overall, these results demonstrate that LP models can effectively generate approximate policies for both finite and infinite horizon problems while providing the flexibility to incorporate various additional constraints into the underlying model.
- Cassandra A (2003) Simple examples. http://www.pomdp.org/examples/, accessed: 2019-09-01
- Cassandra AR (1998) Exact and approximate algorithms for partially observable Markov decision processes. Brown University
- Celen M, Djurdjanovic D (2020) Integrated maintenance and operations decision making with imperfect degradation state observations. Journal of Manufacturing Systems 55:302–316
- Kavaklioglu C, Cevik M (2022) Scalable grid-based approximation algorithms for partially observable markov decision processes. Concurrency and Computation: Practice and Experience 34(5):e6743
- Lovejoy W (1991a) A Survey of Algorithmic Methods for Partially Observed Markov Decision Processes. Annals of Operations Research 28:47–66
- Lovejoy W (1991b) Computationally feasible bounds for partially observed Markov decision processes. Operations Research 39(1):162–175
- Maillart LM (2006) Maintenance policies for systems with condition monitoring and obvious failures. IIE Transactions 38(6):463–475
- McLay LA, Mayorga ME (2013) A dispatching model for server-to-customer systems that balances efficiency and equity. Manufacturing & Service Operations Management 15(2):205–220
- Pajarinen J, Kyrki V (2017) Robotic manipulation of multiple objects as a pomdp. Artificial Intelligence 247:213–228
- Puterman ML (2014) Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons
- Rudin W (1987) Real and complex analysis, 3rd edn. McGraw-Hill
- Sandikci B (2010) Reduction of a pomdp to an mdp. Wiley Encyclopedia of Operations Research and Management Science
- Silver D, Veness J (2010) Monte-carlo planning in large pomdps. Advances in neural information processing systems 23
- Smith T, Simmons R (2012) Heuristic search value iteration for pomdps. arXiv preprint arXiv:12074166
- Sondik EJ (1971) The optimal control of partially observable Markov processes. Stanford University
- Suresh (2005) Sampling from the simplex. Available from http://geomblog.blogspot.com/2005/10/sampling-from-simplex.html Accessed on February 26, 2015.
- Sutton RS, Barto AG (2018) Reinforcement learning: An introduction. MIT press
- Treharne JT, Sox CR (2002) Adaptive inventory control for nonstationary demand and partial information. Management Science 48(5):607–624
- Walraven E, Spaan MT (2018) Column generation algorithms for constrained pomdps. Journal of artificial intelligence research 62:489–533
- Yılmaz ÖF (2020) An integrated bi-objective u-shaped assembly line balancing and parts feeding problem: optimization model and exact solution method. Annals of Mathematics and Artificial Intelligence pp 1–18
- Yılmaz ÖF, et al. (2021) Tactical level strategies for multi-objective disassembly line balancing problem with multi-manned stations: an optimization model and solution approaches. Annals of Operations Research pp 1–51