Capacity-Aware Planning and Scheduling in Budget-Constrained Monotonic MDPs: A Meta-RL Approach (2410.21249v1)
Abstract: Many real-world sequential repair problems can be effectively modeled using monotonic Markov Decision Processes (MDPs), where the system state stochastically decreases and can only be increased by performing a restorative action. This work addresses the problem of solving multi-component monotonic MDPs with both budget and capacity constraints. The budget constraint limits the total number of restorative actions and the capacity constraint limits the number of restorative actions that can be performed simultaneously. While prior methods handle budget constraints, adding capacity constraints to them leads to an exponential increase in computational complexity as the number of components in the MDP grows. We propose a two-step planning approach to address this challenge. First, we partition the components of the multi-component MDP into groups, where the number of groups is determined by the capacity constraint. We achieve this partitioning by solving a Linear Sum Assignment Problem (LSAP). Each group is then allocated a fraction of the total budget proportional to its size. This partitioning effectively decouples the large multi-component MDP into smaller subproblems, which are computationally feasible because the capacity constraint is simplified and the budget constraint can be addressed using existing methods. Subsequently, we use a meta-trained PPO agent to obtain an approximately optimal policy for each group. To validate our approach, we apply it to the problem of scheduling repairs for a large group of industrial robots, constrained by a limited number of repair technicians and a total repair budget. Our results demonstrate that the proposed method outperforms baseline approaches in terms of maximizing the average uptime of the robot swarm, particularly for large swarm sizes.
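The first step of the decomposition can be illustrated with a small sketch. The snippet below is a minimal, hypothetical example rather than the paper's implementation: it partitions `N` components into `k` capacity-sized groups by solving an LSAP with `scipy.optimize.linear_sum_assignment`, using an assumed squared-distance cost between each component's health and a per-slot target health (the paper's actual cost function may differ), and then splits the total repair budget proportionally to group size.

```python
# Illustrative sketch only: partition N components into k groups via an LSAP,
# then allocate the repair budget proportionally to group size.
# The slot-target cost below is a hypothetical choice for this example.
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

N = 12            # number of robots / MDP components (assumed divisible by k)
k = 3             # capacity constraint -> number of groups
total_budget = 30

health = rng.uniform(0.2, 1.0, size=N)   # current health of each component

# Build N group slots: k groups of equal size, each slot carrying a target
# health level so that similarly degraded components land in the same group.
group_size = N // k
slot_group = np.repeat(np.arange(k), group_size)   # group id per slot
slot_target = np.sort(health)[::-1].reshape(k, group_size).mean(axis=1)[slot_group]

# LSAP cost: squared mismatch between component health and slot target.
cost = (health[:, None] - slot_target[None, :]) ** 2

row_ind, col_ind = linear_sum_assignment(cost)     # optimal one-to-one assignment
group_of = slot_group[col_ind]                     # group id for each component

# Allocate the total budget proportionally to group size.
for g in range(k):
    members = np.where(group_of == g)[0]
    budget_g = total_budget * len(members) / N
    print(f"group {g}: components {members.tolist()}, budget {budget_g:.1f}")
```

Each resulting group can then be treated as an independent budgeted monotonic MDP of manageable size, to which a (meta-trained) policy such as PPO can be applied, as described in the abstract.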