Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method
Abstract: This paper presents a novel reinforcement learning (RL) approach called HAAM-RL (Heuristic Algorithm-based Action Masking Reinforcement Learning) for optimizing the color batching re-sequencing problem in automobile painting processes. Existing heuristic algorithms struggle to adequately reflect real-world constraints and to predict logistics performance accurately. Our methodology incorporates several key techniques: a tailored Markov Decision Process (MDP) formulation, a reward design that includes Potential-Based Reward Shaping, action masking driven by heuristic algorithms, and an ensemble inference method that combines multiple trained RL models. The RL agent is trained and evaluated using FlexSim, a commercial 3D simulation software, integrated with our RL MLOps platform BakingSoDA. Experimental results across 30 scenarios demonstrate that HAAM-RL with ensemble inference achieves a 16.25% performance improvement over the conventional heuristic algorithm, with stable and consistent results. The proposed approach exhibits superior performance and generalization capability, indicating its effectiveness in optimizing complex manufacturing processes. The study also discusses future research directions, including alternative state representations, incorporating model-based RL methods, and integrating additional real-world constraints.
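The two mechanisms the abstract highlights, heuristic action masking and ensemble inference, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the lane-occupancy state, the `LANE_CAPACITY` constant, and the use of a Q-value vector per model are all assumptions made for the example; the paper's actual masks come from its domain heuristics and its models are trained with RL.

```python
import numpy as np

LANE_CAPACITY = 5  # assumed capacity of each buffer lane (illustrative)

def heuristic_action_mask(occupancy):
    """Boolean mask: choosing a lane is valid only if that lane has room.

    A stand-in for the paper's heuristic rules, which derive the mask
    from domain knowledge about the re-sequencing buffer.
    """
    return occupancy < LANE_CAPACITY

def masked_greedy_action(q_values, mask):
    """Pick the best-valued action among those the heuristic allows."""
    q = np.where(mask, q_values, -np.inf)  # invalid actions can never win
    return int(np.argmax(q))

def ensemble_action(q_value_list, mask):
    """Majority vote over the masked greedy actions of several models."""
    votes = [masked_greedy_action(q, mask) for q in q_value_list]
    return max(set(votes), key=votes.count)

occupancy = np.array([5, 2, 0, 5])          # lanes 0 and 3 are full
mask = heuristic_action_mask(occupancy)     # [False, True, True, False]
models = [np.array([9.0, 1.0, 3.0, 8.0]),   # would pick full lane 0 unmasked
          np.array([0.5, 4.0, 2.0, 7.0]),
          np.array([1.0, 2.0, 6.0, 0.0])]
print(ensemble_action(models, mask))        # lane 2 wins the vote
```

Masking before the argmax, rather than penalizing invalid actions in the reward, guarantees the agent never selects an infeasible move; the vote across models is one simple way to combine an ensemble at inference time.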