
Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method

Published 21 Mar 2024 in cs.LG and cs.AI (arXiv:2403.14110v1)

Abstract: This paper presents HAAM-RL (Heuristic Algorithm-based Action Masking Reinforcement Learning), a novel reinforcement learning (RL) approach for optimizing the color-batching re-sequencing problem in automobile painting processes. Existing heuristic algorithms fall short in reflecting real-world constraints and in accurately predicting logistics performance. The methodology combines several key techniques: a tailored Markov Decision Process (MDP) formulation, reward design including potential-based reward shaping, action masking derived from heuristic algorithms, and an ensemble inference method that combines multiple RL models. The RL agent is trained and evaluated in FlexSim, a commercial 3D simulation package, integrated with the RL MLOps platform BakingSoDA. Across 30 scenarios, HAAM-RL with ensemble inference achieves a 16.25% performance improvement over the conventional heuristic algorithm, with stable and consistent results. The proposed approach exhibits superior performance and generalization capability, indicating its effectiveness for optimizing complex manufacturing processes. The paper also discusses future research directions, including alternative state representations, incorporating model-based RL methods, and integrating additional real-world constraints.
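The three core techniques named in the abstract can be illustrated with a minimal sketch. Note that the paper's actual heuristic rules, network architectures, and state encoding are not given here, so the heuristic mask, the potential function, and the tabular Q-values below are purely illustrative stand-ins: action masking restricts the agent to heuristically feasible actions, potential-based reward shaping adds `gamma * phi(s') - phi(s)` (the form that provably preserves the optimal policy), and ensemble inference takes a majority vote over several trained models.

```python
import numpy as np

def heuristic_action_mask(state: int, n_actions: int) -> np.ndarray:
    """Toy heuristic mask: forbid re-selecting the current color.
    A stand-in for the paper's heuristic-derived feasibility rules."""
    mask = np.ones(n_actions, dtype=bool)
    mask[state % n_actions] = False
    return mask

def shaped_reward(r: float, state: int, next_state: int,
                  potential, gamma: float = 0.99) -> float:
    """Potential-based reward shaping: r + gamma * phi(s') - phi(s),
    which leaves the optimal policy of the underlying MDP unchanged."""
    return r + gamma * potential(next_state) - potential(state)

def ensemble_act(q_tables, state: int, mask: np.ndarray) -> int:
    """Majority vote over the greedy actions of several Q-tables,
    restricted to actions the heuristic mask allows."""
    votes = np.zeros(mask.shape[0], dtype=int)
    for q in q_tables:
        masked_q = np.where(mask, q[state], -np.inf)  # masked actions can't win
        votes[int(np.argmax(masked_q))] += 1
    return int(np.argmax(votes))
```

For example, with three toy Q-tables where two models prefer action 1 and one prefers action 3, the ensemble vote selects action 1; the paper's actual ensemble operates over trained deep RL policies rather than tables.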

