
Deep Controlled Learning for Inventory Control (2011.15122v6)

Published 30 Nov 2020 in cs.LG

Abstract: Problem Definition: Are traditional deep reinforcement learning (DRL) algorithms, developed for a broad range of purposes including game-play and robotics, the most suitable machine learning algorithms for applications in inventory control? To what extent would DRL algorithms tailored to the unique characteristics of inventory control problems provide superior performance compared to DRL and traditional benchmarks? Methodology/results: We propose and study Deep Controlled Learning (DCL), a new DRL framework based on approximate policy iteration specifically designed to tackle inventory problems. Comparative evaluations reveal that DCL outperforms existing state-of-the-art heuristics in lost sales inventory control, perishable inventory systems, and inventory systems with random lead times, achieving lower average costs across all test instances and maintaining an optimality gap of no more than 0.1%. Notably, the same hyperparameter set is utilized across all experiments, underscoring the robustness and generalizability of the proposed method. Managerial implications: These substantial performance and robustness improvements pave the way for the effective application of tailored DRL algorithms to inventory management problems, empowering decision-makers to optimize stock levels, minimize costs, and enhance responsiveness across various industries.
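
The abstract describes DCL as a DRL framework built on approximate policy iteration: starting from a base policy, better decisions are found by simulation, and those decisions are used to produce an improved policy. As a rough illustration of the simulation-based policy-improvement step on a lost-sales system, here is a minimal Python sketch. The demand distribution, cost parameters, lead time, base-stock base policy, and rollout settings are all illustrative assumptions, not the paper's experimental setup, and DCL's neural-network training stage is omitted.

```python
# Minimal sketch of one simulation-based policy-improvement step on a
# periodic-review lost-sales inventory system. All modelling choices here
# (Poisson demand, constant lead time, base-stock base policy, cost rates)
# are illustrative assumptions; the paper's DCL method additionally trains
# a neural network on the improved decisions, which is not shown.
import numpy as np

rng = np.random.default_rng(0)

LEAD_TIME = 2        # periods until an order arrives (assumed constant here)
HOLD_COST = 1.0      # holding cost per unit on hand per period (assumption)
LOST_COST = 9.0      # penalty per unit of lost demand (assumption)
DEMAND_MEAN = 5.0    # Poisson demand rate (assumption)
MAX_ORDER = 15       # action space: order quantities 0..MAX_ORDER


def step(on_hand, pipeline, order, demand):
    """Advance the system one period; return the next state and period cost."""
    on_hand += pipeline[0]                 # oldest outstanding order arrives
    pipeline = pipeline[1:] + (order,)     # new order joins the pipeline
    sales = min(on_hand, demand)
    lost = demand - sales                  # unmet demand is lost, not backlogged
    on_hand -= sales
    cost = HOLD_COST * on_hand + LOST_COST * lost
    return on_hand, pipeline, cost


def base_policy(on_hand, pipeline, base_stock=12):
    """Base-stock heuristic: order up to `base_stock` inventory position."""
    position = on_hand + sum(pipeline)
    return min(max(base_stock - position, 0), MAX_ORDER)


def rollout_cost(on_hand, pipeline, first_order, horizon=50):
    """Take `first_order` now, then follow the base policy for `horizon`
    periods; return the total simulated cost of the trajectory."""
    total, order = 0.0, first_order
    for _ in range(horizon):
        demand = rng.poisson(DEMAND_MEAN)
        on_hand, pipeline, cost = step(on_hand, pipeline, order, demand)
        total += cost
        order = base_policy(on_hand, pipeline)
    return total


def improved_action(on_hand, pipeline, n_rollouts=200):
    """One policy-improvement step: choose the order quantity with the
    lowest average simulated cost-to-go under the base policy."""
    estimates = [
        np.mean([rollout_cost(on_hand, pipeline, q) for _ in range(n_rollouts)])
        for q in range(MAX_ORDER + 1)
    ]
    return int(np.argmin(estimates))


if __name__ == "__main__":
    state = (3, (4, 0))  # on-hand inventory and outstanding orders (example)
    print("base policy orders:    ", base_policy(*state))
    print("improved policy orders:", improved_action(*state))
```

In an approximate-policy-iteration scheme of this kind, improved decisions would be collected across many sampled states, a policy (in DCL, a neural network) would be fit to them, and the loop would repeat with the learned policy as the new base policy; the rollout comparison above is only the inner improvement step.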

Authors (5)
  1. Tarkan Temizöz (2 papers)
  2. Christina Imdahl (3 papers)
  3. Remco Dijkman (14 papers)
  4. Douniel Lamghari-Idrissi (2 papers)
  5. Willem van Jaarsveld (17 papers)
Citations (5)
