Learning an Inventory Control Policy with General Inventory Arrival Dynamics (2310.17168v2)
Abstract: In this paper we address the problem of learning and backtesting inventory control policies in the presence of general arrival dynamics -- which we term the quantity-over-time (QOT) arrivals model. We also allow order quantities to be modified as a post-processing step to meet vendor constraints such as order minimums and batch sizes -- a common practice in real supply chains. To the best of our knowledge, this is the first work to handle either arbitrary arrival dynamics or arbitrary downstream post-processing of order quantities. Building upon recent work (Madeka et al., 2022), we similarly formulate the periodic review inventory control problem as an exogenous decision process, where most of the state is outside the control of the agent. Madeka et al. (2022) show how to construct a simulator that replays historic data to solve this class of problems. In our case, we incorporate a deep generative model of the arrivals process (Gen-QOT) as part of the history replay. By formulating the problem as an exogenous decision process, we can apply results from Madeka et al. (2022) to obtain a reduction to supervised learning. Via simulation studies, we show that this approach yields statistically significant improvements in profitability over production baselines. Using data from a real-world A/B test, we show that Gen-QOT generalizes well to off-policy data and that the resulting buying policy outperforms traditional inventory management systems in real-world settings.
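The backtesting loop the abstract describes can be illustrated with a minimal sketch. Everything below is hypothetical scaffolding, not the paper's implementation: `sample_arrivals` is a toy stand-in for the learned Gen-QOT generative model (the real model is a deep network fit to historic arrival paths), `apply_vendor_constraints` illustrates the post-processing step for minimum-order and batch-size constraints, and `replay` shows how a candidate policy is evaluated against historic demand with generated arrival paths spliced into the history replay.

```python
import random

def sample_arrivals(order_qty, lead_horizon, rng):
    # Toy stand-in for Gen-QOT: spread the ordered quantity randomly
    # over future periods (a "quantity-over-time" arrival path).
    remaining = order_qty
    arrivals = [0] * lead_horizon
    for t in range(lead_horizon - 1):
        arrived = rng.randint(0, remaining)
        arrivals[t] = arrived
        remaining -= arrived
    arrivals[-1] = remaining  # whatever is left arrives in the last period
    return arrivals

def apply_vendor_constraints(qty, min_order=2, batch_size=3):
    # Post-processing step: enforce a minimum order quantity and round
    # up to a multiple of the vendor's batch size (illustrative values).
    if qty <= 0:
        return 0
    qty = max(qty, min_order)
    return -(-qty // batch_size) * batch_size  # ceil to a batch multiple

def replay(policy, demand, price=1.0, cost=0.6, horizon=4, seed=0):
    # Backtest: replay historic demand, applying the policy's (constrained)
    # orders and simulating their arrivals with the generative model.
    rng = random.Random(seed)
    on_hand, pipeline, reward = 0, [0] * horizon, 0.0
    for d in demand:
        on_hand += pipeline.pop(0)   # receive arrivals due this period
        pipeline.append(0)
        q = apply_vendor_constraints(policy(on_hand, d))
        reward -= cost * q
        for t, a in enumerate(sample_arrivals(q, horizon, rng)):
            pipeline[t] += a         # schedule this order's arrival path
        sold = min(on_hand, d)       # lost-sales dynamics
        reward += price * sold
        on_hand -= sold
    return reward
```

Because the demand and arrival processes are exogenous to the policy's decisions, replaying a different candidate policy through the same historic trajectories is valid, which is what enables the reduction to supervised learning cited in the abstract.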
- Loss of plasticity in continual deep reinforcement learning. arXiv:2303.07507.
- Neural inventory control in networks via hindsight differentiable policy optimization. arXiv:2306.11246.
- Studies in the mathematical theory of inventory and production. Stanford University Press.
- Belief movement, uncertainty reduction, and rational updating. Tech. rep., Haas School of Business, University of California, Berkeley.
- ORL: Reinforcement learning benchmarks for online stochastic optimization problems. arXiv:1911.10641.
- Bao, Y. (2006). Supply chain competition. Tech. rep., UNSW Sydney. [PDF]
- Myopic heuristics for the random yield problem. Operations Research 47 713–722.
- Language models are few-shot learners. In Advances in Neural Information Processing Systems (H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan and H. Lin, eds.), vol. 33. Curran Associates, Inc.
- The effects of load smoothing on inventory levels in a capacitated production and inventory system. Tech. rep., Cornell University Operations Research and Industrial Engineering. [PDF]
- Model-augmented actor-critic: Backpropagating through paths. In ICLR.
- A newsvendor’s procurement problem when suppliers are unreliable. Manufacturing & Service Operations Management 9 9–32.
- Solving semi-Markov decision problems using average reward reinforcement learning. Management Science 45 560–574.
- Dawid, A. (1982). The well-calibrated Bayesian. Journal of the American Statistical Association 77 605–613.
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL-HLT.
- Sample-efficient reinforcement learning in the presence of exogenous information. arXiv:2206.04282.
- Sparsity in partially controllable linear systems. In International Conference on Machine Learning. PMLR.
- MQTransformer: Multi-Horizon Forecasts with Context Dependent and Feedback-Aware Attention. arXiv:2009.14799.
- An inventory model with limited production capacity and uncertain demands I: The average-cost criterion. Mathematics of Operations Research 11 193–207.
- Reducing the cost of demand uncertainty through accurate response to early sales. Operations Research 44 87–99.
- Threshold Martingales and the Evolution of Forecasts. arXiv:2105.06834.
- Calibrated learning and correlated equilibrium. Games and Economic Behavior 21 40–55.
- Deep Learning for Time Series Forecasting: The Electric Load Case. arXiv:1907.09207.
- Periodic review production models with variable yield and uncertain demand. IIE Transactions 20 144–150.
- Inventory management in supply chains: a reinforcement learning approach. International Journal of Production Economics 78 153–161.
- Can deep reinforcement learning improve inventory management? performance on lost sales, dual-sourcing, and multi-echelon problems. Manufacturing & Service Operations Management 24 1349–1368.
- Generative adversarial nets. In Advances in Neural Information Processing Systems (Z. Ghahramani, M. Welling, C. Cortes, N. Lawrence and K. Weinberger, eds.), vol. 27. Curran Associates, Inc.
- Graves, A. (2012). Sequence transduction with recurrent neural networks. arXiv:1211.3711.
- Graves, A. (2013). Generating sequences with recurrent neural networks. arXiv:1308.0850.
- The structure of periodic review policies in the presence of random yield. Operations Research 38 634–643.
- ChainQueen: A real-time differentiable physical simulator for soft robotics. In 2019 International Conference on Robotics and Automation (ICRA). IEEE.
- Learning protein structure with a differentiable simulator. In International Conference on Learning Representations.
- Forecasting with trees. International Journal of Forecasting 38 1473–1481. Special Issue: M5 competition.
- Kaplan, R. S. (1970). A dynamic inventory model with stochastic lead times. Management Science 16 491–507.
- Single item inventory control under periodic review and a minimum order quantity. International Journal of Production Economics 133 280–285.
- Auto-encoding variational Bayes. arXiv:1312.6114.
- An introduction to variational autoencoders. arXiv:1906.02691.
- A periodic-review inventory system with supply interruptions. Probability in the Engineering and Informational Sciences 18 33–53.
- Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting. arXiv:1912.09363.
- Economic order quantity for items with imperfect quality: Revisited. International Journal of Production Economics 112 808–815. Special Section on RFID: Technology, Applications, and Impact on Business Operations.
- Sample path generation for probabilistic demand forecasting. In KDD 2018 Workshop on Mining and Learning from Time Series.
- Deep inventory management. arXiv:2210.03137.
- Multi-echelon inventory management for a non-stationary capacitated distribution network. Tech. rep., SSRN. [PDF]
- Asynchronous methods for deep reinforcement learning. arXiv:1602.01783.
- Playing Atari with deep reinforcement learning. arXiv:1312.5602.
- An analysis of multi-agent reinforcement learning for decentralized inventory control systems. arXiv:2307.11432.
- Sequence to sequence deep learning models for solar irradiation forecasting. In IEEE Milan PowerTech.
- Nahmias, S. (1979). Simple approximations for a variety of dynamic leadtime lost-sales inventory models. Operations Research 27 904–924.
- STConvS2S: Spatiotemporal Convolutional Sequence to Sequence Network for weather forecasting. arXiv:1912.00134.
- Model-based reinforcement learning with scalable composite policy gradient estimators. In ICML.
- Porteus, E. L. (2002). Foundations of stochastic inventory theory. Stanford University Press.
- A practical end-to-end inventory management model with deep learning. Management Science 69 759–773.
- Inventory management with periodic ordering and minimum order quantities. Journal of the Operational Research Society 49 1085–1094.
- DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting 36 1181–1191.
- Proximal policy optimization algorithms. arXiv:1707.06347.
- A two-echelon inventory system with a minimum order quantity requirement. Sustainability 11.
- Mastering the game of go with deep neural networks and tree search. Nature 529 484–489.
- Hindsight learning for MDPs with exogenous inputs. In Proceedings of the 40th International Conference on Machine Learning, vol. 202 of Proceedings of Machine Learning Research. PMLR.
- Inventory control with information about supply conditions. Management Science 42 1409–1419.
- Do differentiable simulators give better policy gradients? In International Conference on Machine Learning. PMLR.
- LSTM Neural Networks for Language Modeling. In INTERSPEECH.
- Reinforcement Learning: An Introduction. MIT Press.
- Szepesvári, C. (2010). Algorithms for Reinforcement Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, Morgan & Claypool Publishers.
- Taleb, N. N. (2018). Election predictions as martingales: an arbitrage approach. Quantitative Finance 18 1–5.
- All roads lead to quantitative finance. Quantitative Finance 19 1775–1776.
- Thomas, J. D. (2023). Towards cooperative MARL in industrial domains.
- WaveNet: A generative model for raw audio. arXiv:1609.03499.
- Pixel recurrent neural networks. In International Conference on Machine Learning. PMLR.
- Veinott, A. F. (1965). The optimal inventory policy for batch ordering. Operations Research 13 424–432.
- Deep Generative Quantile-Copula Models for Probabilistic Forecasting. In ICML Time Series Workshop.
- A multi-horizon quantile recurrent forecaster. In NIPS Time Series Workshop.
- A learning algorithm for continually running fully recurrent neural networks. Neural Computation 1 270–280.
- Long-term Forecasting using Higher Order Tensor RNNs. arXiv:1711.00073.
- Policy optimization for continuous reinforcement learning. arXiv:2305.18901.
- On the structure of optimal ordering policies for stochastic inventory systems with minimum order quantity. Probability in the Engineering and Informational Sciences 20 257–270.
- Effective control policies for stochastic inventory systems with a minimum order quantity and linear costs. International Journal of Production Economics 106 523–531.
- Zhu, H. (2022). A simple heuristic policy for stochastic inventory systems with both minimum and maximum order quantity requirements. Annals of Operations Research 309 347–363.
- Effective inventory control policies with a minimum order quantity and batch ordering. International Journal of Production Economics 168 21–30.
- Zipkin, P. (2008). Old and new methods for lost-sales inventory systems. Operations Research 56 1256–1263.