
VC Theory for Inventory Policies (2404.11509v2)

Published 17 Apr 2024 in stat.ML and cs.LG

Abstract: Advances in computational power and AI have increased interest in reinforcement learning approaches to inventory management. This paper provides a theoretical foundation for these approaches and investigates the benefits of restricting to policy structures that are well-established by inventory theory. In particular, we prove generalization guarantees for learning several well-known classes of inventory policies, including base-stock and (s, S) policies, by leveraging the celebrated Vapnik-Chervonenkis (VC) theory. We apply the pseudo-dimension and fat-shattering dimension from VC theory to determine the generalization error of inventory policies, that is, the difference between an inventory policy's performance on training data and its expected performance on unseen data. We focus on a classical setting without contexts, but allow for an arbitrary distribution over demand sequences and do not make any assumptions such as independence over time. We corroborate our supervised learning results using numerical simulations. Managerially, our theory and simulations translate to the following insights. First, there is a principle of "learning less is more" in inventory management: depending on the amount of data available, it may be beneficial to restrict oneself to a simpler, albeit suboptimal, class of inventory policies to minimize overfitting errors. Second, the number of parameters in a policy class may not be the correct measure of overfitting error: in fact, the class of policies defined by T time-varying base-stock levels exhibits a generalization error an order of magnitude lower than that of the two-parameter (s, S) policy class. Finally, our research suggests situations in which it could be beneficial to incorporate the concepts of base-stock and inventory position into black-box learning machines, instead of having these machines directly learn the order quantity actions.
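To make the notion of generalization error in the abstract concrete, the following is a minimal Python sketch, not the paper's experimental setup. It assumes zero lead time, backlogged demand, a holding cost `h`, a backlog cost `b`, and a fixed ordering cost `K` for the (s, S) policy; the reorder rule "order when inventory is at or below s" is one common convention. The demand generator, the cost values, and all function names are hypothetical choices made for illustration.

```python
import random

def base_stock_cost(demands, S, h=1.0, b=9.0):
    """Average per-period cost of a base-stock policy with level S.

    Assumes zero lead time and backlogging, so inventory is raised
    to S before each period's demand arrives.
    """
    total = 0.0
    for d in demands:
        net = S - d  # end-of-period net inventory
        total += h * max(net, 0.0) + b * max(-net, 0.0)
    return total / len(demands)

def s_S_cost(demands, s, S, h=1.0, b=9.0, K=5.0, x0=0.0):
    """Average per-period cost of an (s, S) policy (assumed convention:
    order up to S, paying K, whenever inventory is at or below s)."""
    x = x0
    total = 0.0
    for d in demands:
        if x <= s:
            total += K
            x = S
        x -= d
        total += h * max(x, 0.0) + b * max(-x, 0.0)
    return total / len(demands)

def avg_cost(sequences, cost_fn, *params):
    """Empirical average cost of a policy over a sample of demand sequences."""
    return sum(cost_fn(seq, *params) for seq in sequences) / len(sequences)

def fit_base_stock(train, grid):
    """Pick the base-stock level on the grid with the lowest training cost."""
    return min(grid, key=lambda S: avg_cost(train, base_stock_cost, S))

def sample_sequence(T, rng):
    """Hypothetical demand process with dependence across time periods."""
    d, seq = 10.0, []
    for _ in range(T):
        d = max(0.0, 5.0 + 0.5 * d + rng.gauss(0.0, 3.0))
        seq.append(d)
    return seq

if __name__ == "__main__":
    rng = random.Random(0)
    train = [sample_sequence(12, rng) for _ in range(20)]
    test = [sample_sequence(12, rng) for _ in range(2000)]  # stands in for unseen data
    S_hat = fit_base_stock(train, [float(S) for S in range(0, 41)])
    train_cost = avg_cost(train, base_stock_cost, S_hat)
    test_cost = avg_cost(test, base_stock_cost, S_hat)
    print("fitted base-stock level:", S_hat)
    print("training cost:", train_cost)
    print("held-out cost:", test_cost)
    print("estimated generalization gap:", test_cost - train_cost)
```

The held-out average only approximates the expected cost on unseen data. Repeating the same fit-and-evaluate loop for `s_S_cost` over a grid of (s, S) pairs would give a rough empirical comparison of the gaps across the two policy classes.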
