Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains (2006.04037v1)

Published 7 Jun 2020 in cs.LG, cs.AI, cs.MA, and stat.ML

Abstract: This paper describes the application of reinforcement learning (RL) to multi-product inventory management in supply chains. The problem description and solution are both adapted from a real-world business solution. The novelty of this problem with respect to supply chain literature is (i) we consider concurrent inventory management of a large number (50 to 1000) of products with shared capacity, (ii) we consider a multi-node supply chain consisting of a warehouse which supplies three stores, (iii) the warehouse, stores, and transportation from warehouse to stores have finite capacities, (iv) warehouse and store replenishment happen at different time scales and with realistic time lags, and (v) demand for products at the stores is stochastic. We describe a novel formulation in a multi-agent (hierarchical) reinforcement learning framework that can be used for parallelised decision-making, and use the advantage actor critic (A2C) algorithm with quantised action spaces to solve the problem. Experiments show that the proposed approach is able to handle a multi-objective reward comprised of maximising product sales and minimising wastage of perishable products.

Authors (6)

Nazneen N Sultana
Hardik Meisheri
Vinita Baniwal
Somjit Nath
Balaraman Ravindran
Harshad Khadilkar

Citations (22)

Summary

The paper "Reinforcement Learning for Multi-Product Multi-Node Inventory Management in Supply Chains" applies reinforcement learning (RL) to inventory management in a supply chain. Both the problem description and the solution are adapted from a real-world business solution, and the difficulty comes from managing many products across several nodes that share finite capacity.

Problem Context and Novelty

According to the abstract, the problem is novel with respect to the supply chain literature in five ways:

Multiple Products: Concurrent inventory management of a large number of products (50 to 1000) that share capacity.
Multi-node Structure: A supply chain consisting of one warehouse that supplies three stores.
Capacity Constraints: The warehouse, the stores, and the transportation from warehouse to stores all have finite capacities.
Temporal Considerations: Warehouse and store replenishment happen at different time scales and with realistic time lags.
Stochastic Demand: Demand for products at the stores is stochastic.
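
The interaction between shared capacity and stochastic demand can be illustrated with a minimal sketch of one period at a single store. This is not the paper's simulator: the function name `store_step`, the proportional scaling rule for over-capacity requests, and the uniform integer demand are all assumptions chosen only to keep the example short.

```python
import random


def store_step(inventory, replenish, volumes, capacity, rng, max_demand=5):
    """Simulate one period for one store (illustrative, not the paper's model).

    inventory, replenish: per-product quantities
    volumes: per-product unit volume, used against the shared capacity
    capacity: total shared volume available at the store
    """
    # Volume already occupied and volume still free in the shared space.
    used = sum(i * v for i, v in zip(inventory, volumes))
    free = max(capacity - used, 0.0)

    # Assumption: if the request exceeds free space, scale every product
    # down by the same factor so the shared capacity is never exceeded.
    requested = sum(r * v for r, v in zip(replenish, volumes))
    scale = min(1.0, free / requested) if requested > 0 else 0.0
    stocked = [i + r * scale for i, r in zip(inventory, replenish)]

    # Assumption: demand is uniform on {0, ..., max_demand}; the paper
    # only states that store demand is stochastic.
    demand = [rng.randint(0, max_demand) for _ in inventory]

    # Sales cannot exceed what is on the shelf.
    sales = [min(s, d) for s, d in zip(stocked, demand)]
    next_inventory = [s - x for s, x in zip(stocked, sales)]
    return next_inventory, sales, demand


rng = random.Random(0)
next_inv, sales, demand = store_step(
    inventory=[1.0, 2.0], replenish=[10.0, 10.0],
    volumes=[1.0, 1.0], capacity=10.0, rng=rng,
)
```

Here both products request 10 units but only 7 units of volume are free, so each request is scaled to 3.5 units before demand is realised.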

Methodology

The paper formulates the problem in a multi-agent (hierarchical) reinforcement learning framework. Two aspects stand out:

Parallelized Decision Making: The multi-agent formulation can be used for parallelized decision-making across nodes and products.
Algorithmic Approach: The problem is solved with the advantage actor critic (A2C) algorithm over quantized action spaces.
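
The two ingredients named above, a quantized action space and an advantage estimate, can be sketched as follows. The action levels in `ACTION_LEVELS`, the discount factor, and the one-step form of the advantage are assumptions for illustration; the summary does not state which quantization or return estimator the authors use.

```python
import math
import random

# Hypothetical quantization: each action index maps to a replenishment
# quantity expressed as a fraction of the product's maximum shelf level.
ACTION_LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]


def softmax(logits):
    """Numerically stable softmax over a list of actor logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, rng):
    """Sample a discrete action index and return it with its quantized level."""
    probs = softmax(logits)
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return idx, ACTION_LEVELS[idx]


def advantage(reward, value, next_value, gamma=0.99):
    """One-step advantage estimate: r + gamma * V(s') - V(s)."""
    return reward + gamma * next_value - value


rng = random.Random(0)
# One logit vector per product; the same sampling rule can be applied
# to every product independently, which is what makes it parallelizable.
per_product_logits = [[0.1, 0.2, 0.3, 0.2, 0.1], [1.0, 0.0, 0.0, 0.0, 0.0]]
actions = [sample_action(logits, rng) for logits in per_product_logits]
```

In an actor-critic update, the log-probability of the sampled index would be weighted by the advantage for the actor loss, while the critic is regressed toward `reward + gamma * next_value`.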

Objectives and Outcomes

The reward is multi-objective: it combines maximizing product sales with minimizing the wastage of perishable products.
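
One simple way to combine the two objectives is a weighted difference between units sold and units wasted. The sketch below is only an illustration of that idea: the function name `multi_objective_reward` and the weights `w_sales` and `w_waste` are hypothetical, and the exact reward used in the paper is not given in this summary.

```python
def multi_objective_reward(sales, wastage, w_sales=1.0, w_waste=1.0):
    """Weighted trade-off between sales and perishable wastage.

    sales, wastage: per-product quantities for one time step.
    w_sales, w_waste: hypothetical weights; raising w_waste makes the
    agent more conservative about over-stocking perishable products.
    """
    return w_sales * sum(sales) - w_waste * sum(wastage)


# Two products: 5 units sold in total, 1 unit expired.
r = multi_objective_reward(sales=[2, 3], wastage=[1, 0])
```

With equal weights, the example evaluates to 4.0; doubling `w_waste` would lower it to 3.0, penalizing the expired unit more heavily.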

The experiments show that the proposed approach can handle this multi-objective reward while respecting the capacity and timing constraints of the multi-product, multi-node setting.

This research contributes to the supply chain literature by providing a practical RL-based solution to a complex, real-world inventory management problem, incorporating realistic constraints and objectives.
