OR-Gym: A Reinforcement Learning Library for Operations Research Problems (2008.06319v2)

Published 14 Aug 2020 in cs.AI and cs.LG

Abstract: Reinforcement learning (RL) has been widely applied to game-playing and surpassed the best human-level performance in many domains, yet there are few use-cases in industrial or commercial settings. We introduce OR-Gym, an open-source library for developing reinforcement learning algorithms to address operations research problems. In this paper, we apply reinforcement learning to the knapsack, multi-dimensional bin packing, multi-echelon supply chain, and multi-period asset allocation model problems, as well as benchmark the RL solutions against MILP and heuristic models. These problems are used in logistics, finance, engineering, and are common in many business operation settings. We develop environments based on prototypical models in the literature and implement various optimization and heuristic models in order to benchmark the RL results. By re-framing a series of classic optimization problems as RL tasks, we seek to provide a new tool for the operations research community, while also opening those in the RL community to many of the problems and challenges in the OR field.

Authors (6)
  1. Christian D. Hubbs (1 paper)
  2. Hector D. Perez (4 papers)
  3. Owais Sarwar (2 papers)
  4. Nikolaos V. Sahinidis (13 papers)
  5. Ignacio E. Grossmann (13 papers)
  6. John M. Wassick (1 paper)
Citations (67)

Summary

Overview of OR-Gym: A Reinforcement Learning Library for Operations Research Problems

This paper presents OR-Gym, an open-source reinforcement learning (RL) library designed for addressing operations research (OR) problems. The library reframes classical optimization tasks like knapsack, multi-dimensional bin packing, multi-echelon supply chain, and multi-period asset allocation as RL environments, facilitating exploration for both OR and RL communities.

Methodological Approach

OR-Gym adopts the popular OpenAI Gym interface, exposing traditional OR problems to RL methodologies. By structuring these problems as Markov Decision Processes (MDPs), the authors cast them as sequential decision-making tasks amenable to RL. The library includes benchmarks against mixed-integer linear programming (MILP) and heuristic methods, highlighting the adaptability of RL in these contexts. Proximal Policy Optimization (PPO) serves as the primary RL algorithm, applied uniformly across the different environments.
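
Because each environment follows the Gym interface, interaction is the familiar reset/step loop. A minimal sketch is below; it assumes the `or_gym` package and the `Knapsack-v0` environment ID shown in the library's examples, which may differ across versions:

```python
# Minimal interaction loop with an OR-Gym environment.
# Assumes the `or_gym` package and the `Knapsack-v0` environment ID
# from the library's examples; exact IDs may vary by version.
import or_gym

env = or_gym.make("Knapsack-v0")
state = env.reset()

done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    state, reward, done, info = env.step(action)
    total_reward += reward

print(f"Episode reward: {total_reward}")
```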

Key Results

Knapsack Problem

In the knapsack variants, RL competes well against MILP and heuristic baselines in deterministic settings and outperforms the heuristics in the stochastic, online scenario. This indicates RL's potential for handling uncertainty where conventional heuristics struggle.
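
To make the online setting concrete, the following is a hedged sketch of online-knapsack dynamics, where items arrive one at a time and each must be irrevocably accepted or rejected. The function and threshold policy below are illustrative, not the library's actual API:

```python
import random

# Illustrative online-knapsack dynamics (not OR-Gym's implementation):
# each step, a random (value, weight) item arrives and the agent must
# accept (1) or reject (0) it before seeing the next item.
def online_knapsack_step(capacity_left, action, item):
    value, weight = item
    if action == 1 and weight <= capacity_left:
        return capacity_left - weight, value  # accept: gain value, use capacity
    return capacity_left, 0.0                 # reject or infeasible: no reward

capacity = 50
total = 0.0
for _ in range(20):
    item = (random.uniform(1, 10), random.randint(1, 15))
    # simple value-density threshold policy as a baseline
    action = 1 if item[0] / item[1] > 0.8 else 0
    capacity, reward = online_knapsack_step(capacity, action, item)
    total += reward
```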

Virtual Machine Packing

For virtual machine packing, incorporating action masking into the RL setup significantly improves performance by pruning infeasible actions from the search space, yielding solutions close to the optimum. This result underlines RL's efficiency in environments with strict constraints.
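
Action masking is generally implemented by forcing invalid actions to zero probability before sampling; the paper's setup passes mask information to the policy, while the sketch below shows only the general, framework-independent mechanism:

```python
import numpy as np

# Illustrative action masking: invalid actions get -inf logits so the
# softmax assigns them zero probability. This shows the general
# technique, not OR-Gym's specific wiring of the mask.
def masked_policy_probs(logits, mask):
    masked = np.where(mask, logits, -np.inf)
    exp = np.exp(masked - masked[mask].max())  # stable softmax over valid actions
    return exp / exp.sum()

logits = np.array([1.2, 0.3, -0.5, 2.0])
mask = np.array([True, False, True, True])  # action 1 would overload a VM
probs = masked_policy_probs(logits, mask)   # probs[1] == 0
```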

Supply Chain Inventory Management

The multi-echelon supply chain problem highlights RL's ability to discover dynamic reordering policies that outperform static ones. However, RL still trails the shrinking-horizon linear programming (SHLP) model, which leverages prior probabilistic knowledge of demand.
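
For intuition, static policies in this setting are often of the base-stock (order-up-to) type; the RL agent effectively learns a state-dependent generalization of such a rule. A hedged sketch of the static rule for a single echelon (parameter names are illustrative):

```python
# Illustrative base-stock (order-up-to) policy for a single echelon:
# order whatever is needed to bring the inventory position back to a
# fixed target z. An RL policy generalizes this by conditioning on
# richer state information.
def base_stock_order(on_hand, pipeline, backorders, z):
    inventory_position = on_hand + pipeline - backorders
    return max(0, z - inventory_position)

# e.g. 40 units on hand, 25 in transit, 10 backordered, target level 100
order_qty = base_stock_order(on_hand=40, pipeline=25, backorders=10, z=100)
# order_qty == 45
```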

Asset Allocation

The multi-period asset allocation task reveals that while RL models excel in scenarios maximizing expected returns, robust optimization offers superior downside protection. The trade-off between reward potential and risk aversion is a critical decision-making aspect in financial environments.
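
For context, RL formulations of multi-period asset allocation commonly use the period's portfolio return, optionally net of transaction costs, as the reward. The sketch below illustrates one such reward; the specifics are assumptions, not the paper's exact formulation:

```python
import numpy as np

# Illustrative per-period reward for an asset-allocation MDP: the agent
# picks portfolio weights, the environment draws asset returns, and the
# reward is the resulting portfolio return minus a turnover penalty.
# Details are assumptions, not taken from the paper.
def allocation_reward(weights, asset_returns, transaction_cost=0.001,
                      prev_weights=None):
    weights = np.asarray(weights) / np.sum(weights)  # normalize to simplex
    reward = float(weights @ asset_returns)
    if prev_weights is not None:  # penalize rebalancing turnover
        reward -= transaction_cost * np.abs(weights - prev_weights).sum()
    return reward

r = allocation_reward([0.5, 0.3, 0.2], np.array([0.01, -0.02, 0.015]))
```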

Implications and Future Directions

OR-Gym provides a scalable tool for both academic research and practical applications, bridging the RL and OR domains. This work sets a foundation for further cross-disciplinary investigation, particularly in integrating RL with robust and stochastic optimization techniques. Additionally, hybrid approaches combining RL with mathematical programming could be explored to improve solution quality and reduce computation time.

Conclusion

The OR-Gym library illustrates the applicability of RL in traditional OR problems, showcasing promising results, particularly under uncertainty. As AI continues to evolve, this intersection of RL and OR could lead to the development of more nuanced and efficient methodologies for solving complex industrial and operational challenges. This paper lays essential groundwork for future exploration and integration of RL frameworks in diverse OR contexts.
