SafeOR-Gym: A Benchmark Suite for Safe Reinforcement Learning Algorithms on Practical Operations Research Problems
This paper introduces SafeOR-Gym, a benchmark suite designed to fill a gap in safe reinforcement learning (RL) benchmarks: the near-absence of operations research (OR) problems. Existing benchmarks focus largely on robotics and control tasks, which do not capture the structured complexity, constraints, and decision-making processes inherent to industrial applications such as energy systems, manufacturing, and supply chains. SafeOR-Gym addresses this gap by offering nine distinct OR environments for evaluating and developing RL algorithms under realistic, safety-critical constraints.
The environments in SafeOR-Gym simulate real-world OR problems involving planning, scheduling, and control, and are characterized by cost-based constraint violations, finite planning horizons, and hybrid discrete-continuous action spaces. Importantly, each environment integrates with the OmniSafe constrained Markov decision process (CMDP) interface, enabling systematic evaluation of RL algorithms that must optimize policies while adhering to safety constraints.
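To make the CMDP interface concrete, here is a minimal sketch, not code from SafeOR-Gym: the environment, dynamics, and capacity constraint are hypothetical. It shows how a small inventory-style OR environment can expose a cost signal alongside the reward, using the six-element step convention common to Safety-Gymnasium/OmniSafe-style CMDP environments.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ToyInventoryCMDP(gym.Env):
    """Hypothetical CMDP-style OR environment: the agent chooses an order
    quantity, is rewarded for matching demand, and incurs a *cost* whenever
    inventory exceeds a safety capacity (the constraint signal)."""

    def __init__(self, capacity=100.0, horizon=30):
        self.capacity = capacity
        self.horizon = horizon
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Box(low=0.0, high=50.0, shape=(1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.inventory = 20.0
        return np.array([self.inventory], dtype=np.float32), {}

    def step(self, action):
        demand = float(self.np_random.uniform(5.0, 25.0))
        self.inventory = max(self.inventory + float(action[0]) - demand, 0.0)
        reward = -abs(self.inventory - demand)            # proxy objective
        cost = max(self.inventory - self.capacity, 0.0)   # constraint-violation signal
        self.t += 1
        terminated = False
        truncated = self.t >= self.horizon
        obs = np.array([self.inventory], dtype=np.float32)
        # CMDP-style environments report the cost separately from the reward,
        # so a safe RL trainer can track constraint violation independently.
        return obs, reward, cost, terminated, truncated, {}
```

A safe RL trainer consumes the extra cost channel to estimate expected constraint violation and keep it within a specified budget during policy optimization.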
The environments span a diverse array of OR problems, including Resource Task Networks, Unit Commitment, Multi-period Blending, Multi-echelon Inventory Management, and Grid-Integrated Energy Storage. Each challenges RL algorithms with mixed-integer decisions, complex operational constraints, and long-term planning objectives; together they exemplify the structured complexity of OR tasks and make the suite a useful lens on how current RL methods perform on realistic, safety-critical problems.
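As an illustration of the hybrid action spaces such problems require (the exact encoding used by SafeOR-Gym may differ; the unit count and bounds here are made up), a unit-commitment-style action can be expressed in Gymnasium as a dictionary space mixing binary commitment decisions with continuous dispatch levels:

```python
import numpy as np
from gymnasium import spaces

# Hypothetical action space for a unit-commitment-like task: binary on/off
# commitment decisions per generator plus a continuous dispatch level for each unit.
n_units = 5
action_space = spaces.Dict({
    "commit": spaces.MultiBinary(n_units),          # discrete on/off decisions
    "dispatch": spaces.Box(low=0.0, high=1.0,       # normalized power output
                           shape=(n_units,), dtype=np.float32),
})

sample = action_space.sample()
print(sample["commit"], sample["dispatch"])
```

Most off-the-shelf safe RL implementations assume a flat continuous action space, so handling mixed spaces like this one (by flattening, rounding, or separate policy heads) is itself part of the challenge these environments pose.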
The paper evaluates state-of-the-art safe RL algorithms, including Constrained Policy Optimization (CPO), TRPOLag, Penalized Proximal Policy Optimization (P3O), OnCRPO, and DDPGLag, across these environments. The evaluations reveal substantial performance differences: some tasks and constraint structures are tractable, but current safe RL methods show fundamental limitations on the more complex OR challenges. The results suggest that alternative approaches, particularly algorithms that combine projection steps with dual (Lagrangian) learning, offer both computational efficiency and stronger adherence to safety constraints.
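For context, the Lagrangian-based methods in this comparison (e.g., TRPOLag, DDPGLag) relax the CMDP constraint into a saddle-point problem of the following generic form (notation ours, not taken from the paper):

\[
\max_{\theta}\ \min_{\lambda \ge 0}\ \; J_R(\pi_\theta) \;-\; \lambda \big( J_C(\pi_\theta) - d \big),
\]

where \(J_R\) and \(J_C\) are the expected discounted return and cost of policy \(\pi_\theta\), \(d\) is the allowed cost budget, and the multiplier \(\lambda\) is increased when the constraint is violated and decreased otherwise, while \(\theta\) is updated by the underlying policy optimizer.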
SafeOR-Gym offers both practical and theoretical advances. Practically, it serves as a rigorous benchmark for evaluating and developing RL algorithms intended for high-stakes industrial domains. Theoretically, it opens new research directions in RL, encouraging innovation in algorithm design and tuning for complex constrained environments. Future work might explore automated hyperparameter tuning, action-constrained RL strategies for constraint handling, or the integration of differentiable constraint satisfaction into neural network architectures.
SafeOR-Gym is a valuable contribution that bridges the gap between conventional RL benchmarks and the realities of industrial OR applications. By providing a structured testbed for RL algorithms under safety constraints and combinatorial complexity, it sets the stage for further developments at the intersection of machine learning and operations research.