Reinforcement Learning Constrained Beam Search for Parameter Optimization of Paper Drying Under Flexible Constraints (2501.12542v1)

Published 21 Jan 2025 in cs.LG, cs.AI, cs.SY, and eess.SY

Abstract: Existing approaches to enforcing design constraints in Reinforcement Learning (RL) applications often rely on training-time penalties in the reward function or training/inference-time invalid action masking, but these methods either cannot be modified after training, or are limited in the types of constraints that can be implemented. To address this limitation, we propose Reinforcement Learning Constrained Beam Search (RLCBS) for inference-time refinement in combinatorial optimization problems. This method respects flexible, inference-time constraints that support exclusion of invalid actions and forced inclusion of desired actions, and employs beam search to maximize sequence probability for more sensible constraint incorporation. RLCBS is extensible to RL-based planning and optimization problems that do not require real-time solution, and we apply the method to optimize process parameters for a novel modular testbed for paper drying. An RL agent is trained to minimize energy consumption across varying machine speed levels by generating optimal dryer module and air supply temperature configurations. Our results demonstrate that RLCBS outperforms NSGA-II under complex design constraints on drying module configurations at inference-time, while providing a 2.58-fold or higher speed improvement.

Summary

The paper presents RLCBS, a method for inference-time constrained optimization, and applies it to optimizing paper drying parameters under flexible constraints.
It combines a trained reinforcement learning policy with beam search, exploring multiple action sequences in parallel while enforcing constraints specified at inference time.
Experimental results report energy savings of 1.87% over an SQP baseline and a speedup of 2.58 times or more over NSGA-II.

Introduction

This paper presents Reinforcement Learning Constrained Beam Search (RLCBS), an algorithm for inference-time refinement in combinatorial optimization problems. Existing approaches enforce design constraints either through training-time penalties in the reward function, which cannot be changed after training, or through invalid action masking, which is limited in the types of constraints it can express. RLCBS instead enforces flexible constraints at inference time, supporting both the exclusion of invalid actions and the forced inclusion of desired actions, and uses beam search to maximize sequence probability so that constraints are incorporated more sensibly. The method is demonstrated on optimizing the process parameters of a modular paper-drying testbed, where it outperforms NSGA-II under complex design constraints while running at least 2.58 times faster.

Methodology

Reinforcement Learning Constrained Beam Search (RLCBS)

RLCBS adapts beam search, a scalable and structured decoding strategy widely used in natural language processing, to RL problems with discrete action spaces. Rather than committing to one action per step, beam search keeps several action-sequence hypotheses in parallel and ranks them by their cumulative probability under the trained policy, which improves the chance of finding a high-quality sequence that also satisfies the constraints. RLCBS enforces constraints through a dynamic beam allocation mechanism that balances exploration across beams against progress toward meeting the constraints. The implementation builds on the lexically constrained decoding in the Hugging Face Transformers library, with adjustments for RL settings.
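The following is a minimal sketch of beam search over a trained discrete policy with an inference-time exclusion constraint. It is not the RLCBS implementation: `policy` and `banned` are hypothetical callables, and for brevity the policy is conditioned on the action prefix, standing in for the state a deterministic simulator would reach after those actions. Forced inclusion is not handled in this sketch.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A beam is (action_sequence, cumulative_log_prob).
Beam = Tuple[List[int], float]


def beam_search(
    policy: Callable[[List[int]], Sequence[float]],
    horizon: int,
    num_beams: int,
    banned: Callable[[List[int], int], bool],
) -> List[Beam]:
    """Keep the num_beams most probable action sequences of length `horizon`.

    policy(prefix): hypothetical stand-in for the trained RL policy, returning
        next-action probabilities given the actions taken so far.
    banned(prefix, action): True if the action is excluded at inference time.
    """
    beams: List[Beam] = [([], 0.0)]
    for _ in range(horizon):
        candidates: List[Beam] = []
        for actions, score in beams:
            for a, p in enumerate(policy(actions)):
                # Exclusion constraint: drop banned or zero-probability actions.
                if p <= 0.0 or banned(actions, a):
                    continue
                # Summing log-probabilities ranks sequences by their probability.
                candidates.append((actions + [a], score + math.log(p)))
        # Stable sort, best first; keep only the top num_beams expansions.
        candidates.sort(key=lambda b: b[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


if __name__ == "__main__":
    toy_policy = lambda prefix: [0.5, 0.3, 0.2]  # same distribution every step
    result = beam_search(toy_policy, horizon=2, num_beams=2,
                         banned=lambda prefix, a: a == 0)
    print([b[0] for b in result])
```

With action 0 banned, the two-step search keeps `[1, 1]` and `[1, 2]` even though action 0 is the policy's favourite. Scoring with summed log-probabilities rather than multiplied probabilities avoids numerical underflow over long horizons.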

Figure 1: Schematic showing one step in RLCBS. We start with $n_b$ beams, each represented by an action-state sequence $\{s_0, a_0, \ldots, s_{t-1}\}$.
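Forced inclusion needs more than masking, because a required action may have low probability and be pruned before it is ever taken. Dynamic beam allocation, the technique used in lexically constrained decoding, groups candidates into banks by how many constraints they already satisfy and shares the beam slots among those banks. The sketch below illustrates that idea for one selection step; `allocate_beams` and the set-of-required-actions representation are assumptions for illustration, not the RLCBS code.

```python
from typing import Dict, List, Set, Tuple

# A beam is (action_sequence, cumulative_log_prob).
Beam = Tuple[List[int], float]


def allocate_beams(candidates: List[Beam], required: Set[int], num_beams: int) -> List[Beam]:
    """Select num_beams candidates, spreading slots across constraint banks.

    Bank k holds candidates that already contain k of the required actions.
    Slots are filled round-robin, most-satisfied bank first, so partially
    constrained hypotheses survive even when unconstrained sequences score higher.
    """
    banks: Dict[int, List[Beam]] = {}
    for actions, score in candidates:
        k = len(required.intersection(actions))
        banks.setdefault(k, []).append((actions, score))
    for bank in banks.values():
        bank.sort(key=lambda b: b[1], reverse=True)  # best first within a bank

    order = sorted(banks, reverse=True)  # banks with more constraints met first
    selected: List[Beam] = []
    depth = 0
    while len(selected) < num_beams and any(depth < len(banks[k]) for k in order):
        for k in order:
            if depth < len(banks[k]) and len(selected) < num_beams:
                selected.append(banks[k][depth])
        depth += 1
    return selected


if __name__ == "__main__":
    cands = [([0, 0], -0.1), ([0, 1], -0.5), ([2, 0], -1.0), ([2, 2], -2.0)]
    print([b[0] for b in allocate_beams(cands, required={2}, num_beams=2)])
```

With action 2 required, two beams keep `[2, 0]` and `[0, 0]`, whereas plain top-2 selection by score would keep `[0, 0]` and `[0, 1]` and lose every hypothesis that includes the required action.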

Smart Dryer Simulation Environment

The Smart Dryer testbed used for evaluation is modular and reconfigurable, which supports a wide range of experiments. The simulation environment implements a physics-based model of the drying process that has been validated against experimental measurements. It models the process boundary conditions and supports different drying technologies, such as infrared (IR) heating modules.

Figure 2: Section view of the Smart Dryer testbed. The chamber accommodates various modules with IR emitters assisting drying.

Practical Considerations

To make RLCBS practical in real-world use, caching is employed to improve computational efficiency. Evaluating many beam candidates requires many simulations, so a Redis-based cache stores simulation results and avoids re-simulating configurations that have already been evaluated.
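A minimal sketch of the caching idea follows. It memoizes simulator results by process parameters in an in-process dictionary, standing in for the Redis store; the `simulate` callable, the integer module encoding, and the key format are hypothetical. Because beams often share configurations, repeated queries are answered without re-running the simulation.

```python
import json
from typing import Callable, Dict, Sequence


class SimulationCache:
    """Memoize simulator results keyed by process parameters.

    An in-process dict stands in for the Redis store described in the text.
    """

    def __init__(self, simulate: Callable[[Sequence[int], float], float]) -> None:
        self._simulate = simulate  # hypothetical expensive physics simulation
        self._store: Dict[str, float] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def make_key(modules: Sequence[int], air_temp: float) -> str:
        # Deterministic string key; the same string could serve as a Redis key.
        return json.dumps({"air_temp": air_temp, "modules": list(modules)}, sort_keys=True)

    def energy(self, modules: Sequence[int], air_temp: float) -> float:
        key = self.make_key(modules, air_temp)
        if key in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[key] = self._simulate(modules, air_temp)
        return self._store[key]


if __name__ == "__main__":
    calls = []

    def fake_simulate(modules, air_temp):
        calls.append((tuple(modules), air_temp))  # record each real simulation
        return float(sum(modules)) + 0.01 * air_temp

    cache = SimulationCache(fake_simulate)
    cache.energy([1, 0, 2], 120.0)
    cache.energy([1, 0, 2], 120.0)  # served from the cache
    print(len(calls), cache.hits, cache.misses)
```

Keying on a canonical serialized string, rather than on Python objects, keeps the sketch compatible with an external key-value store shared across processes.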

Experimental Setup

The RL agent was trained with Proximal Policy Optimization (PPO) to select process parameters, namely the dryer module configuration and the air supply temperature, across varying machine speed levels. The reward function rewards energy-efficient drying relative to a sequential quadratic programming (SQP) baseline. At inference time, RLCBS adds constraints such as module usage balance and temperature continuity, so that the generated parameters satisfy practical requirements while remaining energy-efficient.
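The exact reward formula is not given here. One plausible form, shown below purely as an assumption, is the fractional energy saving relative to the SQP solution at the same machine speed: positive when the agent's configuration uses less energy than SQP, negative when it uses more. The function name and arguments are hypothetical.

```python
def relative_energy_reward(energy: float, sqp_energy: float) -> float:
    """Hypothetical reward: fractional energy saving versus the SQP baseline.

    Both energies must be in the same unit and computed for the same
    machine speed level.
    """
    if sqp_energy <= 0:
        raise ValueError("baseline energy must be positive")
    return (sqp_energy - energy) / sqp_energy


if __name__ == "__main__":
    print(relative_energy_reward(98.13, 100.0))  # about 0.0187, i.e. 1.87% saving
```

A normalized reward of this kind keeps its scale comparable across machine speeds whose absolute energy demands differ.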

Results

Under complex design constraints on the drying module configurations, RLCBS outperformed NSGA-II while running 2.58 times faster or more. Under constraints, RLCBS also achieved energy savings of around 1.87% relative to the SQP baseline, a notable result given that the SQP baseline is itself an optimized solution.
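The two headline figures can be restated as ratios, where $E$ denotes energy consumption and $t$ optimization run time (reading "speed improvement" as a run-time ratio is an interpretation). For instance, a configuration consuming 100 energy units under SQP would correspond to roughly 98.13 units under RLCBS.

$$
\Delta E = \frac{E_{\text{SQP}} - E_{\text{RLCBS}}}{E_{\text{SQP}}} \approx 0.0187
\;\Longrightarrow\;
E_{\text{RLCBS}} \approx 0.9813\, E_{\text{SQP}},
\qquad
\frac{t_{\text{NSGA-II}}}{t_{\text{RLCBS}}} \geq 2.58
$$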

Figure 3: Sample paper temperature and dry-basis moisture content (DBMC) trajectory as simulated by the physics-based drying model using dryer configuration: 6x SJR, 6x SP.
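For reference, dry-basis moisture content (DBMC) is conventionally defined as the mass of water per unit mass of dry solids, so unlike wet-basis moisture content it can exceed 1. The standard definition and its conversion to wet basis are shown below; the symbols $m_w$ (water mass), $m_d$ (dry-solids mass), $X$ (DBMC), and $w$ (wet-basis moisture) are introduced here. For example, $X = 1.5$ corresponds to $w = 0.6$.

$$
X = \frac{m_w}{m_d},
\qquad
w = \frac{m_w}{m_w + m_d} = \frac{X}{1 + X}
$$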

Discussion

The constrained beam search method allocated dryer modules and temperature settings strategically, adapting to varying operational constraints. Its strong performance in constrained optimization suggests potential for broader application in manufacturing processes that require flexible yet optimized operational configurations.

Conclusion

RLCBS offers a way to integrate flexible, inference-time constraints into RL-based optimization, supported by a validated physics-based process model and efficient computational strategies such as simulation caching. Its application to paper drying shows potential for improving energy efficiency in industrial settings and motivates further work in constraint-rich environments. RLCBS is released as an open-source extension, inviting further research and adoption across diverse optimization problems.