- The paper presents a novel RLCBS method for optimizing paper drying parameters under flexible constraints.
- It leverages beam search and reinforcement learning to efficiently explore multiple action sequences while enforcing dynamic constraints.
- Experimental results show energy savings of 1.87% over SQP and significant computational speedups over NSGA-II.
Reinforcement Learning Constrained Beam Search for Parameter Optimization of Paper Drying Under Flexible Constraints
Introduction
This paper presents Reinforcement Learning Constrained Beam Search (RLCBS), an algorithm for inference-time refinement in combinatorial optimization problems. Existing approaches typically enforce constraints through training-time penalties or invalid action masking, so the constraints cannot be changed once training ends. RLCBS instead enforces flexible constraints at inference time, using beam search to find action sequences that have high probability under the trained policy and also satisfy the constraints. Its applicability is demonstrated by optimizing the process parameters of a modular drying testbed for paper production, where it runs significantly faster than traditional methods such as NSGA-II.
Methodology
Reinforcement Learning Constrained Beam Search (RLCBS)
RLCBS adapts beam search, a scalable and structured search method commonly used in NLP, to RL problems with discrete action spaces. Beam search explores multiple action-sequence hypotheses in parallel, which increases the probability of finding an optimal solution. RLCBS enforces constraints through a dynamic beam allocation mechanism that balances exploration across beams against efficient constraint satisfaction. The implementation builds on the lexically constrained decoding in the Huggingface Transformers library, with adjustments for the RL setting.
Figure 1: Schematic showing one step in RLCBS. We start with $n_b$ beams, each represented by an action-state sequence $\{s_0, a_0, \ldots, s_{t-1}\}$.
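The step in Figure 1 can be sketched as follows. This is a minimal, illustrative Python version, not the authors' Huggingface-based implementation. It assumes a hypothetical policy callable that returns log-probabilities over a discrete action set, supports only one constraint type ("these actions must appear somewhere in the sequence", the RL analogue of lexically constrained decoding), and fills the beam round-robin from banks of hypotheses grouped by how many constraints they already satisfy, a simplified form of dynamic beam allocation. Scores are summed log-probabilities, so the search favors sequences the policy considers likely.

```python
import math
from typing import Callable

# Hypothetical policy interface: maps the action history to log-probabilities
# over a discrete action set. In RLCBS this role is played by the trained RL
# policy evaluated on the simulated state; here any callable will do.
Policy = Callable[[tuple[int, ...]], list[float]]


def constrained_beam_search(
    policy: Policy,
    n_actions: int,
    horizon: int,
    n_beams: int,
    required: set[int],
) -> tuple[tuple[int, ...], float]:
    """Return the highest-scoring action sequence of length `horizon` that
    contains every action in `required` (a lexical-style constraint)."""
    if len(required) > horizon:
        raise ValueError("more required actions than steps in the horizon")
    beams: list[tuple[tuple[int, ...], float]] = [((), 0.0)]
    n_req = len(required)
    for t in range(horizon):
        steps_left = horizon - t - 1  # steps remaining after this one
        # Bank k holds hypotheses that already satisfy k constraints.
        banks: list[list[tuple[tuple[int, ...], float]]] = [[] for _ in range(n_req + 1)]
        for seq, score in beams:
            logp = policy(seq)
            for a in range(n_actions):
                new_seq = seq + (a,)
                met = len(required.intersection(new_seq))
                # Drop hypotheses that can no longer satisfy every constraint.
                if n_req - met > steps_left:
                    continue
                banks[met].append((new_seq, score + logp[a]))
        for bank in banks:
            bank.sort(key=lambda h: h[1], reverse=True)
        # Simplified dynamic beam allocation: take hypotheses round-robin from
        # the banks, most-constrained bank first, until the beam is full.
        beams = []
        depth = 0
        while len(beams) < n_beams and any(depth < len(b) for b in banks):
            for bank in reversed(banks):
                if depth < len(bank) and len(beams) < n_beams:
                    beams.append(bank[depth])
            depth += 1
    if not beams:
        raise ValueError("no action sequence satisfies all constraints")
    # After the last step only fully constrained hypotheses survive pruning.
    return max(beams, key=lambda h: h[1])


# Toy usage: a stationary policy that prefers action 0; require action 2.
def toy_policy(seq: tuple[int, ...]) -> list[float]:
    return [math.log(0.7), math.log(0.2), math.log(0.1)]


best_seq, best_score = constrained_beam_search(
    toy_policy, n_actions=3, horizon=3, n_beams=2, required={2}
)
print(best_seq, round(best_score, 3))
```

Serving the most-constrained bank first keeps hypotheses that have made constraint progress in the beam even when the policy assigns them low probability, while the remaining slots still follow the policy's preferred actions.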
Smart Dryer Simulation Environment
The Smart Dryer testbed used for evaluation is modular and reconfigurable, which allows extensive experimentation. Its simulation implements a validated physics-based model of the drying process that captures real-world drying dynamics, including boundary conditions and different drying technologies such as IR heating modules. Experimental validation shows that the model closely reproduces measured drying outcomes.
Figure 2: Section view of the Smart Dryer testbed. The chamber accommodates various modules with IR emitters assisting drying.
Practical Considerations
To make RLCBS practical in real-world settings, caching is used to improve computational efficiency. Simulating every beam candidate adds overhead, so a Redis-based cache stores simulation results and avoids recomputing identical simulations.
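The caching idea can be sketched in Python as below. To keep the example self-contained, an in-process dictionary stands in for Redis, and the simulator, the parameter names (`modules`, `air_temp_c`), and the key scheme are hypothetical. Because values are stored as JSON strings under string keys, the dictionary could in principle be replaced by a redis-py client's `get`/`set` calls without changing the callers.

```python
import hashlib
import json
from typing import Any, Callable

Params = dict[str, Any]


class SimulationCache:
    """Memoizes a (hypothetical) drying simulation keyed by its input parameters.

    An in-memory dict stands in for the Redis store described in the text."""

    def __init__(self, simulate: Callable[[Params], Params]) -> None:
        self._simulate = simulate
        self._store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(params: Params) -> str:
        # sort_keys makes the key independent of dict insertion order.
        payload = json.dumps(params, sort_keys=True)
        return "sim:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, params: Params) -> Params:
        k = self.key(params)
        cached = self._store.get(k)
        if cached is not None:
            self.hits += 1
            return json.loads(cached)
        self.misses += 1
        result = self._simulate(params)
        self._store[k] = json.dumps(result)  # stored serialized, as in Redis
        return result


# Toy usage: count how often the expensive simulator is actually called.
calls: list[Params] = []


def fake_simulate(params: Params) -> Params:
    calls.append(params)
    return {"energy": 100.0 + params["air_temp_c"]}  # placeholder, not physics


cache = SimulationCache(fake_simulate)
first = cache.run({"modules": ["SJR", "SP"], "air_temp_c": 120})
second = cache.run({"air_temp_c": 120, "modules": ["SJR", "SP"]})
print(cache.hits, cache.misses)
```

The second call uses the same parameters in a different key order, so it is answered from the cache and the simulator runs only once.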
Experimental Setup
The RL policy was trained with Proximal Policy Optimization (PPO) to select process parameters such as the dryer module configuration and the air supply temperature. The reward function rewards energy-efficient drying relative to a sequential quadratic programming (SQP) baseline. During RLCBS inference, practical constraints such as module usage balance and temperature continuity are imposed so that the selected process parameters are both energy-efficient and feasible.
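The two practical constraints named above can be expressed as simple feasibility checks, and an energy-based reward can be written relative to the baseline. All definitions below are hypothetical, since the exact formulations are not given here. The sketch assumes that module usage balance means the usage counts of the listed module types differ by at most `max_imbalance`, that temperature continuity means adjacent air supply temperatures differ by at most `max_step_c`, and that the reward is the fractional energy saving over SQP.

```python
from dataclasses import dataclass


@dataclass
class DryerPlan:
    """Hypothetical plan: one module type and one air temperature per slot."""
    modules: list[str]
    air_temps_c: list[float]


def module_balance_ok(plan: DryerPlan, types: tuple[str, ...], max_imbalance: int) -> bool:
    # Assumed definition: usage counts of the given module types
    # differ by at most max_imbalance.
    counts = [plan.modules.count(t) for t in types]
    return max(counts) - min(counts) <= max_imbalance


def temperature_continuity_ok(plan: DryerPlan, max_step_c: float) -> bool:
    # Assumed definition: neighboring slots differ by at most max_step_c degrees.
    temps = plan.air_temps_c
    return all(abs(b - a) <= max_step_c for a, b in zip(temps, temps[1:]))


def relative_energy_reward(energy: float, energy_sqp: float) -> float:
    # Assumed reward: fractional energy saving w.r.t. the SQP baseline
    # (positive means less energy than SQP).
    return (energy_sqp - energy) / energy_sqp


plan = DryerPlan(
    modules=["SJR"] * 6 + ["SP"] * 6,
    air_temps_c=[100.0 + 5.0 * i for i in range(12)],
)
print(module_balance_ok(plan, ("SJR", "SP"), max_imbalance=0))  # True
print(temperature_continuity_ok(plan, max_step_c=5.0))          # True
print(temperature_continuity_ok(plan, max_step_c=4.0))          # False
print(relative_energy_reward(95.0, 100.0))                      # 0.05
```

A continuity check like this one can be applied to partial plans during the search, whereas a balance check of this form only makes sense once the plan is complete.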
Results
Compared with NSGA-II, RLCBS achieved similar or better solution quality at a substantially lower computational cost. Under constraints, RLCBS achieved energy savings of about 1.87% relative to the SQP baseline, a notable result given that the baseline conditions were already optimized.
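The reported saving can be read as a relative difference in drying energy. The notation below is assumed, since it is not defined here:

```latex
\Delta E_{\mathrm{rel}} = \frac{E_{\mathrm{SQP}} - E_{\mathrm{RLCBS}}}{E_{\mathrm{SQP}}} \times 100\,\%
```

Under this reading, $\Delta E_{\mathrm{rel}} = 1.87\%$ means $E_{\mathrm{RLCBS}} \approx 0.9813\,E_{\mathrm{SQP}}$.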
Figure 3: Sample paper temperature and dry-basis moisture content (DBMC) trajectory as simulated by the physics-based drying model using dryer configuration: 6x SJR, 6x SP.
Discussion
The constrained beam search allocated dryer modules and set temperatures strategically, adapting to different operational constraints. Its strong performance in constrained optimization suggests potential for broader use in manufacturing processes that need flexible yet optimized operating configurations.
Conclusion
RLCBS offers a way to integrate flexible constraints into RL-based optimization, supported by a validated physics-based process model and efficient computational strategies such as result caching. Its application to paper drying shows significant potential for improving operational efficiency in industrial settings and motivates further work in constraint-rich environments. RLCBS is released as an open-source extension, inviting further research and adoption across diverse optimization problems.