Cause of conservative CVODE usage by the RL policy in high-pressure 0D cases

Determine why the reinforcement-learning (RL) solver-selection policy, which switches between the CVODE and alpha–QSS integrators during detailed combustion ODE integration, relies so heavily on CVODE in high-pressure zero-dimensional (0D) homogeneous reactor cases. Two explanations remain open: either later ignition dynamics are genuinely highly sensitive to early-stage accuracy, or the behavior is residual suboptimality arising from training stochasticity or data imbalance in the policy learning process.

Background

The paper presents a reinforcement-learning (RL) framework that adaptively selects between an implicit BDF integrator (CVODE) and a quasi-steady-state (QSS) method for stiff chemical kinetics. In 0D homogeneous reactor evaluations, the policy generally switches to CVODE near ignition regions and uses QSS elsewhere to reduce cost.
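
The per-step selection pattern described above can be sketched as follows. This is a minimal illustration under loud assumptions, not the paper's implementation: the chemistry is a toy one-step thermal-explosion model, `cheap_step` (forward Euler) and `accurate_step` (sub-stepped midpoint) are stand-ins for QSS and CVODE, and `rule_policy` is a hypothetical hand-written threshold rule in place of a trained agent. All names and parameter values are invented for the example.

```python
import math

# Toy one-step chemistry (assumed, dimensionless): T is temperature, Y is fuel fraction.
TA, Q = 15.0, 3.0              # activation temperature and heat release (assumed)
A = math.exp(TA) / 45.0        # pre-exponential factor chosen so ignition occurs at t of order 1

def rhs(T, Y):
    """Return (dT/dt, dY/dt) for the toy model."""
    w = A * Y * math.exp(-TA / T)
    return Q * w, -w

def cheap_step(T, Y, dt):
    """Stand-in for the low-cost solver: one forward-Euler step."""
    dT, dY = rhs(T, Y)
    return T + dt * dT, Y + dt * dY

def accurate_step(T, Y, dt, substeps=20):
    """Stand-in for the robust solver: explicit midpoint with sub-stepping."""
    h = dt / substeps
    for _ in range(substeps):
        dT, dY = rhs(T, Y)
        mT, mY = rhs(T + 0.5 * h * dT, Y + 0.5 * h * dY)
        T, Y = T + h * mT, Y + h * mY
    return T, Y

def rule_policy(T, Y, threshold=0.5):
    """Hypothetical rule, not a trained agent: pick the accurate solver (1)
    when the heating rate is large, otherwise the cheap solver (0)."""
    dT, _ = rhs(T, Y)
    return 1 if dT > threshold else 0

def run(T=1.0, Y=1.0, dt=5e-4, t_end=3.0):
    """Integrate to t_end, choosing a solver at every step and logging the choice."""
    actions = []
    for _ in range(int(round(t_end / dt))):
        a = rule_policy(T, Y)
        T, Y = (accurate_step if a else cheap_step)(T, Y, dt)
        actions.append(a)
    return T, Y, actions

T_final, Y_final, actions = run()
print(f"accurate-solver fraction = {sum(actions) / len(actions):.3f}")
```

Logging the action sequence alongside the trajectory is what makes conservative behavior visible: the fraction of steps assigned to the expensive solver can be compared across operating conditions.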

However, at a high-pressure condition (750 K, 60 atm), the policy selects CVODE for most of the trajectory, even though a cursory assessment of the profiles suggests QSS might suffice during the early induction period. The authors state that it is unclear whether this conservative behavior reflects genuine sensitivity of later ignition dynamics to early-stage accuracy or training-related suboptimality (e.g., stochasticity or data balance). Resolving this would show whether improved training or diagnostics could reduce unnecessary CVODE use without sacrificing accuracy.
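
One possible diagnostic for the sensitivity hypothesis is to perturb the state early in the induction period and measure how much the ignition delay shifts. The sketch below does this on a toy one-step thermal-explosion model, not on the detailed mechanism or the 750 K, 60 atm condition from the paper. The parameters `TA`, `Q`, and `A`, the fixed-step RK4 integrator, the ignition threshold, and the function `ignition_delay` are all illustrative assumptions. Because the toy system is autonomous, perturbing the starting state is equivalent to perturbing a checkpoint reached earlier in a trajectory.

```python
import math

# Toy one-step chemistry (assumed, dimensionless): T is temperature, Y is fuel fraction.
TA, Q = 15.0, 3.0              # activation temperature and heat release (assumed)
A = math.exp(TA) / 45.0        # pre-exponential factor chosen so the delay is of order 1

def rhs(T, Y):
    """Return (dT/dt, dY/dt) for the toy model."""
    w = A * Y * math.exp(-TA / T)
    return Q * w, -w

def ignition_delay(T, Y, dt=1e-4, T_ign=2.5, t_max=10.0):
    """Time for T to first reach T_ign, using classical fixed-step RK4."""
    t = 0.0
    while t < t_max:
        k1T, k1Y = rhs(T, Y)
        k2T, k2Y = rhs(T + 0.5 * dt * k1T, Y + 0.5 * dt * k1Y)
        k3T, k3Y = rhs(T + 0.5 * dt * k2T, Y + 0.5 * dt * k2Y)
        k4T, k4Y = rhs(T + dt * k3T, Y + dt * k3Y)
        T_new = T + dt * (k1T + 2 * k2T + 2 * k3T + k4T) / 6.0
        Y_new = Y + dt * (k1Y + 2 * k2Y + 2 * k3Y + k4Y) / 6.0
        if T_new >= T_ign:
            # Linear interpolation inside the step for the crossing time.
            return t + dt * (T_ign - T) / (T_new - T)
        T, Y, t = T_new, Y_new, t + dt
    raise RuntimeError("no ignition before t_max")

eps = 1e-3                                      # relative temperature error (assumed)
ref = ignition_delay(1.0, 1.0)                  # unperturbed delay
pert = ignition_delay(1.0 * (1.0 + eps), 1.0)   # delay after a small early-stage error
amplification = abs(pert - ref) / ref / eps     # relative delay shift per unit relative error
print(f"reference delay = {ref:.4f}, amplification = {amplification:.1f}")
```

Applied to the real mechanism, an amplification factor well above one would be consistent with the sensitivity explanation, while a factor near one would point toward training-related suboptimality. The perturbation size and the point at which it is injected are free choices that would need to be matched to the actual QSS error observed during induction.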

References

In this case, a cursory evaluation of the profiles suggest that QSS may be sufficient for much of the early induction period, and it is unclear whether the conservative behavior of the RL agent reflects strong sensitivity of later ignition dynamics to early-stage accuracy, or residual suboptimality arising from training stochasticity or data balance issues.

Autonomous Adaptive Solver Selection for Chemistry Integration via Reinforcement Learning (2604.00264 - Ikponmwoba et al., 31 Mar 2026) in Section 3.1, Solution Accuracy and Learned Solver-Selection Strategy (discussion of Condition 4 / Fig. 4d)