Cause of conservative CVODE usage by the RL policy in high-pressure 0D cases
Determine why the reinforcement-learning (RL) solver-selection policy, which switches between CVODE and the alpha–QSS integrators in detailed combustion ODE integration, uses CVODE conservatively in high-pressure zero-dimensional homogeneous reactor cases. Two explanations are possible: later ignition dynamics are genuinely highly sensitive to early-stage accuracy, or the behavior is residual suboptimality arising from training stochasticity or data imbalance in the policy learning process.
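One way to start separating the two explanations is a handoff-perturbation test: inject a small error at the end of the induction period, continue with an accurate integrator, and measure how far the ignition time moves. The sketch below illustrates the idea on a toy problem only. The model d(theta)/dt = exp(theta) is an assumed stand-in for a real mechanism, the injected error stands in for the inaccuracy of a cheaper early-stage integrator, and every name (`ignition_delay`, `handoff_error`, `t_switch`, `theta_ign`) is hypothetical. It does not call CVODE or alpha-QSS.

```python
import math


def rhs(theta):
    # Toy adiabatic thermal-explosion model (an assumption, not a real mechanism):
    # d(theta)/dt = exp(theta), with exact solution theta(t) = -ln(1 - t).
    return math.exp(theta)


def rk4_step(theta, dt):
    # One classical fourth-order Runge-Kutta step for the scalar toy model.
    k1 = rhs(theta)
    k2 = rhs(theta + 0.5 * dt * k1)
    k3 = rhs(theta + 0.5 * dt * k2)
    k4 = rhs(theta + dt * k3)
    return theta + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def ignition_delay(handoff_error=0.0, t_switch=0.5, dt=1e-4, theta_ign=5.0):
    # Integrate from theta = 0 until theta reaches theta_ign.
    # At t_switch, add handoff_error to the state to mimic the error a cheaper
    # early-stage integrator would hand over (hypothetical error model).
    n_switch = round(t_switch / dt)
    theta = 0.0
    step = 0
    while theta < theta_ign:
        if step == n_switch:
            theta += handoff_error
        theta = rk4_step(theta, dt)
        step += 1
    return step * dt


baseline = ignition_delay()
perturbed = ignition_delay(handoff_error=0.01)
shift = perturbed - baseline
print(f"baseline delay = {baseline:.4f}, shift from handoff error = {shift:.4f}")
```

For this toy model the shift can be checked analytically: a handoff error of Δ at `t_switch` moves the ignition time by (1 - t_switch)(e^{-Δ} - 1). In a real study the same comparison would be run with the actual mechanism and integrators. A large shift for QSS-level early errors would support the sensitivity explanation, while a negligible shift would point toward suboptimality in the learned policy.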
References
In this case, a cursory evaluation of the profiles suggests that QSS may be sufficient for much of the early induction period, and it is unclear whether the conservative behavior of the RL agent reflects strong sensitivity of later ignition dynamics to early-stage accuracy, or residual suboptimality arising from training stochasticity or data balance issues.