Physical interpretation of DRL control actions for reducing a turbulent separation bubble

Characterize the physical mechanism underlying the actuator actions predicted by a deep reinforcement learning agent based on proximal policy optimization (PPO) that reduce the length of the turbulent separation bubble in an adverse-pressure-gradient turbulent boundary layer induced by suction and blowing. The flow is simulated with the SOD2D solver using six rectangular wall-normal synthetic-jet actuators grouped into three spanwise mass-conserving pairs, and the agent is trained with a reward proportional to the reduction of the recirculation length.

Background

The study compares classical zero-net-mass-flux periodic forcing with deep reinforcement learning (DRL) to reduce a turbulent separation bubble (TSB) formed in an adverse-pressure-gradient turbulent boundary layer generated by suction–blowing at the domain top boundary. Six rectangular wall-normal actuators are placed upstream of the separation region; for DRL they are paired spanwise to enforce instantaneous mass conservation. The DRL agent uses the proximal policy optimization algorithm, with state observations from a grid of streamwise-velocity probes and a reward based on the characteristic recirculation length.
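The precise action mapping and reward expression are not reproduced in this summary. A minimal sketch, assuming the PPO policy outputs one amplitude per spanwise pair (each pair then receiving opposite-sign jet velocities so its instantaneous net mass flux is zero) and a reward proportional to the recirculation-length reduction relative to the uncontrolled baseline, could look as follows; the names actions_to_jet_amplitudes, recirculation_reward, and L_r_baseline are hypothetical and not taken from the paper.

    import numpy as np

    def actions_to_jet_amplitudes(pair_amplitudes):
        """Map 3 policy outputs to 6 wall-normal jet amplitudes.

        Each spanwise pair receives (+q, -q) so the pair's instantaneous
        net mass flux is zero (assumed mapping; the paper only states
        that the pairs are mass-conserving).
        """
        q = np.asarray(pair_amplitudes)           # shape (3,), one value per pair
        return np.stack([q, -q], axis=1).ravel()  # shape (6,): [q0, -q0, q1, -q1, q2, -q2]

    def recirculation_reward(L_r, L_r_baseline):
        """Reward proportional to the recirculation-length reduction
        with respect to the uncontrolled baseline (assumed form)."""
        return (L_r_baseline - L_r) / L_r_baseline

    # Example: three pair amplitudes from the policy, and a reward that is
    # positive when the controlled bubble is shorter than the baseline.
    amps = actions_to_jet_amplitudes([0.02, -0.01, 0.015])
    r = recirculation_reward(L_r=3.2, L_r_baseline=4.0)

The pairing is what keeps the action space at three degrees of freedom while guaranteeing zero net mass injection at every instant, which is the constraint the classical zero-net-mass-flux forcing also satisfies.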

Training on a coarse LES grid yields a 25.3% reduction of the bubble length with smooth actuator signals, outperforming periodic forcing (15.7% reduction). Despite these promising results, the authors explicitly state that they cannot yet derive a physical interpretation of the learned DRL actions, leaving open the task of elucidating the mechanisms by which the agent achieves control.
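For reference, the classical baseline against which the DRL policy is compared is open-loop zero-net-mass-flux periodic forcing. A minimal sketch of such a signal is given below, assuming a single sinusoidal forcing with placeholder amplitude A and frequency f; the study's actual forcing parameters are not reproduced here.

    import numpy as np

    def periodic_znmf_signal(t, A=0.1, f=0.5):
        """Sinusoidal jet velocity: integrates to zero over each forcing
        period, i.e. zero net mass flux (A and f are placeholders)."""
        return A * np.sin(2.0 * np.pi * f * t)

    t = np.linspace(0.0, 4.0, 401)   # two forcing periods for f = 0.5
    v_jet = periodic_znmf_signal(t)  # open-loop baseline control signal

Unlike this prescribed periodic signal, the DRL actuation reported in the paper is smooth but adapted to the instantaneous flow state, which is precisely why its underlying physical mechanism still needs to be characterized.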

References

"A physical interpretation of the DRL actions cannot be derived from the current results yet, and a thorough assessment of the control strategy learnt by the DRL agent will be considered in future work."

Active flow control of a turbulent separation bubble through deep reinforcement learning  (2403.20295 - Font et al., 2024) in Section 3.2 (DRL control)