RL-based optimal execution under more complex liquidity dynamics
Determine whether reinforcement learning approaches such as Double Deep Q-Learning (DDQL) can learn optimal execution policies that minimise Implementation Shortfall when market frictions are modelled beyond the linear impact functions of the Almgren–Chriss model, specifically in settings with (i) non-linear temporary market impact, (ii) transient impact kernels, as in the Transient Impact Model and its extensions, and (iii) time-varying bid–ask spread dynamics that contribute to transaction costs.
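To make the three frictions concrete, below is a minimal, self-contained sketch of an execution environment combining them. The functional forms and every parameter value are illustrative assumptions, not the specification from Macrì et al. (2024): a power-law temporary impact eta * v**delta for (i), an exponentially decaying transient-impact state (a simple Transient Impact Model kernel G(t) = exp(-rho * t)) for (ii), and a mean-reverting stochastic half-spread for (iii). The cumulative reward equals the negative Implementation Shortfall relative to the arrival price.

```python
import numpy as np

class ExecutionEnv:
    """Sell q0 shares over n_steps; the sum of per-step rewards equals the
    negative Implementation Shortfall relative to the arrival price s0."""

    def __init__(self, q0=1_000.0, n_steps=50, s0=100.0, sigma=0.01,
                 eta=1e-4, delta=1.5,            # (i) non-linear temporary impact
                 kappa=1e-4, rho=0.3,            # (ii) transient impact strength / decay
                 spread0=0.05, spread_vol=0.01,  # (iii) time-varying half-spread
                 seed=None):
        self.q0, self.n_steps, self.s0, self.sigma = q0, n_steps, s0, sigma
        self.eta, self.delta, self.kappa, self.rho = eta, delta, kappa, rho
        self.spread0, self.spread_vol = spread0, spread_vol
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t, self.q, self.s = 0, self.q0, self.s0
        self.j = 0.0                     # accumulated transient impact
        self.half_spread = self.spread0
        return self._state()

    def _state(self):
        return np.array([self.t / self.n_steps, self.q / self.q0,
                         self.j, self.half_spread])

    def step(self, v):
        """Sell v shares this period; any remainder is liquidated at the end."""
        v = float(np.clip(v, 0.0, self.q))
        if self.t == self.n_steps - 1:
            v = self.q                   # forced full liquidation at the horizon
        temp = self.eta * v ** self.delta                   # (i) power-law in v
        price = self.s - self.j - temp - self.half_spread   # (iii) cross half the spread
        reward = v * (price - self.s0)   # per-step shortfall vs arrival price
        self.j = self.j * np.exp(-self.rho) + self.kappa * v  # (ii) kernel update
        self.s += self.sigma * self.s0 * self.rng.standard_normal()  # mid-price noise
        self.half_spread = max(0.0, self.half_spread
                               + 0.5 * (self.spread0 - self.half_spread)
                               + self.spread_vol * self.rng.standard_normal())
        self.q -= v
        self.t += 1
        done = self.t >= self.n_steps or self.q <= 0.0
        return self._state(), reward, done
```

A TWAP schedule gives a natural baseline cost against which a learned policy can be compared:

```python
env = ExecutionEnv(seed=0)
state, done, total = env.reset(), False, 0.0
while not done:
    state, r, done = env.step(env.q0 / env.n_steps)  # TWAP baseline
    total += r
print(f"TWAP Implementation Shortfall: {-total:.2f}")
```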
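Since the problem statement singles out Double Deep Q-Learning, the sketch below shows the core double-Q target computation (action selection by the online network, evaluation by the target network), written against the hypothetical environment above. The discretised action grid, network architecture, and hyperparameters are assumptions for illustration, not the paper's configuration; gamma = 1.0 is a common choice for finite-horizon execution problems.

```python
import torch
import torch.nn as nn

def make_qnet(state_dim=4, n_actions=11):
    """Small Q-network over a discretised grid of child-order sizes."""
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

def ddqn_loss(online, target, batch, gamma=1.0):
    """batch = (states, action indices, rewards, next states, done flags)."""
    s, a, r, s2, done = batch
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a2 = online(s2).argmax(dim=1, keepdim=True)  # select with online net
        q2 = target(s2).gather(1, a2).squeeze(1)     # evaluate with target net
        y = r + gamma * (1.0 - done) * q2            # double-Q bootstrap target
    return nn.functional.smooth_l1_loss(q, y)
```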
References
It is certainly interesting to explore whether RL-based approaches are capable of finding optimal trading schedules in these more complex and realistic environments as well. We leave these interesting questions for future research.
— Reinforcement Learning for Optimal Execution when Liquidity is Time-Varying
(Macrì et al., arXiv:2402.12049, 19 Feb 2024), Section 4 (Conclusions), final paragraph