RL-based optimal execution under more complex liquidity dynamics
Determine whether reinforcement learning approaches such as Double Deep Q-Learning (DDQL) can learn optimal execution policies that minimise Implementation Shortfall when market frictions are modelled beyond the linear impact functions of the Almgren–Chriss model, specifically in settings with (i) non-linear temporary market impact, (ii) transient impact kernels, as in the Transient Impact Model and its extensions, and (iii) time-varying bid–ask spread dynamics that contribute to transaction costs.
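To make the three frictions concrete, below is a minimal, self-contained sketch of an execution environment combining them. The functional forms and every parameter value are illustrative assumptions, not the specification from Macrì et al. (2024): a power-law temporary impact eta * v**delta for (i), an exponentially decaying transient-impact state (a simple Transient Impact Model kernel G(t) = exp(-rho * t)) for (ii), and a mean-reverting stochastic half-spread for (iii). The cumulative reward equals the negative Implementation Shortfall relative to the arrival price.

```python
import numpy as np

class ExecutionEnv:
    """Sell q0 shares over n_steps; the sum of per-step rewards equals the
    negative Implementation Shortfall relative to the arrival price s0."""

    def __init__(self, q0=1_000.0, n_steps=50, s0=100.0, sigma=0.01,
                 eta=1e-4, delta=1.5,            # (i) non-linear temporary impact
                 kappa=1e-4, rho=0.3,            # (ii) transient impact strength / decay
                 spread0=0.05, spread_vol=0.01,  # (iii) time-varying half-spread
                 seed=None):
        self.q0, self.n_steps, self.s0, self.sigma = q0, n_steps, s0, sigma
        self.eta, self.delta, self.kappa, self.rho = eta, delta, kappa, rho
        self.spread0, self.spread_vol = spread0, spread_vol
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t, self.q, self.s = 0, self.q0, self.s0
        self.j = 0.0                     # accumulated transient impact
        self.half_spread = self.spread0
        return self._state()

    def _state(self):
        return np.array([self.t / self.n_steps, self.q / self.q0,
                         self.j, self.half_spread])

    def step(self, v):
        """Sell v shares this period; any remainder is liquidated at the end."""
        v = float(np.clip(v, 0.0, self.q))
        if self.t == self.n_steps - 1:
            v = self.q                   # forced full liquidation at the horizon
        temp = self.eta * v ** self.delta                   # (i) power-law in v
        price = self.s - self.j - temp - self.half_spread   # (iii) cross half the spread
        reward = v * (price - self.s0)   # per-step shortfall vs arrival price
        self.j = self.j * np.exp(-self.rho) + self.kappa * v  # (ii) kernel update
        self.s += self.sigma * self.s0 * self.rng.standard_normal()  # mid-price noise
        self.half_spread = max(0.0, self.half_spread
                               + 0.5 * (self.spread0 - self.half_spread)
                               + self.spread_vol * self.rng.standard_normal())
        self.q -= v
        self.t += 1
        done = self.t >= self.n_steps or self.q <= 0.0
        return self._state(), reward, done
```

A TWAP schedule gives a natural baseline cost against which a learned policy can be compared:

```python
env = ExecutionEnv(seed=0)
state, done, total = env.reset(), False, 0.0
while not done:
    state, r, done = env.step(env.q0 / env.n_steps)  # TWAP baseline
    total += r
print(f"TWAP Implementation Shortfall: {-total:.2f}")
```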
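Since the problem statement singles out Double Deep Q-Learning, the sketch below shows the core double-Q target computation (action selection by the online network, evaluation by the target network), written against the hypothetical environment above. The discretised action grid, network architecture, and hyperparameters are assumptions for illustration, not the paper's configuration; gamma = 1.0 is a common choice for finite-horizon execution problems.

```python
import torch
import torch.nn as nn

def make_qnet(state_dim=4, n_actions=11):
    """Small Q-network over a discretised grid of child-order sizes."""
    return nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                         nn.Linear(64, 64), nn.ReLU(),
                         nn.Linear(64, n_actions))

def ddqn_loss(online, target, batch, gamma=1.0):
    """batch = (states, action indices, rewards, next states, done flags)."""
    s, a, r, s2, done = batch
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a2 = online(s2).argmax(dim=1, keepdim=True)  # select with online net
        q2 = target(s2).gather(1, a2).squeeze(1)     # evaluate with target net
        y = r + gamma * (1.0 - done) * q2            # double-Q bootstrap target
    return nn.functional.smooth_l1_loss(q, y)
```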
References
It is certainly interesting to explore whether RL-based approaches are capable of finding optimal trading schedules in these more complex and realistic environments as well. We leave these interesting questions for future research.
— Reinforcement Learning for Optimal Execution when Liquidity is Time-Varying
(Macrì et al., arXiv:2402.12049, 19 Feb 2024), Section 4 (Conclusions), final paragraph