Double Deep Q-Learning for Optimal Execution
This paper explores the application of reinforcement learning, specifically Double Deep Q-Learning (DDQN), to the problem of optimal trade execution. Rather than relying on traditional stochastic control methods, the research adopts a model-free DDQN approach to address the complexities inherent in real trading environments, such as high-dimensional action spaces and asymmetric information.
Summary of Approach
The authors introduce a fully connected neural network trained with experience replay and DDQN, which estimates the value of trading actions without imposing strong model assumptions. The network's input is a state derived from the current limit order book and pertinent trading signals; its output is a vector of Q-values estimating the expected future reward of each candidate trading action.
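To make this concrete, here is a minimal sketch of that kind of fully connected Q-network in PyTorch. The state dimension, hidden-layer sizes, and number of actions are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Fully connected Q-network: state vector -> one Q-value per action."""

    def __init__(self, state_dim: int = 12, n_actions: int = 21, hidden: int = 64):
        super().__init__()
        # State: limit order book features and trading signals stacked into one vector.
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one Q-value per candidate trade size
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Greedy action for a single (batched) placeholder state.
q_net = QNetwork()
state = torch.randn(1, 12)           # placeholder state vector
action = q_net(state).argmax(dim=1)  # index of the trade size with the highest Q-value
```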
Key innovations of this paper include:
- Experience Replay: Stabilizes training by sampling past transitions at random, which breaks the correlation between consecutive observations and dampens the impact of outlier signals that might otherwise distort the network's learning.
- Double Q-Learning: This variant mitigates the overestimation bias typical of standard Q-Learning by using two separate networks: an online network that selects actions and a target network that evaluates them (see the sketch after this list).
- Handling Constraints: The paper handles the requirement to fully liquidate inventory by appending an additional one-second time step at the end of the trading horizon, during which any remaining shares are executed.
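The sketch below shows how these pieces can fit together, reusing the QNetwork class from the earlier sketch. The buffer size, batch size, learning rate, and zero-discount choice are illustrative assumptions rather than the paper's hyperparameters.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

buffer = deque(maxlen=100_000)            # experience replay memory
online, target = QNetwork(), QNetwork()   # QNetwork from the sketch above
target.load_state_dict(online.state_dict())
optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)
gamma = 1.0  # little or no discounting over a short execution horizon (assumption)

def store(s, a, r, s_next, done):
    """Store one transition as tensors (a as int64, done as 0/1 float).

    The forced one-second liquidation step at the end of the horizon
    enters the buffer like any other transition.
    """
    buffer.append((s, a, r, s_next, done))

def train_step(batch_size: int = 32):
    if len(buffer) < batch_size:
        return
    # Random sampling breaks the correlation between consecutive observations.
    s, a, r, s_next, done = map(torch.stack, zip(*random.sample(buffer, batch_size)))
    with torch.no_grad():
        # Double Q-learning: the online network selects the next action,
        # the target network evaluates it, reducing overestimation bias.
        next_a = online(s_next).argmax(dim=1, keepdim=True)
        next_q = target(s_next).gather(1, next_a).squeeze(1)
        y = r + gamma * (1.0 - done) * next_q
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.smooth_l1_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Periodically sync the target network: target.load_state_dict(online.state_dict())
```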
Strong Results and Claims
The empirical evaluation demonstrates the model's performance across nine stock datasets, revealing significant outperformance over the TWAP benchmark, most clearly in stocks such as INTC, MSFT, NTAP, and VOD. The metrics used to substantiate these findings include mean and median returns, gain-loss ratios, and the probability of outperforming the benchmark. In particular, stocks such as INTC show a double-digit improvement in mean relative profit and loss (P&L), demonstrating the efficacy of the DDQN approach in dynamic, stochastic environments.
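For readers who want to reproduce this kind of comparison, here is a short sketch of the reported metrics. It assumes per-episode P&L already expressed relative to TWAP, and the gain-loss ratio shown is one common convention, not necessarily the paper's exact formula.

```python
import numpy as np

def evaluation_metrics(relative_pnl: np.ndarray) -> dict:
    """Summary statistics for per-episode P&L measured relative to TWAP."""
    gains = relative_pnl[relative_pnl > 0]
    losses = relative_pnl[relative_pnl < 0]
    return {
        "mean": relative_pnl.mean(),
        "median": np.median(relative_pnl),
        # One common definition: average gain over average loss magnitude.
        "gain_loss_ratio": gains.mean() / abs(losses.mean()) if losses.size else np.inf,
        # Empirical probability of beating the TWAP benchmark.
        "p_outperform": (relative_pnl > 0).mean(),
    }

# Synthetic example (not figures from the paper):
print(evaluation_metrics(np.random.normal(loc=2.0, scale=10.0, size=1000)))
```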
Implications and Future Work
The practical implications of this research are notable for trading institutions and algorithm designers, offering a new tool for maximizing long-term trading rewards in complex environments. Theoretically, this work reinforces the potential of reinforcement learning in financial contexts, presenting a viable alternative to traditional stochastic control methods, whose analytical tractability is limited in realistic settings.
Looking forward, several directions emerge:
- Extended Feature Set: Incorporating more comprehensive data inputs, such as historical price patterns, could enhance robustness and adaptability.
- Broader Stock Application: Testing the model across a wider array of financial instruments would expand its validity and scope.
- Mixed Order Strategy: Incorporating different order types, such as limit orders, would bring the simulated environment closer to real market conditions and broaden the model's applicability to real-world trading.
By bridging concepts from reinforcement learning and financial modeling, this paper lays the groundwork for further exploration of AI-based strategies in high-frequency trading and deep-learning-driven finance.