
Double Deep Q-Learning for Optimal Execution (1812.06600v2)

Published 17 Dec 2018 in q-fin.TR, cs.LG, q-fin.CP, and stat.ML

Abstract: Optimal trade execution is an important problem faced by essentially all traders. Much research into optimal execution uses stringent model assumptions and applies continuous time stochastic control to solve them. Here, we instead take a model free approach and develop a variation of Deep Q-Learning to estimate the optimal actions of a trader. The model is a fully connected Neural Network trained using Experience Replay and Double DQN with input features given by the current state of the limit order book, other trading signals, and available execution actions, while the output is the Q-value function estimating the future rewards under an arbitrary action. We apply our model to nine different stocks and find that it outperforms the standard benchmark approach on most stocks using the measures of (i) mean and median out-performance, (ii) probability of out-performance, and (iii) gain-loss ratios.

Citations (72)

Summary

This paper explores the application of reinforcement learning, specifically Double Deep Q-Learning (DDQN), to the problem of optimal trade execution. The research departs from traditional stochastic control methods and instead adopts a model-free approach using DDQN to address the complexities inherent in financial trading scenarios, such as high-dimensional action spaces and asymmetric information.

Summary of Approach

The authors introduce a fully connected neural network trained using experience replay and DDQN. This framework estimates optimal trading actions without imposing strong model assumptions. The inputs are features describing the current state of the limit order book, other trading signals, and the available execution actions; the output is the Q-value function estimating the expected future reward of taking a given action in the current state.
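
A minimal sketch of such a state-action Q-network is shown below; the feature dimensions, layer widths, and one-hot action encoding are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ExecutionQNetwork(nn.Module):
    """Fully connected Q-network mapping (state features, candidate action) to a Q-value.

    The input layout (LOB features, trading signals, one-hot action) and the
    layer widths are illustrative assumptions, not the paper's exact design.
    """
    def __init__(self, state_dim: int = 20, num_actions: int = 10, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_actions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # scalar Q-value estimate
        )

    def forward(self, state: torch.Tensor, action_onehot: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action_onehot], dim=-1)).squeeze(-1)

# Score every candidate trade size for one state and act greedily.
state = torch.randn(1, 20)                       # hypothetical LOB + signal features
actions = torch.eye(10)                          # 10 discrete trade sizes, one-hot encoded
q_net = ExecutionQNetwork()
q_values = q_net(state.expand(10, -1), actions)  # Q(s, a) for each candidate action
best_action = int(torch.argmax(q_values))
```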

Key innovations of this paper include:

  • Experience Replay: Transitions are stored in a replay buffer and sampled for training, which stabilizes learning and mitigates the impact of outlier observations that might otherwise distort the network's updates.
  • Double Q-Learning: This variant mitigates the overestimation bias typical of standard Q-Learning by using two separate networks: one for selecting actions and another for evaluating them (a minimal sketch of this update appears after the list).
  • Handling Constraints: The paper enforces the inventory liquidation constraint by appending an additional one-second time step at the end of the trading horizon, during which remaining inventory can be liquidated.
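
The interplay between the replay buffer and the two networks can be summarized in a short training-step sketch, assuming the state-action Q-network above and a buffer of (state, action, reward, next state, done) transitions; the hyperparameters and batch handling are illustrative assumptions, not the paper's exact procedure.

```python
import random
import torch
import torch.nn.functional as F

def double_dqn_targets(batch, online_net, target_net, actions_onehot, gamma=0.99):
    """Compute Double DQN targets for a minibatch sampled from the replay buffer.

    batch: list of (state, action_onehot, reward, next_state, done) tensors.
    actions_onehot: (num_actions, num_actions) matrix enumerating candidate actions.
    The online network selects the next action; the target network evaluates it.
    """
    states, acts, rewards, next_states, dones = map(torch.stack, zip(*batch))
    num_actions = actions_onehot.shape[0]
    batch_size = len(batch)

    with torch.no_grad():
        # Evaluate every candidate action in each next state.
        ns_rep = next_states.repeat_interleave(num_actions, dim=0)
        a_rep = actions_onehot.repeat(batch_size, 1)
        q_online = online_net(ns_rep, a_rep).view(batch_size, num_actions)
        best = q_online.argmax(dim=1)                                  # selection: online net
        q_target = target_net(ns_rep, a_rep).view(batch_size, num_actions)
        q_next = q_target.gather(1, best.unsqueeze(1)).squeeze(1)      # evaluation: target net
        targets = rewards + gamma * (1.0 - dones) * q_next

    q_pred = online_net(states, acts)  # Q-values of the actions actually taken
    return q_pred, targets

# Training-step sketch: sample uniformly from the replay buffer and minimize the TD error.
# batch = random.sample(replay_buffer, 64)
# q_pred, targets = double_dqn_targets(batch, online_net, target_net, torch.eye(10))
# loss = F.mse_loss(q_pred, targets)
```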

Strong Results and Claims

The empirical evaluation demonstrates the model's performance across nine stock datasets, revealing significant out-performance of the TWAP benchmark, most notably on stocks such as INTC, MSFT, NTAP, and VOD. The metrics used to substantiate these findings are mean and median out-performance, gain-loss ratios, and the probability of outperforming the benchmark. In particular, stocks such as INTC show a mean relative profit and loss (P&L) improvement in the double digits, demonstrating the efficacy of the DDQN approach in dynamic and stochastic environments.
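
These evaluation criteria can be computed directly from per-episode P&L relative to the TWAP benchmark. The sketch below illustrates one common set of definitions; in particular, the gain-loss ratio as the mean gain over the mean absolute loss is an assumed convention, not necessarily the paper's exact formula.

```python
import numpy as np

def relative_performance_metrics(strategy_pnl, twap_pnl):
    """Summarize out-performance of an execution strategy relative to TWAP.

    strategy_pnl, twap_pnl: per-episode P&L sequences of equal length.
    """
    diff = np.asarray(strategy_pnl, dtype=float) - np.asarray(twap_pnl, dtype=float)
    gains, losses = diff[diff > 0], diff[diff < 0]
    return {
        "mean_outperformance": diff.mean(),
        "median_outperformance": np.median(diff),
        "prob_outperformance": (diff > 0).mean(),
        # Assumed convention: average gain divided by average absolute loss.
        "gain_loss_ratio": gains.mean() / abs(losses.mean()) if losses.size else float("inf"),
    }

# Example with synthetic numbers (not results from the paper):
metrics = relative_performance_metrics([1.2, -0.3, 0.8, 0.5], [0.9, 0.1, 0.4, 0.2])
```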

Implications and Future Work

The practical implications of this research are significant for trading institutions and algorithm designers, offering a novel tool for maximizing long-term trading rewards in complex environments. Theoretically, this work reinforces the potential of reinforcement learning in financial contexts, presenting a viable alternative to traditional stochastic control methods, whose analytical tractability relies on restrictive model assumptions.

In looking forward, several directions emerge:

  • Extended Feature Set: Incorporating more comprehensive data inputs, such as historical price patterns, could enhance robustness and adaptability.
  • Broader Stock Application: Testing the model across a wider array of financial instruments would expand its validity and scope.
  • Mixed Order Strategy: Incorporating different order types, such as limit orders, could refine the model's applicability to real-world trading by simulating more realistic market conditions.

By bridging concepts from reinforcement learning and financial modeling, this paper lays groundwork for further exploration of AI-based strategies in high-frequency trading and deep learning finance solutions.
