Market Making via Reinforcement Learning
This paper applies reinforcement learning (RL) to improve market-making strategies in high-frequency trading environments. Authored by Thomas Spooner, John Fearnley, Rahul Savani, and Andreas Koukorinis, the work describes the development of a market-making agent that leverages temporal-difference RL. The research centers on a data-driven simulation of limit order book (LOB) markets used to train the RL models.
Background and Objectives
Market making is a pivotal activity in financial markets, contributing to liquidity by continuously providing buy and sell quotes for securities. The task is complex, primarily because of inventory risk: the danger of accumulating an undesirable net position. Automation in market making has surged because electronic LOB systems demand data handling and decision-making at speeds beyond human capability.
This paper aims to create a market-making RL agent that outperforms simple benchmarks and compares favorably with a recent online learning approach by Abernethy et al. The authors deploy a data-driven simulator to model realistic trading scenarios, capturing nuances like order cancellation and market impact.
Methodology
The authors implement a temporal-difference learning framework, employing linear tile-coding for value function approximation. The research focuses on several core areas: designing a reward function to manage inventory risk, comparing RL algorithms, and evaluating different state-space representations.
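As a concrete, if simplified, illustration of that machinery, the sketch below implements a uniform-grid tile coder and a linear action-value function trained with SARSA(λ)-style accumulating eligibility traces (the on-policy, trace-based variant highlighted in the findings below). The grid resolution, step size, and trace decay are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

class TileCoder:
    """Uniform-grid tile coding over a bounded continuous state vector.

    Each of `n_tilings` grids is offset by a fraction of a tile width, so a
    state activates exactly one tile per tiling, giving a sparse binary
    feature vector of length `n_features`.
    """

    def __init__(self, lows, highs, n_tiles=8, n_tilings=8):
        self.lows, self.highs = np.asarray(lows, float), np.asarray(highs, float)
        self.n_tiles, self.n_tilings = n_tiles, n_tilings
        self.dim = len(lows)
        self.tiles_per_tiling = n_tiles ** self.dim
        self.n_features = n_tilings * self.tiles_per_tiling

    def active_features(self, state):
        """Indices of the active tiles, one per tiling."""
        scaled = (np.asarray(state, float) - self.lows) / (self.highs - self.lows)
        idx = []
        for t in range(self.n_tilings):
            offset = t / (self.n_tilings * self.n_tiles)      # stagger each grid
            coords = np.clip(((scaled + offset) * self.n_tiles).astype(int),
                             0, self.n_tiles - 1)
            idx.append(t * self.tiles_per_tiling
                       + int(np.ravel_multi_index(coords, (self.n_tiles,) * self.dim)))
        return np.array(idx)


class LinearSarsaLambda:
    """Linear Q(s, a) over tile-coded features with accumulating traces."""

    def __init__(self, coder, n_actions, alpha=0.1, gamma=1.0, lam=0.9):
        self.coder, self.gamma, self.lam = coder, gamma, lam
        self.alpha = alpha / coder.n_tilings          # step size per active tile
        self.w = np.zeros((n_actions, coder.n_features))
        self.z = np.zeros_like(self.w)                # eligibility traces

    def q(self, state, action):
        return self.w[action, self.coder.active_features(state)].sum()

    def update(self, s, a, r, s_next, a_next, done):
        """One SARSA(λ) step: decay traces, mark visited features, apply TD error."""
        self.z *= self.gamma * self.lam
        self.z[a, self.coder.active_features(s)] += 1.0
        target = r if done else r + self.gamma * self.q(s_next, a_next)
        self.w += self.alpha * (target - self.q(s, a)) * self.z
        if done:
            self.z[:] = 0.0
```

Because only a handful of tiles are active per state, each update touches few weights, which is what keeps linear tile-coded TD learning fast enough for high-frequency settings.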
Their simulation is constructed from high-frequency market data, offering a rich testbed for training and evaluating market-making strategies. The agent's action space includes different quoting strategies and inventory-clearing actions, with executions simulated against the historical LOB data.
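To make this action space concrete, the following minimal sketch enumerates actions as pairs of bid/ask quote depths (in ticks from the touch) plus a dedicated inventory-clearing action. The depth ladder and the `Action` fields are illustrative assumptions, not the paper's exact specification.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Action:
    """One market-making action: where to rest quotes, or clear inventory."""
    bid_depth: int          # ticks away from the best bid (0 = at the touch)
    ask_depth: int          # ticks away from the best ask
    clear_inventory: bool   # if True, flatten the position with a market order

# Illustrative depth ladder; the paper's exact quote placements may differ.
DEPTHS = (0, 1, 2, 3, 4)

ACTIONS = [Action(b, a, False) for b, a in product(DEPTHS, DEPTHS)]
ACTIONS.append(Action(0, 0, True))   # inventory-clearing action

print(f"{len(ACTIONS)} discrete actions")   # 26 with the ladder above
```

A simple policy (for example, ε-greedy) over a small discrete set like this keeps the learning problem tractable for linear tile-coded value functions.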
Key Findings
The paper presents various experiments:
- Learning Algorithms: Among the algorithms evaluated, SARSA with eligibility traces demonstrated the most consistent performance across different securities. Off-policy algorithms such as Q-learning and its variants showed higher volatility and tended to underperform in out-of-sample scenarios.
- Reward Functions: Using an asymmetrically dampened profit-and-loss (PnL) reward significantly stabilized learning and improved agent performance. The dampening targets the speculative component of PnL that comes from holding inventory, promoting more consistent spread capture (a sketch of this reward appears after this list).
- State Representation: A linear combination of tile codings (LCTC) balanced expressiveness with learning stability, effectively integrating agent state (such as inventory) and market state variables in a single representation.
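To illustrate the reward-function result, here is a minimal sketch of an asymmetrically dampened PnL reward: the mark-to-market PnL change is reduced by the positive part of the speculative term (inventory times mid-price move), scaled by a dampening coefficient `eta`. Both the decomposition and `eta` follow the spirit of the paper but are hedged assumptions rather than its exact formulation.

```python
def asymmetrically_dampened_reward(delta_pnl, inventory, delta_mid, eta=0.5):
    """Reward = PnL change minus a dampened speculative gain.

    `inventory * delta_mid` is the part of the PnL change that comes purely
    from holding a position while the mid-price moves. Subtracting only its
    positive part (scaled by `eta`, an assumed dampening coefficient) removes
    the incentive to speculate on price moves while leaving losses from
    adverse moves fully penalised, pushing the agent towards earning the
    spread rather than holding inventory.
    """
    speculative_gain = inventory * delta_mid
    return delta_pnl - eta * max(0.0, speculative_gain)


# Spread capture with flat inventory is rewarded in full...
assert asymmetrically_dampened_reward(delta_pnl=2.0, inventory=0, delta_mid=0.5) == 2.0
# ...while a gain coming only from a long position in a rising market is dampened.
assert asymmetrically_dampened_reward(delta_pnl=5.0, inventory=10, delta_mid=0.5) == 2.5
```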
The consolidated agent, combining these enhancements, proved superior in risk-adjusted performance relative to benchmarks and previous models.
Implications and Future Directions
The findings indicate that RL can indeed enhance market-making efficiency by producing strategies that capture the spread consistently while keeping inventory risk under control. Practical implications include the potential deployment of such RL-based agents in live trading environments, given their robust empirical performance in the simulated market.
Future research could extend this framework to incorporate:
- Advanced RL algorithms like policy gradient methods and deep RL using recurrent neural networks, which may enhance the agent's ability to process sequential data.
- Parametrized action spaces for greater flexibility in trading strategies.
- Realistic market constraints like latency and transaction costs to refine the agent's adaptability to real-world trading conditions.
This paper contributes to the growing body of research in AI-driven trading strategies, highlighting RL's capacity to adapt and optimize complex market-making tasks.