Reinforcement Learning for Jump-Diffusions, with Financial Applications (2405.16449v2)
Abstract: We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration--exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the exploratory dynamics under jump-diffusions calls for a careful formulation of the jump part. Through a theoretical analysis, we find that one can simply use the same policy evaluation and $q$-learning algorithms in Jia and Zhou (2022a, 2023), originally developed for controlled diffusions, without needing to check a priori whether the underlying data come from a pure diffusion or a jump-diffusion. However, we show that the presence of jumps ought to affect parameterizations of actors and critics in general. We investigate as an application the mean--variance portfolio selection problem with stock price modelled as a jump-diffusion, and show that both RL algorithms and parameterizations are invariant with respect to jumps. Finally, we present a detailed study on applying the general theory to option hedging.
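To make the notion of "stock price modelled as a jump-diffusion" concrete, here is a minimal sketch that simulates one path of a Merton (1976)-type model, where log-returns combine a Brownian part with compound Poisson jumps of normal size. This is an illustrative assumption, not the paper's general setup: the lognormal jump law, the time discretization, and all function and parameter names (`simulate_merton_path`, `jump_rate`, `jump_mean`, `jump_std`) are hypothetical choices made for this example.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_merton_path(s0, mu, sigma, jump_rate, jump_mean, jump_std,
                         horizon, n_steps, seed=0):
    """Simulate one price path on a uniform grid (assumed Merton-type model).

    Log-price increments are the sum of a drift, a Gaussian diffusion term,
    and a compound Poisson sum of N(jump_mean, jump_std^2) log-jumps.
    """
    rng = random.Random(seed)
    dt = horizon / n_steps
    # Expected relative jump size E[exp(J) - 1], used to compensate the drift
    # so that mu remains the expected rate of return.
    kappa = math.exp(jump_mean + 0.5 * jump_std ** 2) - 1.0
    log_price = math.log(s0)
    path = [s0]
    for _ in range(n_steps):
        drift = (mu - 0.5 * sigma ** 2 - jump_rate * kappa) * dt
        diffusion = sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        n_jumps = sample_poisson(rng, jump_rate * dt)
        jumps = sum(rng.gauss(jump_mean, jump_std) for _ in range(n_jumps))
        log_price += drift + diffusion + jumps
        path.append(math.exp(log_price))
    return path


if __name__ == "__main__":
    prices = simulate_merton_path(s0=100.0, mu=0.05, sigma=0.2,
                                  jump_rate=1.0, jump_mean=-0.05,
                                  jump_std=0.1, horizon=1.0, n_steps=252)
    print(len(prices), prices[-1])
```

Setting `jump_rate=0.0` reduces the sketch to a pure diffusion (geometric Brownian motion), which gives a simple way to generate both kinds of data when experimenting with an algorithm that is not told which one it is seeing.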
- Testing for jumps in noisy high frequency data. Journal of Econometrics 168(2), 207–222.
- An empirical investigation of continuous-time equity return models. The Journal of Finance 57(3), 1239–1284.
- Applebaum, D. (2009). Lévy Processes and Stochastic Calculus. Cambridge University Press.
- Bates, D. S. (1991). The crash of 87: was it expected? The evidence from options markets. The Journal of Finance 46(3), 1009–1044.
- Bates, D. S. (1996). Jumps and stochastic volatility: Exchange rate processes implicit in Deutsche Mark options. The Review of Financial Studies 9(1), 69–107.
- Bender, C. and N. T. Thuan (2023). Entropy-regularized mean-variance portfolio optimization with jumps. arXiv preprint arXiv:2312.13409.
- Cai, N. and S. G. Kou (2011). Option pricing under a mixed-exponential jump diffusion model. Management Science 57(11), 2067–2081.
- Financial Modelling with Jump Processes. Chapman and Hall/CRC.
- Learning equilibrium mean-variance strategy. Mathematical Finance 33(4), 1166–1212.
- Das, S. R. (2002). The surprise element: Jumps in interest rates. Journal of Econometrics 106(1), 27–65.
- Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching. arXiv preprint arXiv:2404.17939.
- Ethier, S. N. and T. G. Kurtz (1986). Markov Processes: Characterization and Convergence. John Wiley & Sons.
- Stochastic resonance. Reviews of Modern Physics 70(1), 223–287.
- State-dependent temperature control for Langevin diffusions. SIAM Journal on Control and Optimization 60(3), 1250–1268.
- Jump-diffusion processes as models for neuronal activity. Biosystems 40(1-2), 75–82.
- Abrupt transitions in time series with uncertainties. Nature Communications 9(1), 48–57.
- Reinforcement learning for linear-convex models with jumps via stability analysis of feedback controls. SIAM Journal on Control and Optimization 61(2), 755–787.
- Entropy regularization for mean field games with learning. Mathematics of Operations Research 47(4), 3239–3260.
- Achieving mean–variance efficiency by continuous-time reinforcement learning. In Proceedings of the Third ACM International Conference on AI in Finance, pp. 377–385.
- Limit Theorems for Stochastic Processes. Springer.
- Jia, Y. and X. Y. Zhou (2022a). Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach. Journal of Machine Learning Research 23(154), 1–55.
- Jia, Y. and X. Y. Zhou (2022b). Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. Journal of Machine Learning Research 23(275), 1–50.
- Jia, Y. and X. Y. Zhou (2023). q-learning in continuous time. Journal of Machine Learning Research 24(161), 1–61.
- Kou, S. G. (2002). A jump-diffusion model for option pricing. Management Science 48(8), 1086–1101.
- Kunita, H. (2004). Stochastic differential equations based on Lévy processes and stochastic flows of diffeomorphisms. In M. M. Rao (Ed.), Real and Stochastic Analysis: New Perspectives, pp. 305–373. Birkhäuser.
- Kushner, H. J. (2000). Jump-diffusions with controlled jumps: Existence and numerical methods. Journal of Mathematical Analysis and Applications 249(1), 179–198.
- Characterizing abrupt transitions in stochastic dynamics. New Journal of Physics 20(11), 113043.
- Time-changed Ornstein–Uhlenbeck processes and their applications in commodity derivative models. Mathematical Finance 24(2), 289–330.
- Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics 3(1-2), 125–144.
- Munos, R. (2006). Policy gradient in continuous time. Journal of Machine Learning Research 7, 771–791.
- Applied Stochastic Control of Jump Diffusions, Volume 498. Springer.
- Time discretization-invariant safe action repetition for policy gradient methods. Advances in Neural Information Processing Systems 34, 267–279.
- Regularity and stability of feedback relaxed controls. SIAM Journal on Control and Optimization 59(5), 3118–3151.
- Sato, K.-I. (1999). Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press.
- Situ, R. (2006). Theory of Stochastic Differential Equations with Jumps and Applications. Springer.
- Making deep q-learning methods robust to time discretization. arXiv preprint arXiv:1901.09732.
- Reinforcement learning for continuous-time optimal execution: actor-critic algorithm and error analysis. Available at SSRN 4378950.
- Testing for the presence of jump components in jump diffusion models. Journal of Econometrics 230(2), 483–509.
- Reinforcement learning in continuous time and space: A stochastic control approach. Journal of Machine Learning Research 21(198), 1–34.
- Wang, H. and X. Y. Zhou (2020). Continuous-time mean–variance portfolio selection: A reinforcement learning framework. Mathematical Finance 30(4), 1273–1308.
- Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market. Journal of Economic Dynamics and Control 158, 104787.
- Continuous-time mean-variance portfolio selection: A stochastic LQ framework. Applied Mathematics and Optimization 42(1), 19–33.
Authors: Xuefeng Gao, Lingfei Li, Xun Yu Zhou