Reinforcement Learning for Jump-Diffusions, with Financial Applications (2405.16449v2)

Published 26 May 2024 in cs.LG, math.OC, and q-fin.MF

Abstract: We study continuous-time reinforcement learning (RL) for stochastic control in which system dynamics are governed by jump-diffusion processes. We formulate an entropy-regularized exploratory control problem with stochastic policies to capture the exploration–exploitation balance essential for RL. Unlike the pure diffusion case initially studied by Wang et al. (2020), the derivation of the exploratory dynamics under jump-diffusions calls for a careful formulation of the jump part. Through a theoretical analysis, we find that one can simply use the same policy evaluation and $q$-learning algorithms in Jia and Zhou (2022a, 2023), originally developed for controlled diffusions, without needing to check a priori whether the underlying data come from a pure diffusion or a jump-diffusion. However, we show that the presence of jumps ought to affect parameterizations of actors and critics in general. We investigate as an application the mean–variance portfolio selection problem with stock price modelled as a jump-diffusion, and show that both RL algorithms and parameterizations are invariant with respect to jumps. Finally, we present a detailed study on applying the general theory to option hedging.
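
To make the setting concrete, the sketch below is a minimal illustration (not the paper's algorithm): it simulates a wealth process driven by a Merton-style jump-diffusion stock price while actions are drawn from a Gaussian exploratory (stochastic) policy, the kind of randomized policy that entropy-regularized formulations use. All parameter values, the linear feedback form of the policy mean, and the function names are hypothetical assumptions for illustration only.

# Illustrative sketch only: simulate a jump-diffusion wealth process under a
# Gaussian exploratory policy. Parameters and helper names are hypothetical,
# not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical market parameters
mu, sigma, r = 0.08, 0.20, 0.02               # stock drift, volatility, risk-free rate
lam, jump_mu, jump_sigma = 0.5, -0.05, 0.10   # jump intensity, log-jump mean/std

T, N = 1.0, 252                               # horizon (years), number of time steps
dt = T / N

def sample_action(x, theta, temperature=0.1):
    """Gaussian exploratory policy: mean is a (hypothetical) linear feedback in
    wealth x; the standard deviation reflects the entropy-regularization temperature."""
    mean = theta[0] + theta[1] * x
    return rng.normal(mean, np.sqrt(temperature))

def simulate_wealth(x0=1.0, theta=(0.3, -0.2)):
    """Roll out one wealth path when the dollar amount invested in the stock is
    drawn from the stochastic policy and the stock follows a jump-diffusion."""
    x = x0
    for _ in range(N):
        a = sample_action(x, theta)            # dollar amount invested in the stock
        dW = rng.normal(0.0, np.sqrt(dt))      # Brownian increment
        n_jumps = rng.poisson(lam * dt)        # number of jumps in [t, t + dt]
        jump = np.sum(np.exp(rng.normal(jump_mu, jump_sigma, n_jumps)) - 1.0)
        # wealth update: risk-free growth plus stock exposure with diffusion and jump parts
        x += r * x * dt + a * ((mu - r) * dt + sigma * dW + jump)
    return x

terminal_wealths = np.array([simulate_wealth() for _ in range(1000)])
print("mean terminal wealth:", terminal_wealths.mean())
print("variance of terminal wealth:", terminal_wealths.var())

Such simulated paths are the kind of data on which the paper's policy evaluation and q-learning analyses operate; the point of the abstract is that those algorithms can be run without knowing whether the jump component (lam > 0 above) is present.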

References (42)
  1. Testing for jumps in noisy high frequency data. Journal of Econometrics 168(2), 207–222.
  2. An empirical investigation of continuous-time equity return models. The Journal of Finance 57(3), 1239–1284.
  3. Applebaum, D. (2009). Lévy Processes and Stochastic Calculus. Cambridge University Press.
  4. Bates, D. S. (1991). The crash of 87: was it expected? The evidence from options markets. The Journal of Finance 46(3), 1009–1044.
  5. Bates, D. S. (1996). Jumps and stochastic volatility: Exchange rate processes implicit in Deutsche Mark options. The Review of Financial Studies 9(1), 69–107.
  6. Bender, C. and N. T. Thuan (2023). Entropy-regularized mean-variance portfolio optimization with jumps. arXiv preprint arXiv:2312.13409.
  7. Cai, N. and S. G. Kou (2011). Option pricing under a mixed-exponential jump diffusion model. Management Science 57(11), 2067–2081.
  8. Financial Modelling with Jump Processes. Chapman and Hall/CRC.
  9. Learning equilibrium mean-variance strategy. Mathematical Finance 33(4), 1166–1212.
  10. Das, S. R. (2002). The surprise element: Jumps in interest rates. Journal of Econometrics 106(1), 27–65.
  11. Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching. arXiv preprint arXiv:2404.17939.
  12. Ethier, S. N. and T. G. Kurtz (1986). Markov Processes: Characterization and Convergence. John Wiley & Sons.
  13. Stochastic resonance. Reviews of Modern Physics 70(1), 223–287.
  14. State-dependent temperature control for Langevin diffusions. SIAM Journal on Control and Optimization 60(3), 1250–1268.
  15. Jump-diffusion processes as models for neuronal activity. Biosystems 40(1-2), 75–82.
  16. Abrupt transitions in time series with uncertainties. Nature Communications 9(1), 48–57.
  17. Reinforcement learning for linear-convex models with jumps via stability analysis of feedback controls. SIAM Journal on Control and Optimization 61(2), 755–787.
  18. Entropy regularization for mean field games with learning. Mathematics of Operations Research 47(4), 3239–3260.
  19. Achieving mean–variance efficiency by continuous-time reinforcement learning. In Proceedings of the Third ACM International Conference on AI in Finance, pp. 377–385.
  20. Limit Theorems for Stochastic Processes. Springer.
  21. Jia, Y. and X. Y. Zhou (2022a). Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach. Journal of Machine Learning Research 23(154), 1–55.
  22. Jia, Y. and X. Y. Zhou (2022b). Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. Journal of Machine Learning Research 23(275), 1–50.
  23. Jia, Y. and X. Y. Zhou (2023). q-learning in continuous time. Journal of Machine Learning Research 24(161), 1–61.
  24. Kou, S. G. (2002). A jump-diffusion model for option pricing. Management Science 48(8), 1086–1101.
  25. Kunita, H. (2004). Stochastic differential equations based on Lévy processes and stochastic flows of diffeomorphisms. In M. M. Rao (Ed.), Real and Stochastic Analysis: New Perspectives, pp. 305–373. Birkhäuser.
  26. Kushner, H. J. (2000). Jump-diffusions with controlled jumps: Existence and numerical methods. Journal of Mathematical Analysis and Applications 249(1), 179–198.
  27. Characterizing abrupt transitions in stochastic dynamics. New Journal of Physics 20(11), 113043.
  28. Time-changed Ornstein–Uhlenbeck processes and their applications in commodity derivative models. Mathematical Finance 24(2), 289–330.
  29. Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics 3(1-2), 125–144.
  30. Munos, R. (2006). Policy gradient in continuous time. Journal of Machine Learning Research 7, 771–791.
  31. Applied Stochastic Control of Jump Diffusions, Volume 498. Springer.
  32. Time discretization-invariant safe action repetition for policy gradient methods. Advances in Neural Information Processing Systems 34, 267–279.
  33. Regularity and stability of feedback relaxed controls. SIAM Journal on Control and Optimization 59(5), 3118–3151.
  34. Sato, K.-I. (1999). Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press.
  35. Situ, R. (2006). Theory of Stochastic Differential Equations with Jumps and Applications. Springer.
  36. Making deep q-learning methods robust to time discretization. arXiv preprint arXiv:1901.09732.
  37. Reinforcement learning for continuous-time optimal execution: actor-critic algorithm and error analysis. Available at SSRN 4378950.
  38. Testing for the presence of jump components in jump diffusion models. Journal of Econometrics 230(2), 483–509.
  39. Reinforcement learning in continuous time and space: A stochastic control approach. Journal of Machine Learning Research 21(198), 1–34.
  40. Wang, H. and X. Y. Zhou (2020). Continuous-time mean–variance portfolio selection: A reinforcement learning framework. Mathematical Finance 30(4), 1273–1308.
  41. Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market. Journal of Economic Dynamics and Control 158, 104787.
  42. Continuous-time mean-variance portfolio selection: A stochastic LQ framework. Applied Mathematics and Optimization 42(1), 19–33.
Authors (3)
  1. Xuefeng Gao (28 papers)
  2. Lingfei Li (10 papers)
  3. Xun Yu Zhou (33 papers)
Citations (2)
