Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration (2312.11797v1)
Abstract: We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown. We take the reinforcement learning (RL) approach to learn optimal portfolio policies directly by exploring the unknown market, without attempting to estimate the model parameters. Based on the entropy-regularization framework for general continuous-time RL formulated in Wang et al. (2020), we propose a recursive weighting scheme on exploration that endogenously discounts the current exploration reward by the accumulated amount of past exploration. Such recursive regularization restores the optimality of Gaussian exploration. However, in contrast to existing results, the optimal Gaussian policy turns out to be biased in general, due to the intertwined needs for hedging and for exploration. We present an asymptotic analysis of the resulting errors to show how the level of exploration affects the learned policies. Furthermore, we establish a policy improvement theorem and design several RL algorithms to learn Merton's optimal strategies. Finally, we carry out both simulation and empirical studies in a stochastic volatility environment to demonstrate the efficiency and robustness of the RL algorithms in comparison to the conventional plug-in method.
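The abstract only sketches the mechanism, so the following Python snippet is a purely illustrative toy rather than the paper's algorithm: the names (`gamma`, `mean_shift`, `merton_weight`) and the specific discounting rule are all assumptions. It shows one way a Gaussian exploratory policy could have its temperature recursively discounted by the exploration already accumulated, with a mean shifted away from the classical Merton fraction to mimic the bias the abstract describes.

```python
import numpy as np

# Hypothetical sketch (parameters and dynamics are NOT from the paper):
# sample exploratory portfolio weights from a Gaussian policy whose
# entropy-regularization weight is discounted by accumulated exploration.

rng = np.random.default_rng(0)

gamma = 1.0          # base temperature of the entropy regularizer (assumed)
mean_shift = 0.05    # stand-in for the hedging-induced bias of the mean
merton_weight = 0.6  # stand-in for the classical Merton fraction

accumulated_entropy = 0.0
dt = 1.0 / 252       # daily steps over one year
for step in range(252):
    # Recursive discounting: the effective temperature shrinks as
    # exploration accumulates (one possible reading of the scheme).
    temperature = gamma / (1.0 + accumulated_entropy)
    std = np.sqrt(temperature)          # Gaussian exploration scale
    mean = merton_weight + mean_shift   # biased: not centered at Merton's fraction
    action = rng.normal(mean, std)      # exploratory portfolio weight
    # Differential entropy of N(mean, std^2), accumulated over time.
    accumulated_entropy += 0.5 * np.log(2 * np.pi * np.e * std**2) * dt

print(f"final temperature: {temperature:.4f}, last action: {action:.4f}")
```

Under this toy discounting rule the exploration scale decays as entropy accrues, so early actions explore widely while later ones concentrate near the (biased) mean; the actual scheme and bias formula are derived in the paper itself.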
- Bergman YZ (1985) Time preference and capital asset pricing models. Journal of Financial Economics 14(1):145–159.
- Chacko G, Viceira LM (2005) Dynamic consumption and portfolio choice with stochastic volatility in incomplete markets. The Review of Financial Studies 18(4):1369–1402.
- Drimus GG (2012) Options on realized variance by transform methods: A non-affine stochastic volatility model. Quantitative Finance 12(11):1679–1694.
- Duffie D, Epstein LG (1992) Stochastic differential utility. Econometrica 60(2):353–394.
- Geman S, Hwang CR (1986) Diffusions for global optimization. SIAM Journal on Control and Optimization 24(5):1031–1043.
- Han J, E W (2016) Deep learning approximation for stochastic control problems. arXiv preprint arXiv:1611.07422 .
- Jia Y, Zhou XY (2022a) Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach. Journal of Machine Learning Research 23(154):1–55.
- Jia Y, Zhou XY (2022b) Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. Journal of Machine Learning Research 23(275):1–50.
- Jia Y, Zhou XY (2023) q-Learning in continuous time. Journal of Machine Learning Research 24(161):1–61.
- Kraft H (2005) Optimal portfolios and Heston’s stochastic volatility model: An explicit solution for power utility. Quantitative Finance 5(3):303–313.
- Kydland FE, Prescott EC (1982) Time to build and aggregate fluctuations. Econometrica 50(6):1345–1370.
- Liu J (2007) Portfolio selection in stochastic environments. The Review of Financial Studies 20(1):1–39.
- Luenberger DG (1998) Investment Science (Oxford University Press, New York).
- Markowitz H (1952) Portfolio selection. The Journal of Finance 7(1):77–91.
- Merton RC (1969) Lifetime portfolio selection under uncertainty: The continuous-time case. The Review of Economics and Statistics 51(3):247–257.
- Merton RC (1980) On estimating the expected return on the market: An exploratory investigation. Journal of Financial Economics 8(4):323–361.
- Wachter JA (2002) Portfolio and consumption decisions under mean-reverting returns: An exact solution for complete markets. Journal of Financial and Quantitative Analysis 37(1):63–91.
- Wang H, Zhou XY (2020) Continuous-time mean–variance portfolio selection: A reinforcement learning framework. Mathematical Finance 30(4):1273–1308.
Authors: Min Dai, Yuchao Dong, Yanwei Jia, Xun Yu Zhou