Generative Machine Learning for Multivariate Equity Returns (2311.14735v1)
Abstract: The use of machine learning to generate synthetic data has grown in popularity with the proliferation of text-to-image models and, especially, large language models (LLMs). The core methodology of these models is to learn the distribution of the underlying data, much like the classical practice in finance of fitting statistical models to data. In this work, we explore the efficacy of modern machine learning methods, specifically conditional importance weighted autoencoders (a variant of variational autoencoders) and conditional normalizing flows, for modeling equity returns. The main problem we address is modeling the joint distribution of the returns of all members of the S&P 500, in other words, learning a 500-dimensional joint distribution. We show that this generative model has a broad range of applications in finance, including generating realistic synthetic data, estimating volatility and correlation, analyzing risk (e.g., the value at risk, or VaR, of portfolios), and optimizing portfolios.
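Once a joint distribution of returns has been learned, applications like volatility, correlation, and VaR estimation can be computed by Monte Carlo from the model's samples. The following is a minimal sketch, assuming only that a trained model exposes a sampler of joint returns. The function `sample_returns` is a hypothetical stand-in that draws from a multivariate Student-t with a one-factor covariance. It is not the paper's conditional importance weighted autoencoder or conditional normalizing flow. The factor loadings, degrees of freedom, equal weights, and 1% VaR level are illustrative assumptions.

```python
import numpy as np


def sample_returns(n_samples, n_assets, rng):
    """Hypothetical stand-in for a trained generative model of joint returns.

    Draws fat-tailed returns from a multivariate Student-t with a one-factor
    covariance. A real conditional model would also take market state as input.
    """
    beta = np.linspace(0.5, 1.5, n_assets)  # hypothetical factor loadings
    cov = 0.01**2 * (np.outer(beta, beta) + np.eye(n_assets))  # factor + idiosyncratic
    chol = np.linalg.cholesky(cov)
    z = rng.standard_normal((n_samples, n_assets)) @ chol.T  # rows ~ N(0, cov)
    dof = 5.0  # hypothetical tail parameter
    w = rng.chisquare(dof, size=(n_samples, 1)) / dof
    return z / np.sqrt(w)  # shared chi-square mixing gives a multivariate t


def monte_carlo_risk(samples, weights, alpha=0.01):
    """Estimate per-asset volatility, correlations, and portfolio VaR from samples."""
    vol = samples.std(axis=0, ddof=1)
    corr = np.corrcoef(samples, rowvar=False)
    portfolio = samples @ weights
    var = -np.quantile(portfolio, alpha)  # loss exceeded with probability alpha
    return vol, corr, var


# Usage: equal-weight portfolio of 5 assets, 99% one-period VaR.
rng = np.random.default_rng(0)
n_assets = 5
samples = sample_returns(100_000, n_assets, rng)
weights = np.full(n_assets, 1.0 / n_assets)
vol, corr, var_99 = monte_carlo_risk(samples, weights, alpha=0.01)
print(vol.round(4), corr[0, 1].round(2), round(float(var_99), 4))
```

Because every estimate is computed from samples, replacing the stand-in with a real generative model would change only `sample_returns`. The quantile-based VaR then reflects whatever tail behavior the model has learned instead of relying on a Gaussian assumption.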