Deep Empirical Risk Minimization in finance: looking into the future (2011.09349v3)

Published 18 Nov 2020 in stat.ML, cs.LG, and math.OC

Abstract: Many modern computational approaches to classical problems in quantitative finance are formulated as empirical loss minimization (ERM), allowing direct applications of classical results from statistical machine learning. These methods, designed to directly construct the optimal feedback representation of hedging or investment decisions, are analyzed in this framework demonstrating their effectiveness as well as their susceptibility to generalization error. Use of classical techniques shows that over-training renders trained investment decisions to become anticipative, and proves overlearning for large hypothesis spaces. On the other hand, non-asymptotic estimates based on Rademacher complexity show the convergence for sufficiently large training sets. These results emphasize the importance of synthetic data generation and the appropriate calibration of complex models to market data. A numerically studied stylized example illustrates these possibilities, including the importance of problem dimension in the degree of overlearning, and the effectiveness of this approach.

Citations (10)

Summary

  • The paper introduces dynamic deep ERM, which applies deep learning to financial stochastic optimal control problems, and identifies a critical "overlearning" issue in which trained policies exploit future information in the training data.
  • It theoretically analyzes overlearning using statistical learning theory, showing generalization error is related to hypothesis space complexity and data size, implying more complex models need more data.
  • Numerical experiments on portfolio management confirm overlearning, especially in high dimensions, and demonstrate that significantly increasing training data is essential to reduce the in-sample/out-of-sample gap and improve accuracy.

This paper investigates the application of deep learning via Empirical Risk Minimization (ERM) to solve stochastic optimal control problems, specifically focusing on financial applications like dynamic hedging and portfolio management. The core idea, termed dynamic deep ERM, is to parameterize feedback control policies using deep neural networks and minimize an empirical average of the pathwise cost over a training dataset.

The paper's key contribution lies in providing both a computational and theoretical assessment of this approach, with a particular emphasis on the crucial aspect of generalization error and a phenomenon the authors call "overlearning."

The methodology reformulates sequential decision problems under uncertainty into an ERM framework. For a given stochastic process $Z$ driving the system dynamics (e.g., stock returns), a feedback control policy $a$ (a function mapping time, state, and current randomness to an action) determines a state trajectory and corresponding actions. A pathwise cost function $\ell(a, Z)$ is defined, and the goal is to minimize its expected value, $v(a) = E[\ell(a, Z)]$, over a space of admissible feedback policies. The dynamic deep ERM approach uses a finite training set of $Z$ trajectories, $\{Z^{(i)}\}_{i=1}^n$, to define an empirical loss function, $L(a; \{Z^{(i)}\}) = \frac{1}{n} \sum_{i=1}^n \ell(a, Z^{(i)})$. This empirical loss is then minimized over a hypothesis space of policies $\mathcal{H}_k$, typically given by a neural network architecture parameterized by $\theta$, $a(t, x, z; \theta)$.
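A minimal sketch of this training loop, under assumed choices (PyTorch, a log-utility pathwise cost, an i.i.d. Gaussian return model; `PolicyNet` and `empirical_loss` are hypothetical names, not taken from the paper), might look as follows:

```python
# Minimal sketch of dynamic deep ERM (illustrative assumptions, not the paper's code).
import torch
import torch.nn as nn

T, d, n = 10, 5, 10_000            # time steps, asset dimension, training paths

class PolicyNet(nn.Module):
    """Feedback policy a(t, x, z; theta), shared across all time steps."""
    def __init__(self, d, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(1 + 1 + d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, d),
        )

    def forward(self, t, wealth, z):
        return self.net(torch.cat([t, wealth, z], dim=-1))   # portfolio weights

def empirical_loss(policy, Z):
    """Average pathwise cost L(a; {Z^(i)}) over trajectories Z of shape (m, T, d)."""
    m = Z.shape[0]
    wealth = torch.ones(m, 1)
    for t in range(T):
        tt = torch.full((m, 1), t / T)
        action = policy(tt, wealth, Z[:, t, :])               # feedback action
        wealth = wealth * (1.0 + (action * Z[:, t, :]).sum(-1, keepdim=True))
    return -torch.log(wealth.clamp_min(1e-8)).mean()          # negative log-utility

# Synthetic training trajectories standing in for simulated market returns.
Z_train = 0.01 + 0.05 * torch.randn(n, T, d)

policy = PolicyNet(d)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
for step in range(2_000):
    opt.zero_grad()
    loss = empirical_loss(policy, Z_train)
    loss.backward()
    opt.step()
```

Because the same network is evaluated at each step on only the current time, state, and return, the learned policy is a feedback map in form; overlearning arises because its parameters are fit to a finite set of trajectories.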

A central finding is that sufficiently rich hypothesis spaces (e.g., large neural networks) are susceptible to "overlearning." The paper proves (Theorem 3.1) that as the complexity of the hypothesis space increases (e.g., network size grows), the empirical minimum achievable on the training data approaches the minimum achievable by anticipative controls. Anticipative controls are policies that can observe the entire future trajectory of $Z$ at any given time step, circumventing the fundamental adaptedness requirement that actions depend only on information available up to the current time. While the trained neural network policies are technically feedback policies (and thus adapted in principle), on the specific training data they can effectively "look into the future" by memorizing the future randomness associated with the training trajectories. This leads to in-sample performance that is artificially better than that of the true optimal adapted policy, because it exploits non-adapted information present in the training data.
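Schematically, in the notation of this summary rather than the paper's exact statement, the overlearning result says that for a fixed training set the best empirical loss over increasingly rich hypothesis spaces approaches the anticipative optimum:

$$\inf_{a \in \mathcal{H}_k} L(a; \{Z^{(i)}\}) \;\xrightarrow[\,k \to \infty\,]{}\; \inf_{a\ \text{anticipative}} L(a; \{Z^{(i)}\}) \;\le\; \inf_{a\ \text{adapted}} L(a; \{Z^{(i)}\}),$$

so that in-sample the trained policy can outperform every adapted policy, which is precisely the signature of overlearning.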

The paper employs tools from statistical machine learning, specifically Rademacher complexity, to analyze the generalization error. It provides bounds (Theorem 5.1, Equation 5.3) showing that the difference between the expected performance $v(a)$ and the empirical performance $L(a; \{Z^{(i)}\})$ for any policy $a$ in a hypothesis space $\mathcal{H}_k$ is related to the Rademacher complexity of $\mathcal{H}_k$ and the training set size $n$. This quantifies the bias-complexity trade-off: more complex networks can approximate better policies but have higher complexity, potentially leading to larger generalization gaps. Convergence results (Corollary 5.2) show that if the Rademacher complexity of the hypothesis space grows slower than the training set size, the performance of the trained network converges to the optimal performance achievable within the hypothesis space.
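A bound of the standard Rademacher form conveys the statement; the constants and exact formulation of the paper's Theorem 5.1 / Equation 5.3 may differ:

$$\sup_{a \in \mathcal{H}_k} \bigl| v(a) - L(a; \{Z^{(i)}\}) \bigr| \;\le\; 2\,\mathcal{R}_n\bigl(\ell \circ \mathcal{H}_k\bigr) + C\,\sqrt{\frac{\log(1/\delta)}{n}} \quad \text{with probability at least } 1-\delta,$$

where $\mathcal{R}_n(\ell \circ \mathcal{H}_k)$ is the Rademacher complexity of the induced loss class and $C$ depends on the range of $\ell$.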

Practical implementation details discussed include:

  • Data Requirement: The method is data-hungry, requiring large training sets to mitigate overlearning and ensure generalization.
  • Data Generation: For problems where real-world data is limited, simulation from an assumed or calibrated model is crucial. The ability to simulate data consistent with market dynamics is highlighted as key.
  • Optimization: Standard stochastic gradient descent variants (like Adam) are used for training.
  • Regularization: Techniques like early stopping (based on validation set performance) are essential practical regularizers to combat overlearning (see the sketch after this list). The paper demonstrates that aggressive optimization without early stopping leads to significant overlearning.
  • Dimensionality: Numerical experiments show that overlearning becomes more pronounced in higher dimensions, necessitating even larger training sets to achieve good generalization.
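As an illustration of the early-stopping regularizer mentioned above, here is a hedged sketch, again in PyTorch and reusing the hypothetical `PolicyNet` and `empirical_loss` from the earlier snippet; it is not the paper's implementation:

```python
# Early stopping against a held-out validation set to limit overlearning.
import copy
import torch

Z_train = 0.01 + 0.05 * torch.randn(80_000, T, d)    # synthetic training paths
Z_valid = 0.01 + 0.05 * torch.randn(20_000, T, d)    # held-out validation paths

policy = PolicyNet(d)
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
best_val, best_state, patience, bad_steps = float("inf"), None, 20, 0

for step in range(5_000):
    opt.zero_grad()
    empirical_loss(policy, Z_train).backward()
    opt.step()
    with torch.no_grad():
        val = empirical_loss(policy, Z_valid).item()
    if val < best_val - 1e-5:                         # genuine out-of-sample improvement
        best_val, best_state, bad_steps = val, copy.deepcopy(policy.state_dict()), 0
    else:
        bad_steps += 1
        if bad_steps >= patience:                     # stop before overlearning dominates
            break

policy.load_state_dict(best_state)                    # keep the best validated policy
```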

Numerical experiments on a stylized Merton utility maximization problem illustrate these points:

  • The dynamic deep ERM approach is effective in finding near-optimal policies.
  • Overlearning is observed, with in-sample performance significantly exceeding out-of-sample performance, particularly in higher dimensions (e.g., 100 dimensions showing much larger gaps than 10 dimensions); a sketch of how this gap can be measured follows the list.
  • This dimensional dependence of overlearning persists even when controlling for the total number of network parameters.
  • Crucially, increasing the training data size (from 100,000 to 2,000,000 trajectories) dramatically reduces the in-sample/out-of-sample gap and improves overall accuracy, supporting the theoretical convergence results and emphasizing the importance of data scale.
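As referenced in the list above, the in-sample/out-of-sample gap that diagnoses overlearning can be measured by re-evaluating the trained policy on freshly simulated paths; this hedged sketch reuses the hypothetical names from the earlier snippets:

```python
# Quantify the overlearning gap on fresh simulated trajectories.
with torch.no_grad():
    Z_test = 0.01 + 0.05 * torch.randn(200_000, T, d)         # out-of-sample paths
    in_sample = empirical_loss(policy, Z_train).item()
    out_of_sample = empirical_loss(policy, Z_test).item()
print(f"in-sample loss: {in_sample:.4f}  "
      f"out-of-sample loss: {out_of_sample:.4f}  "
      f"gap: {out_of_sample - in_sample:.4f}")
```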

The paper also briefly touches upon how problems like controlled Markov chains can be reformulated to fit the required structure where the driving randomness is independent of the control action (Appendix B), often by augmenting the state space and using Radon-Nikodym derivatives.

In conclusion, the paper establishes dynamic deep ERM as a flexible and powerful method for high-dimensional financial control problems but underscores its susceptibility to overlearning due to the potential exploitation of non-adapted information in finite training data. It provides theoretical grounding for this phenomenon using statistical learning theory and demonstrates numerically that sufficient training data is the primary antidote to overlearning, particularly as problem dimensionality increases.