
Limits To (Machine) Learning (2512.12735v1)

Published 14 Dec 2025 in stat.ML and cs.LG

Abstract: Machine learning (ML) methods are highly flexible, but their ability to approximate the true data-generating process is fundamentally constrained by finite samples. We characterize a universal lower bound, the Limits-to-Learning Gap (LLG), quantifying the unavoidable discrepancy between a model's empirical fit and the population benchmark. Recovering the true population $R^2$, therefore, requires correcting observed predictive performance by this bound. Using a broad set of variables, including excess returns, yields, credit spreads, and valuation ratios, we find that the implied LLGs are large. This indicates that standard ML approaches can substantially understate true predictability in financial data. We also derive LLG-based refinements to the classic Hansen and Jagannathan (1991) bounds, analyze implications for parameter learning in general-equilibrium settings, and show that the LLG provides a natural mechanism for generating excess volatility.

Summary

  • The paper introduces the Limits-to-Learning Gap (LLG) as a universal lower bound capturing the irreducible estimation error in high-dimensional ML tasks.
  • It provides closed-form LLG corrections to adjust out-of-sample R² estimates, revealing substantial signal in data previously seen as weakly predictable.
  • Empirical and simulation results confirm that LLG distinguishes statistical noise from true signal, informing optimal model design and resource allocation.

Universal Lower Bounds and Statistical Limits in Machine Learning: A Review of "Limits To (Machine) Learning" (2512.12735)

Theoretical Foundation: Limits-to-Learning Gap (LLG)

"Limits To (Machine) Learning" develops a comprehensive theoretical analysis of the fundamental statistical limitations facing ML models when applied to finite, high-dimensional datasets. The paper introduces the Limits-to-Learning Gap (LLG), a universal lower bound on the discrepancy between the out-of-sample generalization performance of an estimator and the infeasible population performance. Crucially, LLG is determined solely by observable sample features and does not require specification or estimation of the data-generating function.

Formally, for prediction tasks $y_{t+1} = f_t + \varepsilon_{t+1}$, any linear (and many nonlinear) estimator's out-of-sample error is bounded below by the irreducible noise plus an estimation error. The latter term, characterized by the LLG, grows with the statistical complexity (where the number of features $P$ is not negligible compared to the sample size $T$), and does not vanish even asymptotically in over-parameterized regimes. This result holds for high-dimensional linear estimators (ridge, kernel ridge regression) and, via NTK theory, for wide deep neural networks.
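
To make the linear-in-$y$ structure concrete, the following minimal sketch (not the paper's implementation; the helper name ridge_prediction_map, the synthetic signals, and the penalty scaling $zT$ are illustrative assumptions) shows that ridge forecasts take the form $\hat{y}_{\text{oos}} = K\,y$, where $K$ depends only on the in-sample and out-of-sample signal matrices. Signal-only functionals such as tr(K'K) are therefore computable without specifying the unknown prediction function.

```python
import numpy as np

def ridge_prediction_map(S_train, S_test, z):
    """Linear-in-y map K with y_hat_oos = K @ y_train for ridge regression.

    S_train: (T, P) in-sample signals; S_test: (T_oos, P) out-of-sample signals;
    z: ridge penalty (illustrative scaling; the paper's normalization may differ).
    """
    T, P = S_train.shape
    # beta_hat = (S'S + z*T*I)^{-1} S'y, so out-of-sample forecasts are linear in y.
    A = np.linalg.solve(S_train.T @ S_train + z * T * np.eye(P), S_train.T)  # (P, T)
    return S_test @ A                                                        # (T_oos, T)

# K (and hence tr(K'K)) depends only on the signals, never on the realized targets y,
# which is the property that makes signal-only bounds such as the LLG computable.
rng = np.random.default_rng(0)
T, T_oos, P = 240, 120, 600
S_train, S_test = rng.standard_normal((T, P)), rng.standard_normal((T_oos, P))
K = ridge_prediction_map(S_train, S_test, z=1.0)
y_train = rng.standard_normal(T)
y_hat_oos = K @ y_train
print("tr(K'K) =", np.trace(K.T @ K))
```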

The closed-form LLG correction allows explicit construction of corrected lower confidence bounds for the population $R^2$, revealing situations where observed low $R^2_{\text{OOS}}$ metrics may vastly understate true data predictability due to high estimation error. The paper further develops asymptotic normality and pivotal finite-sample lower bounds for this quantity, bringing rigorous statistical inference to high-dimensional predictive setups.
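
In schematic terms (a stylized reading of the summary above, not the authors' exact formula), the correction and its one-sided confidence bound take the form

$$
R^2_{\text{pop}} \;\gtrsim\; R^2_{\text{OOS}} + \mathrm{LLG},
\qquad
\Pr\!\left(R^2_{\text{pop}} \;\ge\; \widehat{R}^2_{\text{OOS}} + \widehat{\mathrm{LLG}} - z_{\alpha}\,\widehat{\mathrm{se}}\right) \;\ge\; 1-\alpha,
$$

where the standard error comes from the asymptotic normality result and $z_\alpha$ is the corresponding normal quantile; the paper's precise expressions and scaling are not reproduced in this summary.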

Empirical Analysis: Financial Time Series and Statistical Illusion of Weak Predictability

Empirical application uses the Welch & Goyal (2008) macro-financial dataset, encompassing returns, yields, spreads, and valuation ratios. The key empirical finding is that LLG corrections are quantitatively large in almost all settings studied, even where state-of-the-art ML estimators indicate low or negative $R^2_{\text{OOS}}$. For instance, the LLG for S&P 500 monthly return prediction implies a lower bound for the population $R^2$ around 20%, contrasting with an empirical $R^2_{\text{OOS}}$ of 1–2%. Several processed asset pricing variables exhibit LLG-corrected predictabilities in the 10–70% range, even though standard models fail to detect structure. This provides strong support for the claim that "statistical illusions" regarding weak predictability arise from high estimation error intrinsic to complex, finite-sample regimes.

By evaluating nonlinear, feature-selected models (e.g., recursive ridge and advanced regularized regressors), the paper substantiates that in cases with high LLG, out-of-sample predictability is indeed recoverable by more expressive or better-architected algorithms, and that the LLG predicts the attainable $R^2$ frontier. In contrast, negative or near-zero LLG tightly identifies hopeless predictive settings, distinguishing genuine absence of signal from poor fits driven by estimation noise.
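
The recursive evaluation can be sketched as follows. This is a minimal illustration, assuming an expanding estimation window, a scikit-learn Ridge forecaster, and an expanding historical-mean benchmark in the $R^2_{\text{OOS}}$ definition; the helper recursive_oos_r2 and the synthetic data are not from the paper, whose actual recursive-ridge procedure, features, and penalty choice may differ.

```python
import numpy as np
from sklearn.linear_model import Ridge

def recursive_oos_r2(S, y, start, alpha=1.0):
    """Expanding-window out-of-sample R^2 for a ridge forecaster.

    At each date t >= start, fit on data up to t-1 and predict y[t].
    R^2_OOS is measured against the expanding historical-mean benchmark.
    """
    preds, bench = [], []
    for t in range(start, len(y)):
        model = Ridge(alpha=alpha).fit(S[:t], y[:t])
        preds.append(model.predict(S[t:t + 1])[0])
        bench.append(y[:t].mean())
    preds, bench, actual = np.array(preds), np.array(bench), y[start:]
    return 1.0 - np.sum((actual - preds) ** 2) / np.sum((actual - bench) ** 2)

# Synthetic illustration: a weak signal buried among many noisy features.
rng = np.random.default_rng(1)
T, P = 480, 100
S = rng.standard_normal((T, P))
y = 0.05 * S[:, 0] + rng.standard_normal(T)
print("R2_OOS:", recursive_oos_r2(S, y, start=240))
```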

Implications for Asset Pricing, Equilibrium, and Econometric Theory

The analysis yields concrete theoretical advancements for econometric theory and macro-finance. The paper derives LLG-adjusted versions of the Hansen–Jagannathan (1991) bounds, thereby quantifying how parameter uncertainty relaxes asset pricing restrictions. LLG enters into the calculation of the lower bound on admissible stochastic discount factor (SDF) volatility, resolving some of the "excess volatility" and SDF dispersion puzzles commonly attributed to behavioral explanations.
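
For reference, the classic Hansen and Jagannathan (1991) restriction bounds the volatility of any admissible SDF $M$ by the maximal Sharpe ratio attainable from excess returns $R^e$:

$$
\frac{\sigma(M)}{\mathbb{E}[M]} \;\ge\; \sup_{R^e}\, \frac{\left|\mathbb{E}[R^e]\right|}{\sigma(R^e)}.
$$

Since an LLG-corrected $R^2$ implies a larger attainable conditional Sharpe ratio than the uncorrected out-of-sample fit suggests, the right-hand side rises and the admissible SDF must be more volatile; the paper's exact LLG-adjusted bound is not reproduced here.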

The framework formalizes how rational equilibrium agents in high-dimensional economies also face intrinsic LLG-induced estimation error, leading to excess price volatility (in the spirit of Shiller) without recourse to irrationality or behavioral preferences. This overturns the equivalence between econometrician and agent information sets often assumed in asset pricing and macro models.

Moreover, the connection to ML scaling laws is theoretically elucidated: the sharply diminishing returns to data or compute observed empirically in large-scale ML are attributed to the slow (subclassical) convergence of the LLG as the sample size increases. This clarifies why additional data or complexity may not substantially reduce the generalization gap in overparameterized or high-dimensional regimes.

Methodological Contributions

The principal methodological contributions are:

  • Derivation of universal lower bounds (LLG) and practical computation for both linear and nonlinear (NTK-based) estimators (a minimal computational sketch follows this list)
  • Provision of confidence intervals and pivotal finite-sample inference for the infeasible $R^2$
  • Application to robust, model-free empirical tests for the existence of signal versus limits of learning
  • Explicit demonstration that LLG depends only on the design and distribution of features, not on the properties or complexity of the unknown predictive function
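
One concrete, signal-only ingredient quoted in the Glossary below is the Herfindahl index of the eigenvalues of $(zcI + SS')^{-1}$, where $S$ is the $T \times P$ signal matrix, $z$ a ridge-type penalty, and $c = P/T$ the statistical complexity. The sketch below (the helper eigen_herfindahl, the value $z = 1$, and the synthetic signals are assumptions) computes only that spectral quantity; it is not the paper's full LLG formula.

```python
import numpy as np

def eigen_herfindahl(S, z):
    """Herfindahl index of the eigenvalues of (z*c*I + S S')^{-1}, with c = P/T.

    S is the (T, P) signal matrix and z a ridge-type penalty. This is one
    signal-only spectral quantity the paper associates with limits to learning,
    not the complete LLG expression.
    """
    T, P = S.shape
    c = P / T
    eigvals = np.linalg.eigvalsh(np.linalg.inv(z * c * np.eye(T) + S @ S.T))
    return np.sum(eigvals ** 2) / np.sum(eigvals) ** 2

rng = np.random.default_rng(2)
S = rng.standard_normal((240, 600))   # P > T: the over-parameterized regime
print("Herfindahl index of eigenvalues:", eigen_herfindahl(S, z=1.0))
```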

Strength of Empirical and Numerical Results

The simulation results, using both synthetic and real data with controlled signal-to-noise manipulation, validate the theoretical predictions. The lower bounds generated by LLG corrections closely track the true infeasible $R^2$, and confidence band calibration is empirically supported. Notably, for financial tasks frequently labeled "unpredictable" by standard methods, the LLG correction establishes that substantial signal is in fact present but unlearnable by extant ML paradigms, challenging canonical interpretations in econometrics and empirical finance.
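
A self-contained toy simulation (not the paper's design; the data-generating process, ridge penalty, and sample sizes are assumptions chosen for illustration) reproduces the qualitative mechanism: when $P$ is comparable to $T$, a ridge forecaster's out-of-sample $R^2$ sits far below the population $R^2$ of the process that generated the data, which is exactly the gap the LLG correction is meant to quantify.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

rng = np.random.default_rng(3)
T, T_oos, P = 240, 240, 480      # high statistical complexity: c = P/T = 2
n_rep, r2_pop = 50, 0.30         # target population R^2 of the simulated process

oos_scores = []
for _ in range(n_rep):
    S = rng.standard_normal((T + T_oos, P))
    f = S @ rng.standard_normal(P)
    f *= np.sqrt(r2_pop) / f.std()                     # var(f) = R^2_pop, var(y) ~ 1
    y = f + np.sqrt(1.0 - r2_pop) * rng.standard_normal(T + T_oos)
    model = Ridge(alpha=1.0).fit(S[:T], y[:T])
    oos_scores.append(r2_score(y[T:], model.predict(S[T:])))

print(f"population R^2 = {r2_pop:.2f}, "
      f"mean out-of-sample R^2 = {np.mean(oos_scores):.2f}, "
      f"mean gap = {r2_pop - np.mean(oos_scores):.2f}")
```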

Theoretical and Practical Implications

For Empirical Researchers

Evaluation of predictive studies in economics, finance, or other high-dimensional sciences should incorporate the LLG to distinguish absence of predictability from limits in statistical inference. Model architecture, variable selection, and sample design should be guided by the statistical complexity and LLG magnitude, rather than raw $R^2$ alone.

For Theory and Policy

Policymakers and theorists drawing conclusions from ML-based forecast studies (e.g., of economic volatility, SDF estimates, risk premia) should account for statistical limits imposed by data and feature dimensionality. The LLG provides a formal, model-agnostic approach to calibrating learning limits that is applicable across disciplines.

Future Directions

The demonstrated sharpness and generality of the LLG suggest several avenues for further research:

  • Extension to fully nonparametric function classes and robust estimation settings
  • Dynamic statistical complexity analysis across time-varying macro-financial environments
  • LLG-guided design of optimal experiment/resource allocation strategies in high-dimensional scientific inference
  • Integration into uncertainty quantification for DSGE and other structural econometric models

Conclusion

This work establishes the LLG as a foundational constraint on feasible machine learning, reconciling the apparent failure of ML models in high-dimensional prediction tasks with the statistical limits of finite-sample inference. The resulting framework corrects empirical practice, enriches econometric theory, and provides actionable diagnostics for research design and policy analysis in all settings where model complexity rivals sample size. Incorporating the LLG should become standard in rigorous predictive modeling, particularly in economics and finance, as data dimensionality and model expressiveness further escalate.

Explain it Like I'm 14

What is this paper about?

This paper asks a simple but important question: when we use ML to predict things in economics and finance—like stock returns—are we judging the results fairly? The authors show that in many real-world cases, ML models look worse than they truly are because there isn’t enough data for such complex models. They provide a tool, called the Limits‑to‑Learning Gap (LLG), to correct for this and reveal how much real predictability might be hiding behind noisy results.

What questions are the authors trying to answer?

  • Why do modern ML models often say “there’s almost no predictability” even when there may be real, strong patterns?
  • Can we measure how much performance is lost just because we don’t have enough examples for a complex model?
  • How big is this “lost performance” in real financial data—like stock returns, bond yields, or credit spreads?
  • What does this mean for big ideas in finance, like how risky assets should be priced or why markets seem more volatile than fundamentals?

How did they study the problem? (In plain terms)

Think of learning as trying to recognize a pattern in a huge puzzle with many pieces (lots of variables) but only a small number of examples (years or months of data). If you have too many knobs to tune and not enough examples, even smart tools get confused. That’s not because the pattern isn’t there—it’s because the “lens” you’re using is blurry when the sample is small.

Here’s what they did:

  • Key idea: the Limits‑to‑Learning Gap (LLG)
    • The LLG measures the minimum “unavoidable gap” between how well your model performs on new data and how well the best possible model could perform with perfect knowledge.
    • Crucially, the LLG is computed from the inputs (the predictors) alone. You don’t need to fit a complicated ML model to compute it.
    • If the LLG is large, even a great model will look worse than it “truly” is if you only judge it by standard out‑of‑sample tests.
  • Why does this gap appear?
    • “High‑dimensionality”: you have many predictors (features) compared to the number of observations. The authors call this high “statistical complexity” (roughly, number of features divided by sample size).
    • In this regime, models inevitably “pick up” noise, which lowers measured performance—even if the real signal is strong.
  • How they check it:
    • They analyze simple models (like ridge regression) that are widely used and also help explain how deep neural networks behave in practice.
    • They show how to compute the LLG for both linear models and (through recent theory) for neural networks.
    • They also prove a statistical result that lets you build a one‑sided confidence interval: a cautious lower bound on the true predictability.
  • Real‑data test bed:
    • They use a well‑known dataset (Welch & Goyal) with U.S. financial variables over many decades: market returns, valuation ratios, bond yields, credit spreads, inflation, and more.
    • They carefully “pre‑process” these series (normalize, de‑trend short-run persistence) so the comparisons are fair.
    • They generate many nonlinear “features,” run ridge models, record usual out‑of‑sample scores, then apply their LLG correction.

Analogy: Imagine judging a camera by the sharpness of photos taken in the dark. Even a great camera will look blurry. LLG is like measuring how dark the room is. It tells you, “Don’t blame the camera for some of this blur—it’s the lighting.”

What did they find?

Big picture: Standard ML tests often understate how predictable financial data really are—sometimes by a lot.

Highlights:

  • For U.S. market monthly returns, usual methods often report out‑of‑sample predictability near 1–2% (or even negative). After applying the LLG correction, the true predictability could be at least about 20%.
  • For many valuation ratios and spreads, the LLG suggests the true predictability of their cleaned‑up changes (AR(1) residuals) can exceed 30%, and for some variables even 50–70%.
  • When the LLG says “there should be signal here,” more flexible models (nonlinear approaches with feature selection) often do find meaningful out‑of‑sample predictability (for example, about 10% for Treasury bill rate changes), lining up with the LLG’s lower bound.
  • The LLG often grows when you add more features relative to your data size—this is the “statistical complexity” effect. More features can help capture real patterns, but they also make finite‑sample learning harder, which makes naive scorecards look worse than reality.
  • Theory links:
    • Asset pricing bounds (Hansen–Jagannathan): accounting for LLG increases the implied variability that the pricing model must explain. In plain terms, if the true predictability is higher than standard tests show, then standard models may need to allow for more “movement” in the discount factor that prices assets.
    • Excess volatility puzzle: the LLG provides a rational, learning‑based reason prices can be more volatile than fundamentals. When agents have to learn in a high‑dimensional world, they can overreact to noisy signals, adding extra price swings—even without assuming irrational behavior.
    • ML scaling laws: the LLG shrinks only slowly as you add more data, explaining why bigger datasets and bigger models don’t always yield fast improvements.

Why this is important:

  • Many researchers and practitioners look only at out‑of‑sample performance and conclude “no predictability.” This paper shows that conclusion can be misleading when models are complex relative to the data available.
  • The LLG gives a practical, data‑driven correction that separates “no signal” from “signal that’s hard to learn with limited data.”

Why does this matter?

  • Better testing: The LLG offers a new, simple diagnostic to judge whether poor ML performance means “no signal” or just “not enough data for this level of complexity.”
  • Better models: If the LLG is large, researchers might switch to smarter architectures, better feature design, or targeted nonlinear models, rather than giving up on predictability.
  • Better finance theory: Adjusting classic asset‑pricing bounds with LLG acknowledges that real‑world learning is hard. This helps explain why markets can look more volatile and why parameter learning is slow, without assuming people are irrational.
  • Smarter scaling: The LLG helps decide when more data or compute will actually help—and when you’re hitting a fundamental limit.

Key terms in simple words

  • Predictability: how much of future ups and downs a model can explain.
  • Out‑of‑sample test: testing your model on new data it didn’t see when it was trained.
  • $R^2$: a score from 0 to 1 (or 0% to 100%) showing how much variation your model explains; higher is better.
  • High‑dimensional: having lots of predictors compared to the number of data points.
  • Limits‑to‑Learning Gap (LLG): a built‑in penalty that tells you the minimum performance gap you’ll see just because you’re trying to learn too much from too little data.
  • Statistical complexity: roughly, number of features divided by sample size; higher means the learning problem is harder.

Takeaway

Don’t take weak out‑of‑sample ML scores at face value in complex, data‑hungry settings. The Limits‑to‑Learning Gap shows that much stronger true predictability can be hiding behind noisy results. By measuring this gap, we can judge models more fairly, design better prediction strategies, and better understand puzzling features of financial markets—like why prices can swing more than fundamentals would suggest.

Knowledge Gaps

Knowledge gaps, limitations, and open questions

The following list summarizes what remains missing, uncertain, or unexplored in the paper, focusing on concrete directions future researchers can act on.

Theoretical assumptions and bounds

  • Formal robustness of the LLG and CLT-based confidence bounds to serial correlation, conditional heteroskedasticity (e.g., GARCH), heavy tails, and dependence between errors and signals beyond the iid, homoskedastic error assumption in Assumption 1.
  • Precise finite-sample properties (bias, variance, coverage accuracy) of the LLG-based lower bound and the pivotal estimator for $R^2$ under realistic time-series dependence and limited out-of-sample windows.
  • Conditions ensuring the interaction term in equation (19) is negligible in dependent, non-stationary settings; develop diagnostics or tests to verify condition (22) in practice.
  • Tightness of the LLG lower bound: identify model classes or signal structures where the bound is tight vs. loose, and characterize the wedge between the bound and feasible $R^2$.
  • Extension of LLG-style bounds to alternative loss functions and tasks (e.g., classification metrics such as AUC, quantile loss), and to portfolio objectives beyond MSE/$R^2$.
  • LLG under measurement error in signals and missing data; quantify sensitivity to noisy or imputed covariates.
  • Distribution-shift sensitivity: quantify how differences between in-sample and out-of-sample signal covariance $E_{\text{oos}}[SS']$ affect LLG and the validity of the lower bound when covariances are not stable.
  • Generalization of Random Matrix Theory results (Proposition 8) to more realistic signal dependency structures (e.g., factor models, time-varying covariances, non-iid coordinates).

Nonlinear models and generality

  • Conditions under which deep neural networks genuinely admit the linear-in-$y$ kernel/ridge representation used to compute LLG: dependence on width, depth, optimizer (e.g., Adam vs. SGD), batch norm, regularization, early stopping, and training regime ("lazy" vs. feature learning).
  • LLG computation for nonlinear models not covered by NTK approximations (e.g., decision trees/boosting, random forests, generalized additive models, transformers); derive analogous $K(S_T^\top, S)$ or comparable bounds for these classes.
  • Sensitivity of LLG to hyperparameters (ridge penalty $z$, kernel choice, random feature distribution, activation functions); provide principled procedures to select or average over hyperparameters when computing bounds.
  • Relationship between LLG and "double descent" phenomena; characterize the optimal statistical complexity $c = P/T$ that maximizes proximity of the lower bound to the true $R^2$ without inflating variance.
  • Decomposition of LLG across signals/features: develop methods to attribute the limits-to-learning gap to specific subsets of covariates to guide variable selection and model design.

Empirical design and robustness

  • Robustness of conclusions to the preprocessing choices in Procedure 1 (36-month rolling window, [-3,3] clipping, AR(1) residualization): quantify how alternative windows, clipping thresholds, or detrending methods change LLG and lower bounds.
  • Sensitivity of results to the out-of-sample split at January 1990 and to structural breaks; evaluate across multiple rolling/expanding windows and breakpoints.
  • Applicability beyond the Welch–Goyal monthly US dataset: test LLG on other asset classes (FX, commodities), higher/lower frequencies (daily/quarterly), international markets, and cross-sectional prediction problems.
  • Practical computation and stability of the pivotal estimator for $R^2$ (Appendix D): provide simplified, numerically stable implementations, diagnostics, and guidelines for practitioners, including heavy-tail-robust estimation of $\sigma_\varepsilon^2$.
  • Ex-ante usability: because $E_{\text{oos}}[SS']$ is needed for LLG yet unknown before evaluation, develop procedures to estimate or bound $C$ ex-ante (e.g., via time-series CV, rolling surrogates, or worst-case bounds) to inform model design before deployment.
  • Model discovery guided by LLG: formalize strategies to choose architectures/feature maps when LLG is large (e.g., kernel mixtures, adaptive nonlinearity selection), and benchmarks for how close practical models can approach the LLG-implied lower bound.
  • Empirical reconciliation of large LLG with persistently low realized R2R^2 in returns: identify and test specific nonlinear functional forms or data augmentations capable of closing the gap for targets where recursive ridge still underperforms the bound.

Asset-pricing implications and equilibrium modeling

  • Empirical quantification of the LLG-adjusted Hansen–Jagannathan bounds: estimate the implied lower bound on SDF variance in real data and compare with macro-based SDFs; design tests to reject models that cannot meet the LLG-implied variance threshold.
  • Operationalizing "dark matter" in SDF construction (Proposition 6): measure the volatility of the objective likelihood ratio $L_T(R_{T+1})$ and its contribution to the SDF under realistic learning models; validate against observed pricing errors and fat tails.
  • Generalization of the excess-volatility mechanism (Section 4.2) to risk-averse agents, multi-period settings, persistent fundamentals, and non-Gaussian priors; calibrate whether LLG-induced volatility magnitudes match Shiller-style excess volatility in data.
  • Joint learning of parameters and state dynamics: integrate LLG into DSGE or habit models where agents learn multiple high-dimensional parameters; quantify how learning frictions propagate to prices and cross-sectional moments.
  • Policy and welfare implications: analyze how limits-to-learning affect asset-price informativeness, market efficiency, and the value of information; assess whether reducing statistical complexity (e.g., via disclosure or standardization) can mitigate excess volatility.
  • Scaling-law validation: empirically link measured LLG across datasets/models to observed compute/data scaling exponents; determine when more data materially reduces the limits-to-learning gap and when returns to scale are fundamentally bounded.

Glossary

Below is an alphabetical list of advanced domain-specific terms from the paper, each with a short definition and a verbatim example of how it is used in the text.

  • Activation function: A nonlinear transformation applied to features in neural networks or random-feature models. "The nonlinearity g(x) is commonly referred to as the activation function."
  • AR(1) residuals: Residuals obtained after removing first-order autoregressive dependence from a time series. "LLG corrections indicate that the AR(1) residuals of many valuation ratios and spreads are more than 30% predictable"
  • Bayes-optimal: The estimator that minimizes expected loss under a specified prior and data model. "then the linear ridge regression estimator with 2 : 02 P OST is Bayes-optimal; that is, no other ML model (linear or nonlinear) can achieve better performance than ridge."
  • Cauchy-Schwarz inequality: A fundamental inequality in inner-product spaces used to bound correlations and variances. "and the Cauchy-Schwarz inequality implies that"
  • Central Limit Theorem (CLT): A statistical result ensuring normalized sums of random variables converge to a normal distribution. "We also derive a Central Limit Theorem (CLT), allowing us to construct a one-sided confidence interval for $R^2$."
  • Curse of dimensionality: The phenomenon where estimation and learning become difficult as the number of parameters/features grows large relative to samples. "The LLG emerges due to the curse of dimensionality."
  • Dark matter (asset pricing): Unobservable or implicit components that models rely on to explain pricing behavior. "Our findings also relate to the notion of 'dark matter' in asset pricing introduced by Chen et al. (2024a)"
  • GARCH(1,1): A time-series model for volatility where current variance depends on past squared shocks and past variance. "yt+1 follows a GARCH(1,1) process."
  • Gaussian kernel: A kernel function that measures similarity via an exponential of squared Euclidean distance. "One popular choice is the Gaussian kernel, $k(x_1,x_2) = e^{-\|x_1-x_2\|^2}$."
  • Gaussian prior: A Bayesian prior distribution assuming parameters are normally distributed. "We assume that agents do not know the true value of $\beta$ and have a rational, Gaussian prior about it, $\beta \sim N(0, I_{P\times P})$."
  • Hansen and Jagannathan (1991) bound: A lower bound on the variance of the stochastic discount factor given asset returns. "This is commonly known as the Hansen and Jagannathan (1991) bound."
  • Herfindahl index: A concentration measure (sum of squared shares), here applied to eigenvalues to summarize dispersion. "Thus, L is the Herfindahl index of the eigenvalues of the matrix $(zcI + SS')^{-1} \in \mathbb{R}^{T\times T}$."
  • Heteroskedasticity: Non-constant variance of errors over time or across observations. "allows for any form of non-stationarity, autocorrelation, heteroskedasticity, or model misspecification."
  • Homoskedastic: Having constant error variance across observations. "By construction, these data have zero autocorrelation and are approximately homoskedastic."
  • Intertemporal Marginal Rate of Substitution (IMRS): The ratio that captures how agents trade off consumption across time, used to construct SDFs. "Most equilibrium asset pricing models imply that a stochastic discount factor (SDF) can be constructed from the Intertemporal Marginal Rate of Substitution (IMRS) of economic agents."
  • Kernel ridge regression: A regularized regression performed in a reproducing kernel Hilbert space via a positive definite kernel. "Another, closely related linear estimator is a kernel ridge regression:"
  • Lazy training regime: A regime where neural networks behave approximately linearly due to small parameter updates during training. "Hastie et al. (2019) explain how it approximates gradient descent in the 'lazy training' regime"
  • Limits-to-Learning Gap (LLG): A data-driven lower bound quantifying the unavoidable gap between empirical and population fit due to finite samples. "We characterize a universal lower bound, the Limits-to-Learning Gap (LLG), quantifying the unavoidable discrepancy between a model's empirical fit and the population benchmark."
  • Neural Tangent Kernel (NTK): A kernel that characterizes the function space of wide neural networks trained by gradient descent. "the so-called Neural Tangent Kernel (NTK)."
  • Objective likelihood: The true likelihood under the objective (data-generating) distribution, distinct from subjective beliefs. "this unobservable objective likelihood constitutes a form of 'dark matter'"
  • Out-of-sample (OOS) Mean Squared Error (MSE): Prediction error evaluated on data not used in training. "out-of-sample (OOS) Mean Squared Error (MSE):"
  • Over-parametrized regime: A setting where the number of parameters exceeds the number of observations. "By contrast, in the over-parametrized regime when $P > T$, $\mathcal{L}$ can get very large"
  • Pivotal estimator: An estimator whose sampling distribution does not depend on nuisance parameters, enabling valid inference. "There exists an asymptotically consistent, pivotal estimator $\hat{R}^2 = \hat{R}^2(y, S)$ of $R^2$ presented in the Appendix."
  • Posterior distribution: The probability distribution of parameters or outcomes conditional on observed data in Bayesian analysis. "and $\nu_T(R)\,dR$ the agent's posterior distribution given the observations $t \le T$."
  • Pseudo-true parameter: The parameter value to which estimators converge under model misspecification. "In the presence of model misspecification, estimators converge to a pseudo-true parameter; see White (1996)."
  • Random Matrix Theory (RMT): A field studying properties of matrices with random entries, used here for asymptotic spectral analysis. "However, Random Matrix Theory (RMT) can be used to compute it under more stringent conditions on St."
  • Ridge-regularized least-squares estimator: A linear regression estimator with L2 penalty to control variance and overfitting. "ridge-regularized least-squares estimator,"
  • Sharpe Ratio: A measure of risk-adjusted return defined as mean excess return divided by standard deviation of returns. "with the conditional squared Sharpe Ratio"
  • Stochastic discount factor (SDF): A pricing kernel that, when multiplied by returns, has zero expectation, enforcing no-arbitrage. "Most equilibrium asset pricing models imply that a stochastic discount factor (SDF) can be constructed"
  • Stochastic volatility: Time-varying volatility driven by a stochastic process. "These simulations indicate that Theorems 2 and 3 continue to hold as long as the above-listed transformations are applied to the underlying predicted variable."
  • Super-consistent: Converges to the true value at a rate faster than the standard root-T rate. "We show that our estimator is super-consistent"
  • Virtue of Complexity (VoC) curve: The curve showing how model performance varies with statistical complexity (e.g., P/T). "we refer to the curve showing model performance as a virtue of complexity (VoC) curve."

Practical Applications

Immediate Applications

The following applications can be deployed now, using the paper’s LLG methodology, its CLT-based confidence intervals, and the released implementation code.

  • Finance: Predictability diagnostics and model triage
    • Use LLG to reassess low or negative out-of-sample performance for returns, valuation ratios, yields, and spreads, identifying targets where true population predictability is likely high despite weak empirical fit.
    • Deploy a “LLG dashboard” that ranks targets by lower-bound R² and statistical complexity c, guiding variable selection and model architecture (e.g., nonlinear models or recursive ridge) toward high-LLG series.
    • Sectors/tools/workflows: Asset management, quant research; Python/R library to compute tr(K'K), $R^2_{\text{OOS}}$ corrections, VoC (Virtue-of-Complexity) curves; plug-ins to scikit-learn or PyTorch pipelines; codebase at https://github.com/czm319319/CKM_LLG.
    • Assumptions/dependencies: Linear-in-y estimators; adequate out-of-sample window; noise independence/homoskedasticity (or preprocessing with rolling normalization and AR(1) residualization); proper penalty scaling z; stable train/test split.
  • Finance: LLG-adjusted Sharpe and Hansen–Jagannathan stress tests
    • Use the LLG-corrected lower bound for R² to infer minimum conditional Sharpe and update SDF variance requirements (HJ bounds) for factor models and macro-driven SDFs.
    • Build an “HJ–LLG stress-testing module” that flags when models cannot satisfy LLG-implied SDF volatility bounds.
    • Sectors/tools/workflows: Risk management, model validation; analytics that map bound on R² to bounds on conditional Sharpe and SDF variance.
    • Assumptions/dependencies: Correct mapping from LLG-adjusted R² to Sharpe/SDF bounds; stationarity/ergodicity for E[f²] estimates; transparency about whether agents “know ft” versus learning.
  • Trading and portfolio construction: Complexity-aware model development
    • When $R^2_{\text{OOS}}$ is poor, use LLG to justify building more expressive models (e.g., nonlinear kernels, NTK-informed nets, recursive ridge) for high-LLG targets (e.g., ep, dfy, tbl, de in the paper).
    • Adopt a “complexity sweep” workflow that varies P/T and z to find working regions where the LLG lower bound approaches feasible R², then focus resources there.
    • Sectors/tools/workflows: Quant strategy R&D; automated grid-search over feature complexity; selective training gates triggered by LLG thresholds.
    • Assumptions/dependencies: Reliable signal generation (e.g., random features); robust validation to avoid overfitting; transaction costs and capacity constraints in strategy deployment.
  • Software/MLOps: LLG as a governance and reporting metric
    • Integrate LLG-corrected R² and its one-sided confidence interval as a required model card field, preventing misinterpretation of out-of-sample metrics in high-dimensional systems.
    • Sectors/tools/workflows: Model governance; CI/CD pipelines for ML; auto-reporting of c, $R^2_{\text{OOS}}$, LLG, and CI in experiment tracking (e.g., MLflow, Weights & Biases).
    • Assumptions/dependencies: Ability to compute K (for linear estimators) or use NTK approximations; adequate $T_{\text{oos}}$ and $T$; reproducible preprocessing.
  • Data/compute planning: Scaling ROI assessment
    • Use LLG to forecast diminishing returns from additional data or computation, informing whether to acquire more samples, features, or GPUs.
    • Build VoC curves and “LLG vs T” plots to quantify expected improvement from scaling.
    • Sectors/tools/workflows: Enterprise ML strategy; budget allocation for data and compute; sustainability planning.
    • Assumptions/dependencies: Stable data-generating process across scaling; accurate complexity measurement; applicability of LLG in the chosen estimator.
  • Academia: Re-evaluation of “no predictability” claims
    • Apply LLG lower bounds and CLT-based one-sided confidence intervals to revisit published findings with low $R^2_{\text{OOS}}$, distinguishing "no signal" from "limits-to-learning."
    • Sectors/tools/workflows: Empirical finance/econometrics; replication packages using the authors’ code; inclusion of LLG diagnostics in pre-registrations and peer review.
    • Assumptions/dependencies: Adherence to the paper's preprocessing (rolling normalization, AR(1) residualization) and linear-in-y estimators; sufficient out-of-sample horizon. A minimal preprocessing sketch appears after this list.
  • Non-finance regression tasks with high-dimensional features (industry/academia)
    • In domains like healthcare (risk scores), energy (demand forecasting), and operations (time-series KPIs), use LLG to identify when weak $R^2_{\text{OOS}}$ likely understates true predictability, guiding model and feature complexity choices.
    • Sectors/tools/workflows: Healthcare analytics, energy forecasting, industrial IoT; LLG computations for kernel ridge or linear-in-y predictors; VoC curves to set feature budgets.
    • Assumptions/dependencies: Continuous outcomes (regression); noise independence/homoskedasticity or careful preprocessing; linear-in-y estimators or NTK-approximated wide nets; sector-specific validation.
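
A minimal preprocessing sketch for the rolling normalization, clipping, and AR(1) residualization referenced in the assumptions above (the helper preprocess_series is hypothetical; the 36-month window and [-3, 3] clipping follow the values quoted in this summary, and the paper's Procedure 1 may differ in detail):

```python
import numpy as np
import pandas as pd

def preprocess_series(x, window=36, clip=3.0):
    """Rolling-normalize, clip, and AR(1)-residualize a monthly series.

    Mirrors the steps quoted in this summary (36-month rolling window,
    clipping to [-3, 3], AR(1) residualization); not the paper's exact Procedure 1.
    """
    x = pd.Series(x, dtype=float)
    mu = x.rolling(window).mean().shift(1)            # use only past observations
    sd = x.rolling(window).std().shift(1)
    z = ((x - mu) / sd).clip(-clip, clip).dropna()
    z_lag = z.shift(1).dropna()                       # AR(1) step: regress z_t on z_{t-1}
    z_cur = z.loc[z_lag.index]
    rho = np.cov(z_cur, z_lag)[0, 1] / np.var(z_lag, ddof=1)
    return z_cur - rho * z_lag                        # keep the AR(1) residual

# Usage on a synthetic persistent series.
rng = np.random.default_rng(4)
raw = np.cumsum(rng.standard_normal(600)) * 0.1 + rng.standard_normal(600)
print(preprocess_series(raw).describe())
```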

Long-Term Applications

The following applications require further research, scaling, integration, or policy development before broad deployment.

  • Regulatory standards for ML backtesting and model risk (policy/finance)
    • Incorporate LLG-adjusted R² and one-sided confidence intervals into backtesting rules for trading models and stress tests, reducing false “no predictability” conclusions.
    • Sectors/tools/workflows: Securities regulators, central banks; supervisory templates that mandate reporting LLG diagnostics in submissions.
    • Assumptions/dependencies: Consensus on LLG methodology; sector-specific guidance on preprocessing and estimator choices; regulatory adoption and standardization.
  • Asset-pricing theory and macro models with learning-induced excess volatility
    • Build general-equilibrium models where LLG (learning frictions in high-dimensional environments) generates excess price volatility without behavioral assumptions.
    • Quantify “dark matter” components (objective likelihood volatility) implied by LLG and test belief formation models.
    • Sectors/tools/workflows: Academic macro-finance; calibration toolkits linking LLG to SDF volatility; structural estimation frameworks.
    • Assumptions/dependencies: Microfoundations for learning frictions; observable proxies for objective likelihood; robust identification in long samples with structural breaks.
  • NTK-based LLG computation for deep nets at scale (software/AI)
    • Standardize LLG modules that compute K via NTK for wide neural networks trained by gradient descent, enabling model-agnostic diagnostics across deep learning pipelines (a toy empirical-NTK sketch appears after this list).
    • Sectors/tools/workflows: DL frameworks (PyTorch/JAX/TF); NTK libraries; automated LLG computation in hyperparameter optimization.
    • Assumptions/dependencies: Approximation accuracy of NTK to network behavior; computational efficiency for large P/T; broader coverage beyond regression.
  • Institutional investment products driven by LLG insights (finance)
    • Launch strategies that focus on series with high LLG lower bounds (e.g., de, dfy, ep, tbl), employing specialized nonlinear models to harvest latent predictability.
    • Sectors/tools/workflows: Funds, SMAs; LLG-informed signals; complexity-aware research processes; enhanced governance and disclosures.
    • Assumptions/dependencies: Persistent signals post-transaction costs; risk controls; avoiding data snooping; capacity limits.
  • Central bank and market microstructure policy (policy/finance)
    • Use LLG to interpret the apparent absence of predictability in prices or macro indicators and design information policies (e.g., data releases) mindful of learning frictions.
    • Sectors/tools/workflows: Policy analysis; dashboards linking LLG to predictability in macro/financial series.
    • Assumptions/dependencies: Robust mapping from LLG to welfare/efficiency metrics; coordination with statistical agencies and market infrastructure.
  • Cross-sector expansion beyond regression (software/industry)
    • Extend the theory and tooling to classification and survival analysis, enabling LLG-like diagnostics for outcomes beyond continuous y.
    • Sectors/tools/workflows: Healthcare (survival), ad tech (classification), cybersecurity (event detection); theoretical extensions and practical libraries.
    • Assumptions/dependencies: New theoretical work for non-regression losses; calibration of bounds and confidence intervals; careful treatment of label noise.
  • Enterprise data strategy and education (industry/academia)
    • Institutionalize LLG-guided feature engineering, complexity budgeting, and curriculum modules on limits-to-learning for data science teams and graduate programs.
    • Sectors/tools/workflows: Internal standards; training materials; course modules on high-dimensional inference and LLG.
    • Assumptions/dependencies: Organizational buy-in; integration with existing MLOps; pedagogical alignment with econometrics/ML programs.
  • Sustainability and compute planning (policy/industry)
    • Combine LLG scaling analysis with carbon accounting to decide when extra data/compute is environmentally justified given slow closing of the learning gap.
    • Sectors/tools/workflows: ESG planning; compute emissions calculators aligned with LLG-based scaling curves.
    • Assumptions/dependencies: Accurate emissions data; stable extrapolation of LLG vs T/P; governance alignment with sustainability goals.
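
As a flavor of the NTK route mentioned in the item above, the toy below computes the empirical neural tangent kernel of a one-hidden-layer ReLU network in closed form and plugs it into a kernel ridge predictor, which is again linear in $y$. The helper empirical_ntk, the network width, the ridge value $z$, and the synthetic task are assumptions for illustration; production use would rely on dedicated NTK libraries and the wide, deep architectures the paper's theory actually targets.

```python
import numpy as np

def empirical_ntk(X1, X2, W, a):
    """Empirical NTK of f(x) = a . relu(W x) / sqrt(m) at parameters (W, a).

    NTK(x, x') = grad_theta f(x) . grad_theta f(x'), written in closed form for a
    one-hidden-layer ReLU network; a toy stand-in for a wide-network kernel.
    """
    m = W.shape[0]
    H1, H2 = X1 @ W.T, X2 @ W.T                              # pre-activations (n, m)
    term_a = np.maximum(H1, 0) @ np.maximum(H2, 0).T / m     # gradients w.r.t. a
    D1, D2 = (H1 > 0) * a, (H2 > 0) * a                      # a_i * relu'(h_i)
    term_W = (D1 @ D2.T) * (X1 @ X2.T) / m                   # gradients w.r.t. W
    return term_a + term_W

rng = np.random.default_rng(5)
n_train, n_test, d, m = 200, 100, 20, 4096
X = rng.standard_normal((n_train + n_test, d)) / np.sqrt(d)
y = np.tanh(X[:, 0]) + 0.5 * rng.standard_normal(n_train + n_test)
W, a = rng.standard_normal((m, d)), rng.standard_normal(m)

K_tt = empirical_ntk(X[:n_train], X[:n_train], W, a)         # train/train kernel
K_st = empirical_ntk(X[n_train:], X[:n_train], W, a)         # test/train kernel
z = 1e-2
# Kernel ridge with the NTK: out-of-sample predictions are linear in y_train.
y_hat = K_st @ np.linalg.solve(K_tt + z * np.eye(n_train), y[:n_train])
print("OOS correlation:", np.corrcoef(y_hat, y[n_train:])[0, 1])
```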
