Compositional Drift Functions
- Compositional drift functions are defined as nested compositions of Hölder-smooth functions, enabling effective modeling of multivariate diffusion drifts.
- Neural network estimators employing deep ReLU architecture achieve non-asymptotic risk guarantees with convergence rates independent of the ambient dimension.
- Empirical results confirm that such neural approaches scale linearly with dimension, outperforming spline-based methods in high-dimensional settings.
Compositional Drift Functions arise in the context of nonparametric estimation of drift in multivariate, time-homogeneous diffusion processes, where the drift function possesses an intrinsic layered composition structure. These functions represent a key class of statistical targets where modern neural-network-based estimators attain dimension-robust convergence rates, thereby circumventing the curse of dimensionality even in high-dimensional problems. Rigorous theoretical and empirical results in (Zhao et al., 14 Nov 2025) establish that for compositional drift functions, one can construct neural network estimators with non-asymptotic risk guarantees and explicit convergence rates depending only on the intrinsic compositional structure, rather than on the ambient space dimension.
1. Definition and Mathematical Characterization
A compositional drift function refers to a drift vector field where, for each component, the restriction to a compact domain can be expressed as a nested composition of Hölder-smooth functions, each depending on only a small subset of variables. Formally, the target function belongs to a compositional Hölder class of depth $q$, where the composition can be written as
$$f = g_q \circ g_{q-1} \circ \cdots \circ g_0,$$
with each $g_i = (g_{i1}, \dots, g_{i d_{i+1}}) : [a_i, b_i]^{d_i} \to [a_{i+1}, b_{i+1}]^{d_{i+1}}$, each component $g_{ij}$ depending on at most $t_i$ of its $d_i$ arguments, and $\beta_i$-Hölder smooth. The parameters $(q, (d_i)_i, (t_i)_i, (\beta_i)_i)$ describe the compositional depth, width, and constituent smoothness. This function structure generalizes classical multi-index and additive models by accommodating arbitrary sub-compositions and variable reuse.
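The structure above can be illustrated with a minimal sketch (the specific functions below are hypothetical, chosen for illustration): a composition of two layers, where each constituent map touches only a few coordinates, so the effective complexity is governed by the intrinsic widths rather than the ambient dimension.

```python
import numpy as np

# Hypothetical illustration of a compositional structure f = g1 o g0:
# the ambient input is d-dimensional, but each constituent function
# depends on only a few coordinates (t0 = 2, t1 = 2 here).

def g0(x):
    # First layer: two scalar features, each using at most 2 coordinates.
    return np.array([np.sin(x[0] + x[1]), np.tanh(x[2])])

def g1(z):
    # Second layer: a smooth map of the 2 intermediate features.
    return z[0] * z[1]

def f(x):
    """Compositional target: depends on x only through g0."""
    return g1(g0(x))

d = 50            # the ambient dimension can be large...
x = np.zeros(d)
# ...but f only "sees" coordinates 0, 1, 2, so its statistical
# difficulty is governed by (t0, t1), not by d.
```

Changing any coordinate other than 0, 1, or 2 leaves the function value untouched, which is exactly the low intrinsic dimensionality the compositional class encodes.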
2. Drift Estimation for Diffusions: Setup
The canonical estimation task involves the observed solution $X = (X_t)_{t \in [0,T]}$ to the stochastic differential equation
$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \qquad X_0 = \xi,$$
where $W$ is a $d$-dimensional Brownian motion, $b : \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma : \mathbb{R}^d \to \mathbb{R}^{d \times d}$ are globally Lipschitz, and the initial condition $\xi$ is arbitrary. The estimation focuses on the nonparametric recovery of each component $b_j$ of the drift, using $N$ independent, high-frequency (i.e., mesh size $\Delta = T/n \to 0$) discrete-time sample paths over a fixed time horizon $T$, with no ergodicity assumption.
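A minimal sketch of this sampling scheme (not the paper's code) simulates $N$ independent high-frequency paths by Euler–Maruyama; the Ornstein–Uhlenbeck drift and unit diffusion below are illustrative Lipschitz choices.

```python
import numpy as np

def simulate_paths(b, sigma, x0, T=1.0, n=100, N=10, d=2, seed=0):
    """Euler-Maruyama paths of dX = b(X) dt + sigma(X) dW on [0, T]."""
    rng = np.random.default_rng(seed)
    dt = T / n                                   # mesh size Delta
    X = np.empty((N, n + 1, d))
    X[:, 0] = x0
    for k in range(n):
        dW = rng.normal(scale=np.sqrt(dt), size=(N, d))
        for i in range(N):
            X[i, k + 1] = X[i, k] + b(X[i, k]) * dt + sigma(X[i, k]) @ dW[i]
    return X, dt

b = lambda x: -x                  # illustrative Ornstein-Uhlenbeck drift
sigma = lambda x: np.eye(len(x))  # illustrative unit diffusion
X, dt = simulate_paths(b, sigma, x0=np.ones(2))
```

The output array holds $N$ trajectories observed on a grid of mesh $\Delta = T/n$, matching the fixed-horizon, high-frequency observation scheme described above.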
3. Neural Network Estimators and Risk Decomposition
The state-of-the-art estimator constructs a drift estimate as a clipped, high-sparsity, deep ReLU neural network, where the network $\hat f_j$ minimizes the empirical squared-increment loss
$$\hat f_j \in \arg\min_{f \in \mathcal{F}} \; \frac{1}{Nn} \sum_{i=1}^{N} \sum_{k=0}^{n-1} \bigl( Y^{(i)}_{k,j} - f(X^{(i)}_{t_k}) \bigr)^2,$$
with $Y^{(i)}_{k,j} = \bigl( X^{(i)}_{t_{k+1}, j} - X^{(i)}_{t_k, j} \bigr) / \Delta$ being the normalized finite difference at time $t_k$ along the $i$-th trajectory.
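The squared-increment loss can be sketched as follows, with a plain function standing in for the deep ReLU network (the helper names are illustrative, not from the paper). In a noise-free check with $\sigma = 0$, the Euler increments satisfy $Y = b(X_{t_k})$ exactly, so the true drift attains zero loss.

```python
import numpy as np

def increment_targets(X, dt, j):
    """Inputs X_{t_k} and normalized finite-difference responses for component j."""
    inputs = X[:, :-1].reshape(-1, X.shape[2])           # all observed X_{t_k}
    Y = (X[:, 1:, j] - X[:, :-1, j]).reshape(-1) / dt    # normalized increments
    return inputs, Y

def empirical_loss(f, inputs, Y):
    # Mean squared error of f(X_{t_k}) against the increment targets,
    # averaged over all paths and time steps.
    return np.mean((Y - f(inputs)) ** 2)

# Deterministic paths (sigma = 0) with drift b(x) = -x:
N, n, d, dt = 3, 5, 2, 0.1
X = np.empty((N, n + 1, d))
X[:, 0] = 1.0
for k in range(n):
    X[:, k + 1] = X[:, k] + (-X[:, k]) * dt              # Euler step, no noise
inputs, Y = increment_targets(X, dt, j=0)
loss = empirical_loss(lambda Z: -Z[:, 0], inputs, Y)     # plug in the true drift
```

With noise switched off, the increment targets coincide with the drift evaluated at the grid points, so the loss vanishes; with noise, the same targets are unbiased up to discretization error, which is what the risk decomposition below controls.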
The prediction risk is decomposed non-asymptotically (Theorem 3.1), schematically, as
$$\mathcal{R}(\hat f_j) \;\lesssim\; \Delta_{\mathrm{opt}} \;+\; \inf_{f \in \mathcal{F}} \| f - b_j \|_\infty^2 \;+\; \varepsilon_{N,n},$$
where:
- $\Delta_{\mathrm{opt}}$ is the training/optimization error,
- $\inf_{f \in \mathcal{F}} \| f - b_j \|_\infty^2$ is the neural network approximation error,
- the final term $\varepsilon_{N,n}$ quantifies stochastic and diffusion-discretization errors.
4. Explicit Dimension-Independent Rates for Compositional Drift
When the drift possesses a compositional structure, the approximation power of sparse deep networks induces a dimension-robust convergence rate. In particular, for a target in the compositional Hölder class, choosing network depth $L \asymp \log(Nn)$, sparsity $s \asymp \max_i (Nn)^{t_i / (2\beta_i^* + t_i)} \log(Nn)$, and defining the effective smoothness $\beta_i^* := \beta_i \prod_{\ell = i+1}^{q} (\beta_\ell \wedge 1)$ together with
$$\phi_{Nn} := \max_{0 \le i \le q} \, (Nn)^{-\frac{2\beta_i^*}{2\beta_i^* + t_i}}$$
yields (Corollary 3.2) a risk bound of order $\phi_{Nn}$ up to logarithmic factors,
with the rate not depending on the ambient dimension $d$ but only on the intrinsic widths $t_i$ and the composition depth $q$, which can remain constant or modest even as $d \to \infty$.
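The rate calculus can be checked numerically with a short sketch (standard compositional-Hölder arithmetic, not code from the paper): the effective smoothness of layer $i$ is $\beta_i^* = \beta_i \prod_{\ell > i} (\beta_\ell \wedge 1)$, and the slowest layer dictates the overall rate.

```python
import numpy as np

def compositional_rate(betas, ts, n):
    """phi_n = max_i n^(-2 beta_i^* / (2 beta_i^* + t_i))."""
    betas = np.asarray(betas, dtype=float)
    rates = []
    for i in range(len(betas)):
        # Effective smoothness: later rough layers degrade earlier ones.
        eff = betas[i] * np.prod(np.minimum(betas[i + 1:], 1.0))
        rates.append(n ** (-2 * eff / (2 * eff + ts[i])))
    return max(rates)

# Two layers of Hölder smoothness 2 with intrinsic widths t = (2, 1):
rate = compositional_rate(betas=[2.0, 2.0], ts=[2, 1], n=10_000)
# Only the t_i enter the exponent, never the ambient dimension d, so the
# rate is identical whether the diffusion lives in R^3 or R^300.
```

Here the first layer (effective smoothness 2, width 2) is the bottleneck, giving $\phi_n = n^{-2/3}$, compared with the $n^{-2/(2+d)}$-type rates an unstructured $d$-dimensional nonparametric problem would incur.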
5. Empirical Validation and Comparison to Spline Methods
Numerical experiments in (Zhao et al., 14 Nov 2025) implement a compositional drift whose constituent functions include an oscillatory component. Simulating $N$ paths across a range of ambient dimensions $d$ and using neural networks of various depths and sparsity ratios, the empirical mean-squared error decays as $N$ grows at the rate predicted by the compositional theory. Crucially, this rate remains dimension-independent, validating the compositional rate theory.
In direct comparison:
- B-spline ridge estimators exhibit exponential growth in memory and computational cost with $d$ (tensor-product basis size $K^d$ for $K$ knots per coordinate),
- Neural network estimators scale linearly with $d$ in parameter count ($O(d)$ for fixed depth and width), are trainable in minibatches, and more accurately capture sharp local features and oscillatory components of the drift.
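The contrast in parameter counts can be made concrete with a back-of-envelope sketch (the architecture sizes are illustrative, not the paper's experimental settings):

```python
def spline_params(K, d):
    # Full tensor-product B-spline basis: exponential in the ambient dimension.
    return K ** d

def network_params(d, width=64, depth=4):
    # Input layer contributes d * width weights; the (depth - 1) hidden
    # width x width layers are independent of d, so the total is O(d).
    return d * width + (depth - 1) * width * width

# Parameter counts for K = 10 knots per coordinate vs. a fixed network:
counts = {d: (spline_params(10, d), network_params(d)) for d in (2, 5, 10)}
```

Already at $d = 10$ the tensor-product basis needs $10^{10}$ coefficients while the network's count remains in the tens of thousands, which is the scaling gap the comparison above describes.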
6. Limitations, Open Questions, and Extensions
While compositional drift function estimation via neural networks achieves strong theoretical and empirical results:
- The non-asymptotic risk bound is explicit but contains a training error term that depends on potentially suboptimal optimization; global minimization is not guaranteed.
- Joint estimation of the diffusion coefficient $\sigma$, adaptation to unknown or irregular time grids, online or missing data, and time-inhomogeneous drifts remain open.
- Extending risk lower bounds and constructing confidence bands for general network classes in the high-frequency regime remain open problems.
The approach does not require ergodicity or an infinite time regime, facilitates sharp risk decomposition, and is robust to the complexity of local or oscillatory features—especially when the underlying drift admits hidden compositional structure.
7. Impact and Broader Perspectives
The compositional drift framework provides a concrete example where deep networks achieve minimax-optimality in nonparametric inference by exploiting intrinsic structural assumptions rather than extrinsic dimension. In high-dimensional stochastic dynamical models, this enables practitioners to construct statistically and computationally efficient estimators for drift fields exhibiting layered or modular dependencies. As applications in stochastic control, mathematical finance, and molecular dynamics often involve multiscale or hierarchical drift mechanisms, these results inform the design of scalable learning-based drift estimators for practical systems far beyond the reach of traditional kernel or basis-expansion approaches.