
Compositional Drift Functions

Updated 17 November 2025
  • Compositional drift functions are defined as nested compositions of Hölder-smooth functions, enabling effective modeling of multivariate diffusion drifts.
  • Neural network estimators employing deep ReLU architectures achieve non-asymptotic risk guarantees with convergence rates independent of the ambient dimension.
  • Empirical results confirm that such neural estimators scale linearly with dimension in parameter count, outperforming spline-based methods in high-dimensional settings.

Compositional Drift Functions arise in the context of nonparametric estimation of drift in multivariate, time-homogeneous diffusion processes, where the drift function possesses an intrinsic layered composition structure. These functions represent a key class of statistical targets where modern neural-network-based estimators attain dimension-robust convergence rates, thereby circumventing the curse of dimensionality even in high-dimensional problems. Rigorous theoretical and empirical results in (Zhao et al., 14 Nov 2025) establish that for compositional drift functions, one can construct neural network estimators with non-asymptotic risk guarantees and explicit convergence rates depending only on the intrinsic compositional structure, rather than on the ambient space dimension.

1. Definition and Mathematical Characterization

A compositional drift function refers to a drift vector field $b: \mathbb{R}^d \to \mathbb{R}^d$ where, for each component, the restriction to the compact domain $K = [0,1]^d$ can be expressed as a nested composition of Hölder-smooth functions, each depending on only a small subset of variables. Formally, the target function $f_0$ belongs to a compositional Hölder class $\mathcal{G}(q, d, t, \beta, K)$ of depth $q$, where the composition can be written as

$$f_0(x) = h_0 \circ h_1 \circ \cdots \circ h_q(x),$$

with each $h_i: \mathbb{R}^{t_i} \to \mathbb{R}^{t_{i-1}}$, $t_0 = 1$, $t_q \le d$, and $h_i$ being $\beta_i$-Hölder smooth. The parameters $(q, t, \beta)$ describe the compositional depth, width, and constituent smoothness. This function structure generalizes classical multi-index and additive models by accommodating arbitrary sub-compositions and variable reuse.
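As a concrete illustration, the following is a minimal sketch of a hypothetical member of such a class with depth $q = 2$; the layers $h_0, h_1, h_2$, their smoothness levels, and the ambient dimension are invented for illustration, with the innermost layer depending on only three of the $d$ coordinates.

```python
import numpy as np

d = 50  # ambient dimension; the effective widths below stay small

def h2(x):
    # h2 : R^{t_2} -> R^{t_1} with t_2 = 3, t_1 = 2; depends on x[0], x[1], x[2] only.
    # |x|^0.7 is 0.7-Hölder on [0, 1], so beta_2 = 0.7 in this example.
    return np.array([np.sin(x[0] + x[1]), np.abs(x[2]) ** 0.7])

def h1(z):
    # h1 : R^{t_1} -> R^{t_0} = R, smooth in both arguments.
    return z[0] * z[1]

def h0(w):
    # h0 : R -> R, the outermost scalar layer.
    return np.tanh(w)

def f0(x):
    # Nested composition f0 = h0 ∘ h1 ∘ h2, one component of a compositional drift.
    return h0(h1(h2(x)))

x = np.random.default_rng(0).uniform(0.0, 1.0, size=d)
print(f0(x))  # evaluation at a point of K = [0, 1]^d
```

Although $x$ lives in $\mathbb{R}^{50}$, the rate-determining quantities are only the widths $(t_0, t_1, t_2) = (1, 2, 3)$ and the layer smoothness levels.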

2. Drift Estimation for Diffusions: Setup

The canonical estimation task involves the observed solution $(X_t)_{t\in[0,T]}$ to the stochastic differential equation

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t,$$

where $B_t$ is $d$-dimensional Brownian motion, $b$ and $\sigma$ are globally Lipschitz, and the initial law $X_0 \sim \mathcal{L}$ is arbitrary. The estimation focuses on the nonparametric recovery of $f_0(x) = b^i(x)\,\mathbf{1}_{[0,1]^d}(x)$ for each component $i$, using $N$ independent, high-frequency (i.e., mesh size $\Delta = T/M \to 0$) discrete-time sample paths over a fixed time horizon $T$, with no ergodicity assumption.
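A minimal sketch of how such high-frequency data can be generated, using a standard Euler-Maruyama discretization; the drift, constant diffusion level, and uniform initial law below are illustrative placeholders rather than the paper's experimental setup.

```python
import numpy as np

def simulate_paths(b, N, M, T, d, sigma=1.0, seed=0):
    """Euler-Maruyama simulation of N i.i.d. paths of
    dX_t = b(X_t) dt + sigma dB_t on [0, T] with mesh Delta = T / M."""
    rng = np.random.default_rng(seed)
    dt = T / M
    X = np.zeros((N, M + 1, d))
    X[:, 0] = rng.uniform(0.0, 1.0, size=(N, d))  # arbitrary initial law X_0 ~ L
    for m in range(M):
        dB = rng.normal(scale=np.sqrt(dt), size=(N, d))  # Brownian increments
        X[:, m + 1] = X[:, m] + b(X[:, m]) * dt + sigma * dB
    return X

# Example: N = 100 paths of a mean-reverting diffusion in d = 2.
paths = simulate_paths(lambda x: -x, N=100, M=200, T=1.0, d=2)
print(paths.shape)  # (N, M + 1, d)
```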

3. Neural Network Estimators and Risk Decomposition

The state-of-the-art estimator constructs a drift estimate as a clipped, high-sparsity, deep ReLU neural network

$$\widehat{b}^i_{\mathrm{NN}}(x) = \widehat{f}(x)\,\mathbf{1}_{[0,1]^d}(x),$$

where $\widehat{f}$ minimizes the empirical squared-increment loss

$$Q_{D_N}(f) = \frac{1}{NM} \sum_{n=1}^{N} \sum_{m=0}^{M-1} \left( Y^{(n)}_{t_m} - f\big(\overline{X}^{(n)}_{t_m}\big) \right)^2,$$

with $Y^{(n)}_{t_m}$ being the normalized finite difference at time $t_m$ along the $n$-th trajectory.
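Assuming the usual convention that $Y^{(n)}_{t_m}$ is the normalized increment $(X^{(n)}_{t_{m+1}} - X^{(n)}_{t_m})/\Delta$ of the targeted component, the loss admits a direct vectorized sketch; the helper name and array layout here are assumptions.

```python
import numpy as np

def squared_increment_loss(f, X, T, component=0):
    """Empirical squared-increment loss Q_{D_N}(f).

    X has shape (N, M + 1, d); the regression target Y^{(n)}_{t_m} is the
    normalized finite difference of the chosen component along each path."""
    N, M1, d = X.shape
    delta = T / (M1 - 1)  # mesh size Delta = T / M
    Y = (X[:, 1:, component] - X[:, :-1, component]) / delta    # (N, M)
    preds = f(X[:, :-1].reshape(-1, d)).reshape(N, M1 - 1)      # f at X_{t_m}
    return float(np.mean((Y - preds) ** 2))

# Toy usage with synthetic data and the candidate f(x) = -x_0:
X = np.random.default_rng(1).normal(size=(100, 201, 2))
print(squared_increment_loss(lambda x: -x[:, 0], X, T=1.0))
```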

The prediction risk is decomposed non-asymptotically (Theorem 3.1) as

$$\mathcal{R}(\widehat f, f_0) \;\leq\; 4\,\Psi^F(\widehat f) + 6 \inf_{f\in\mathcal{F}} \mathcal{R}(f, f_0) + C F^2\left(\Delta + \frac{s(L\log s + \log d) + s\log(4F)}{N} + \frac{s\log N}{N}\right),$$

where:

  • $\Psi^F(\widehat f)$ is the training/optimization error,
  • $\inf_{f\in\mathcal{F}}\mathcal{R}(f, f_0)$ is the neural network approximation error,
  • the final term quantifies the stochastic and diffusion-discretization errors.
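A minimal fitting loop consistent with this construction, assuming PyTorch; the sparsity constraint and output clipping of the theoretical estimator are omitted, and the width, depth, and optimizer settings are illustrative only.

```python
import torch
import torch.nn as nn

def fit_drift_component(X, Y, width=128, depth=3, epochs=200, lr=1e-3):
    """Minimize the empirical squared-increment loss over a deep ReLU network.

    X: (N*M, d) float tensor of sampled states; Y: (N*M,) float tensor of
    normalized increments of one drift component."""
    layers, p = [], X.shape[1]
    for _ in range(depth):
        layers += [nn.Linear(p, width), nn.ReLU()]
        p = width
    layers.append(nn.Linear(p, 1))
    f = nn.Sequential(*layers)
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((f(X).squeeze(-1) - Y) ** 2).mean()  # Q_{D_N}(f)
        loss.backward()
        opt.step()
    return f
```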

4. Explicit Dimension-Independent Rates for Compositional Drift

When the drift $f_0$ possesses a compositional structure, the approximation power of sparse deep networks induces a dimension-robust convergence rate. In particular, for $f_0 \in \mathcal{G}(q, d, t, \beta, K)$, choosing network depth $L \sim \log N$ and sparsity $s \sim N\,\varphi_N\,\log N$, and defining

$$\varphi_N = \max_{0\le i \le q} N^{-2\beta_i^*/(2\beta_i^* + t_i)}, \qquad \beta_i^* = \beta_i \prod_{\ell=i+1}^{q} (\beta_\ell \wedge 1),$$

yields (Corollary 3.2)

$$\mathcal{R}(\widehat{f}, f_0) \lesssim \varphi_N\,\log^3 N,$$

with the rate $\varphi_N$ depending not on the ambient dimension $d$ but only on the layer widths $t_i$ and the composition order $q$, which can remain constant or modest even as $d \to \infty$.
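To make the rate concrete, the following sketch evaluates the effective smoothness exponents $\beta_i^*$ and the rate $\varphi_N$ for a hypothetical structure $(q, t, \beta)$; note that the ambient dimension $d$ never enters the computation.

```python
import numpy as np

def phi_N(N, t, beta):
    """phi_N = max_i N^{-2 beta_i* / (2 beta_i* + t_i)},
    with beta_i* = beta_i * prod_{l > i} min(beta_l, 1)."""
    rates = []
    for i in range(len(beta)):
        beta_star = beta[i] * np.prod([min(b, 1.0) for b in beta[i + 1:]])
        rates.append(N ** (-2.0 * beta_star / (2.0 * beta_star + t[i])))
    return max(rates)

# Hypothetical depth-2 structure matching the definition in Section 1:
t = [1, 2, 3]            # layer widths t_0, t_1, t_2
beta = [2.0, 1.5, 0.7]   # Hölder smoothness of h_0, h_1, h_2
for N in (100, 1000, 10000):
    print(N, phi_N(N, t, beta))
```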

5. Empirical Validation and Comparison to Spline Methods

Numerical experiments in (Zhao et al., 14 Nov 2025) implement a compositional drift using

$$b(x) = -x + \phi\!\left(\frac{s(x)}{\theta}\right)\mathbf{1}_d, \qquad s(x) = \sum_{i=1}^d x_i, \quad \theta = 0.2,$$

where $\phi(z)$ is an oscillatory function. Simulating $N \in \{100, \ldots, 5000\}$ paths in dimensions $d \in \{1, 2, 10, 50\}$ and using neural networks of various depths and sparsity ratios, the empirical mean-squared error decays at rate $\approx N^{-1}\log^3 N$ for $d = 1, 2, 50$. Crucially, this rate remains dimension-independent, validating the compositional rate theory.
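A sketch of this test drift, assuming $\phi = \sin$ as a stand-in for the paper's oscillatory function (the exact $\phi$ is not reproduced here); the small scale $\theta = 0.2$ makes $\phi(s(x)/\theta)$ oscillate rapidly along the diagonal direction.

```python
import numpy as np

def drift(x, theta=0.2, phi=np.sin):
    """b(x) = -x + phi(s(x) / theta) * 1_d with s(x) = sum_i x_i.

    phi = sin is an assumed placeholder for the oscillatory component."""
    s = x.sum(axis=-1, keepdims=True)   # s(x), broadcast over a batch of points
    return -x + phi(s / theta)          # adds phi(s/theta) to every coordinate

x = np.random.default_rng(0).uniform(size=(4, 10))  # batch of 4 points, d = 10
print(drift(x).shape)  # (4, 10)
```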

In direct comparison:

  • B-spline ridge estimators exhibit exponential growth in memory and computational cost with $d$ (basis size $\sim (K+3)^d$),
  • Neural network estimators scale linearly with $d$ in parameter count ($O(\sum_i p_i p_{i+1})$), are trainable in minibatches, and more accurately capture sharp local features and oscillatory components of $b$; see the sketch below.
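The scaling contrast can be checked directly; the spline resolution $K$ and network widths below are arbitrary illustrative choices.

```python
# Model-size comparison: tensor-product B-spline basis vs. fully connected ReLU net.
def spline_basis_size(K, d):
    return (K + 3) ** d                # exponential in d

def nn_param_count(widths):
    # Weight-matrix entries O(sum_i p_i * p_{i+1}); only the input width grows with d.
    return sum(p * q for p, q in zip(widths[:-1], widths[1:]))

for d in (1, 2, 10, 50):
    widths = [d, 128, 128, 1]          # hypothetical architecture
    print(d, spline_basis_size(K=10, d=d), nn_param_count(widths))
```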

6. Limitations, Open Questions, and Extensions

While compositional drift function estimation via neural networks achieves strong theoretical and empirical results:

  • The non-asymptotic risk bound is explicit but contains a training error term $\Psi^F(\cdot)$ that depends on potentially suboptimal optimization; global minimization of the empirical loss is not guaranteed.
  • Joint estimation of the diffusion coefficient $\sigma$, adaptation to unknown time grids, online or missing data, and non-homogeneous $b$ remain open.
  • Risk lower bounds and the construction of confidence bands for general network classes in the high-frequency regime remain unsolved.

The approach does not require ergodicity or an infinite time regime, facilitates sharp risk decomposition, and is robust to the complexity of local or oscillatory features—especially when the underlying drift admits hidden compositional structure.

7. Impact and Broader Perspectives

The compositional drift framework provides a concrete example where deep networks achieve minimax-optimality in nonparametric inference by exploiting intrinsic structural assumptions rather than extrinsic dimension. In high-dimensional stochastic dynamical models, this enables practitioners to construct statistically and computationally efficient estimators for drift fields exhibiting layered or modular dependencies. As applications in stochastic control, mathematical finance, and molecular dynamics often involve multiscale or hierarchical drift mechanisms, these results inform the design of scalable learning-based drift estimators for practical systems far beyond the reach of traditional kernel or basis-expansion approaches.
