
Compositional Drift Functions

Updated 17 November 2025
  • Compositional drift functions are defined as nested compositions of Hölder-smooth functions, enabling effective modeling of multivariate diffusion drifts.
  • Neural network estimators employing deep ReLU architectures achieve non-asymptotic risk guarantees with convergence rates independent of the ambient dimension.
  • Empirical results confirm that such neural estimators scale linearly with dimension in parameter count, outperforming spline-based methods in high-dimensional settings.

Compositional Drift Functions arise in the context of nonparametric estimation of drift in multivariate, time-homogeneous diffusion processes, where the drift function possesses an intrinsic layered composition structure. These functions represent a key class of statistical targets where modern neural-network-based estimators attain dimension-robust convergence rates, thereby circumventing the curse of dimensionality even in high-dimensional problems. Rigorous theoretical and empirical results in (Zhao et al., 14 Nov 2025) establish that for compositional drift functions, one can construct neural network estimators with non-asymptotic risk guarantees and explicit convergence rates depending only on the intrinsic compositional structure, rather than on the ambient space dimension.

1. Definition and Mathematical Characterization

A compositional drift function refers to a drift vector field $b: \mathbb{R}^d \to \mathbb{R}^d$ where, for each component, the restriction to the compact domain $K = [0,1]^d$ can be expressed as a nested composition of Hölder-smooth functions, each depending on only a small subset of variables. Formally, the target function $f_0$ belongs to a compositional Hölder class $\mathcal{G}(q, d, t, \beta, K)$ of depth $q$, where the composition can be written as

$$f_0(x) = h_0 \circ h_1 \circ \cdots \circ h_q(x),$$

with each $h_i: \mathbb{R}^{t_i} \to \mathbb{R}^{t_{i-1}}$, $t_0 = 1$, $t_q \le d$, and $h_i$ being $\beta_i$-Hölder smooth. The parameters $(q, t, \beta)$ describe the compositional depth, width, and constituent smoothness. This function structure generalizes classical multi-index and additive models by accommodating arbitrary sub-compositions and variable reuse.
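As a concrete illustration, the following is a minimal sketch of a hypothetical member of such a class with depth $q = 2$; the layers $h_0, h_1, h_2$, their smoothness levels, and the ambient dimension are invented for illustration, with the innermost layer depending on only three of the $d$ coordinates.

```python
import numpy as np

d = 50  # ambient dimension; the effective widths below stay small

def h2(x):
    # h2 : R^{t_2} -> R^{t_1} with t_2 = 3, t_1 = 2; depends on x[0], x[1], x[2] only.
    # |x|^0.7 is 0.7-Hölder on [0, 1], so beta_2 = 0.7 in this example.
    return np.array([np.sin(x[0] + x[1]), np.abs(x[2]) ** 0.7])

def h1(z):
    # h1 : R^{t_1} -> R^{t_0} = R, smooth in both arguments.
    return z[0] * z[1]

def h0(w):
    # h0 : R -> R, the outermost scalar layer.
    return np.tanh(w)

def f0(x):
    # Nested composition f0 = h0 ∘ h1 ∘ h2, one component of a compositional drift.
    return h0(h1(h2(x)))

x = np.random.default_rng(0).uniform(0.0, 1.0, size=d)
print(f0(x))  # evaluation at a point of K = [0, 1]^d
```

Although $x$ lives in $\mathbb{R}^{50}$, the rate-determining quantities are only the widths $(t_0, t_1, t_2) = (1, 2, 3)$ and the layer smoothness levels.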

2. Drift Estimation for Diffusions: Setup

The canonical estimation task involves the observed solution $(X_t)_{t\in[0,T]}$ to the stochastic differential equation

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dB_t,$$

where $B_t$ is $d$-dimensional Brownian motion, $b$ and $\sigma$ are globally Lipschitz, and the initial law $X_0 \sim \mathcal{L}$ is arbitrary. The estimation focuses on the nonparametric recovery of $f_0(x) = b^i(x)\,\mathbf{1}_{[0,1]^d}(x)$ for each component $i$, using $N$ independent, high-frequency (i.e., mesh size $\Delta = T/M \to 0$) discrete-time sample paths over a fixed time horizon $T$, with no ergodicity assumption.
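A minimal sketch of how such high-frequency data can be generated, using a standard Euler-Maruyama discretization; the drift, constant diffusion level, and uniform initial law below are illustrative placeholders rather than the paper's experimental setup.

```python
import numpy as np

def simulate_paths(b, N, M, T, d, sigma=1.0, seed=0):
    """Euler-Maruyama simulation of N i.i.d. paths of
    dX_t = b(X_t) dt + sigma dB_t on [0, T] with mesh Delta = T / M."""
    rng = np.random.default_rng(seed)
    dt = T / M
    X = np.zeros((N, M + 1, d))
    X[:, 0] = rng.uniform(0.0, 1.0, size=(N, d))  # arbitrary initial law X_0 ~ L
    for m in range(M):
        dB = rng.normal(scale=np.sqrt(dt), size=(N, d))  # Brownian increments
        X[:, m + 1] = X[:, m] + b(X[:, m]) * dt + sigma * dB
    return X

# Example: N = 100 paths of a mean-reverting diffusion in d = 2.
paths = simulate_paths(lambda x: -x, N=100, M=200, T=1.0, d=2)
print(paths.shape)  # (N, M + 1, d)
```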

3. Neural Network Estimators and Risk Decomposition

The state-of-the-art estimator constructs a drift estimate as a clipped, high-sparsity, deep ReLU neural network

$$\widehat{b}^i_{\mathrm{NN}}(x) = \widehat{f}(x)\,\mathbf{1}_{[0,1]^d}(x),$$

where $\widehat{f}$ minimizes the empirical squared-increment loss

$$Q_{D_N}(f) = \frac{1}{NM} \sum_{n=1}^{N} \sum_{m=0}^{M-1} \left( Y^{(n)}_{t_m} - f\big(\overline{X}^{(n)}_{t_m}\big) \right)^2,$$

with $Y^{(n)}_{t_m}$ being the normalized finite difference at time $t_m$ along the $n$-th trajectory.
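Assuming the usual convention that $Y^{(n)}_{t_m}$ is the normalized increment $(X^{(n)}_{t_{m+1}} - X^{(n)}_{t_m})/\Delta$ of the targeted component, the loss admits a direct vectorized sketch; the helper name and array layout here are assumptions.

```python
import numpy as np

def squared_increment_loss(f, X, T, component=0):
    """Empirical squared-increment loss Q_{D_N}(f).

    X has shape (N, M + 1, d); the regression target Y^{(n)}_{t_m} is the
    normalized finite difference of the chosen component along each path."""
    N, M1, d = X.shape
    delta = T / (M1 - 1)  # mesh size Delta = T / M
    Y = (X[:, 1:, component] - X[:, :-1, component]) / delta    # (N, M)
    preds = f(X[:, :-1].reshape(-1, d)).reshape(N, M1 - 1)      # f at X_{t_m}
    return float(np.mean((Y - preds) ** 2))

# Toy usage with synthetic data and the candidate f(x) = -x_0:
X = np.random.default_rng(1).normal(size=(100, 201, 2))
print(squared_increment_loss(lambda x: -x[:, 0], X, T=1.0))
```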

The prediction risk is decomposed non-asymptotically (Theorem 3.1) as

$$\mathcal{R}(\widehat f, f_0) \;\leq\; 4\,\Psi^F(\widehat f) + 6 \inf_{f\in\mathcal{F}} \mathcal{R}(f, f_0) + C F^2\left(\Delta + \frac{s(L\log s + \log d) + s\log(4F)}{N} + \frac{s\log N}{N}\right),$$

where:

  • $\Psi^F(\widehat f)$ is the training/optimization error,
  • $\inf_{f\in\mathcal{F}}\mathcal{R}(f, f_0)$ is the neural network approximation error,
  • the final term quantifies the stochastic and diffusion-discretization errors.
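A minimal fitting loop consistent with this construction, assuming PyTorch; the sparsity constraint and output clipping of the theoretical estimator are omitted, and the width, depth, and optimizer settings are illustrative only.

```python
import torch
import torch.nn as nn

def fit_drift_component(X, Y, width=128, depth=3, epochs=200, lr=1e-3):
    """Minimize the empirical squared-increment loss over a deep ReLU network.

    X: (N*M, d) float tensor of sampled states; Y: (N*M,) float tensor of
    normalized increments of one drift component."""
    layers, p = [], X.shape[1]
    for _ in range(depth):
        layers += [nn.Linear(p, width), nn.ReLU()]
        p = width
    layers.append(nn.Linear(p, 1))
    f = nn.Sequential(*layers)
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = ((f(X).squeeze(-1) - Y) ** 2).mean()  # Q_{D_N}(f)
        loss.backward()
        opt.step()
    return f
```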

4. Explicit Dimension-Independent Rates for Compositional Drift

When the drift $f_0$ possesses a compositional structure, the approximation power of sparse deep networks induces a dimension-robust convergence rate. In particular, for $f_0 \in \mathcal{G}(q, d, t, \beta, K)$, choosing network depth $L \sim \log N$ and sparsity $s \sim N\,\varphi_N\,\log N$, and defining

$$\varphi_N = \max_{0\le i \le q} N^{-2\beta_i^*/(2\beta_i^* + t_i)}, \qquad \beta_i^* = \beta_i \prod_{\ell=i+1}^{q} (\beta_\ell \wedge 1),$$

yields (Corollary 3.2)

$$\mathcal{R}(\widehat{f}, f_0) \lesssim \varphi_N\,\log^3 N,$$

with the rate $\varphi_N$ depending not on the ambient dimension $d$ but only on the layer widths $t_i$ and the composition order $q$, which can remain constant or modest even as $d \to \infty$.
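To make the rate concrete, the following sketch evaluates the effective smoothness exponents $\beta_i^*$ and the rate $\varphi_N$ for a hypothetical structure $(q, t, \beta)$; note that the ambient dimension $d$ never enters the computation.

```python
import numpy as np

def phi_N(N, t, beta):
    """phi_N = max_i N^{-2 beta_i* / (2 beta_i* + t_i)},
    with beta_i* = beta_i * prod_{l > i} min(beta_l, 1)."""
    rates = []
    for i in range(len(beta)):
        beta_star = beta[i] * np.prod([min(b, 1.0) for b in beta[i + 1:]])
        rates.append(N ** (-2.0 * beta_star / (2.0 * beta_star + t[i])))
    return max(rates)

# Hypothetical depth-2 structure matching the definition in Section 1:
t = [1, 2, 3]            # layer widths t_0, t_1, t_2
beta = [2.0, 1.5, 0.7]   # Hölder smoothness of h_0, h_1, h_2
for N in (100, 1000, 10000):
    print(N, phi_N(N, t, beta))
```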

5. Empirical Validation and Comparison to Spline Methods

Numerical experiments in (Zhao et al., 14 Nov 2025) implement a compositional drift using

$$b(x) = -x + \phi\!\left(\frac{s(x)}{\theta}\right)\mathbf{1}_d, \qquad s(x) = \sum_{i=1}^d x_i, \quad \theta = 0.2,$$

where $\phi(z)$ is an oscillatory function. Simulating $N \in \{100, \ldots, 5000\}$ paths in dimensions $d \in \{1, 2, 10, 50\}$ and using neural networks of various depths and sparsity ratios, the empirical mean-squared error decays at rate $\approx N^{-1}\log^3 N$ for $d = 1, 2, 50$. Crucially, this rate remains dimension-independent, validating the compositional rate theory.
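A sketch of this test drift, assuming $\phi = \sin$ as a stand-in for the paper's oscillatory function (the exact $\phi$ is not reproduced here); the small scale $\theta = 0.2$ makes $\phi(s(x)/\theta)$ oscillate rapidly along the diagonal direction.

```python
import numpy as np

def drift(x, theta=0.2, phi=np.sin):
    """b(x) = -x + phi(s(x) / theta) * 1_d with s(x) = sum_i x_i.

    phi = sin is an assumed placeholder for the oscillatory component."""
    s = x.sum(axis=-1, keepdims=True)   # s(x), broadcast over a batch of points
    return -x + phi(s / theta)          # adds phi(s/theta) to every coordinate

x = np.random.default_rng(0).uniform(size=(4, 10))  # batch of 4 points, d = 10
print(drift(x).shape)  # (4, 10)
```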

In direct comparison:

  • B-spline ridge estimators exhibit exponential growth in memory and computational cost with $d$ (basis size $\sim (K+3)^d$),
  • Neural network estimators scale linearly with $d$ in parameter count ($O(\sum_i p_i p_{i+1})$), are trainable in minibatches, and more accurately capture sharp local features and oscillatory components of $b$; see the sketch below.
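The scaling contrast can be checked directly; the spline resolution $K$ and network widths below are arbitrary illustrative choices.

```python
# Model-size comparison: tensor-product B-spline basis vs. fully connected ReLU net.
def spline_basis_size(K, d):
    return (K + 3) ** d                # exponential in d

def nn_param_count(widths):
    # Weight-matrix entries O(sum_i p_i * p_{i+1}); only the input width grows with d.
    return sum(p * q for p, q in zip(widths[:-1], widths[1:]))

for d in (1, 2, 10, 50):
    widths = [d, 128, 128, 1]          # hypothetical architecture
    print(d, spline_basis_size(K=10, d=d), nn_param_count(widths))
```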

6. Limitations, Open Questions, and Extensions

While compositional drift function estimation via neural networks achieves strong theoretical and empirical results:

  • The non-asymptotic risk bound is explicit but contains a training error term $\Psi^F(\cdot)$ that depends on potentially suboptimal optimization; global minimization of the empirical loss is not guaranteed.
  • Joint estimation of the diffusion coefficient $\sigma$, adaptation to unknown time grids, online or missing data, and non-homogeneous $b$ remain open.
  • Risk lower bounds and the construction of confidence bands for general network classes in the high-frequency regime remain unsolved.

The approach does not require ergodicity or an infinite time regime, facilitates sharp risk decomposition, and is robust to the complexity of local or oscillatory features—especially when the underlying drift admits hidden compositional structure.

7. Impact and Broader Perspectives

The compositional drift framework provides a concrete example where deep networks achieve minimax-optimality in nonparametric inference by exploiting intrinsic structural assumptions rather than extrinsic dimension. In high-dimensional stochastic dynamical models, this enables practitioners to construct statistically and computationally efficient estimators for drift fields exhibiting layered or modular dependencies. As applications in stochastic control, mathematical finance, and molecular dynamics often involve multiscale or hierarchical drift mechanisms, these results inform the design of scalable learning-based drift estimators for practical systems far beyond the reach of traditional kernel or basis-expansion approaches.
