Minimax Theory for Operator Learning

Updated 23 December 2025
  • The paper establishes minimax theory for operator learning by deriving both information-theoretic lower bounds and achievable upper bounds for infinite-dimensional operator estimation.
  • It details practical estimator designs, including SGD, tamed least-squares, and multilevel ridge methods, whose error rates depend on regularity and spectral decay properties.
  • The theory unifies statistical and computational trade-offs in operator regression and inverse problems, guiding the design of optimal algorithms in Hilbert space settings.

Minimax theory for operator learning characterizes the optimal statistical rates for estimating unknown (typically infinite-dimensional) operators from finite, noisy input-output data. The theory identifies both information-theoretic lower bounds, below which no estimator's worst-case risk can fall, and upper bounds attained by concrete procedures, quantifying how intrinsic properties such as regularity, eigenvalue decay, and ill-posedness dictate the tractability and learnability of operator-valued regression and inverse problems in general Hilbert space settings.

1. Problem Formulations and Core Assumptions

Operator learning involves estimating an operator $S^\dagger: H_1 \to H_2$ between separable Hilbert spaces, or a kernel function $\phi$ parametrizing $R_\phi$, given i.i.d. data $(x_t, y_t)$ or $(u^m, f^m)$. The prototypical linear data model is $y_t = S^\dagger x_t + \epsilon_t$, where $x_t \sim \mathcal{D}$, $\epsilon_t$ is mean-zero noise, and $L_C = \mathbb{E}[x \otimes x]$ is the input covariance operator. For learning operator kernels, the model is

$$f^m = R_\phi[u^m] + \varepsilon^m,$$

with RϕR_\phi linear in ϕ\phi and parameterizing a potentially ill-posed deconvolution problem (Shi et al., 7 Feb 2024, Zhang et al., 27 Feb 2025).

The statistical risk is typically measured in prediction or estimation error norms, for example $\mathcal{E}(S_t) - \mathcal{E}(S^\dagger) = \| (S_t - S^\dagger) L_C^{1/2} \|_{\mathrm{HS}}^2$, where $\|\cdot\|_{\mathrm{HS}}$ denotes the Hilbert–Schmidt norm. For kernel/operator learning in an RKHS setting, Sobolev or interpolation-scaled norms are natural risk metrics.
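For a finite-dimensional discretization, this excess prediction risk can be evaluated directly. Below is a minimal NumPy sketch; the function name, the matrix discretization, and the toy check are illustrative assumptions, not constructions from the cited papers.

```python
import numpy as np

def excess_prediction_risk(S_hat, S_true, L_C):
    """Excess risk ||(S_hat - S_true) L_C^{1/2}||_HS^2 for discretized operators.

    S_hat, S_true : (d_out, d_in) arrays, discretized operators H_1 -> H_2.
    L_C           : (d_in, d_in) symmetric PSD array, discretized covariance E[x (x) x].
    """
    # Symmetric square root of the covariance via its eigendecomposition.
    evals, evecs = np.linalg.eigh(L_C)
    sqrt_LC = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None))) @ evecs.T
    diff = (S_hat - S_true) @ sqrt_LC
    return np.sum(diff ** 2)  # squared Hilbert-Schmidt (Frobenius) norm

# Toy check: with L_C = I the excess risk reduces to ||S_hat - S_true||_F^2.
d = 5
S_true = np.eye(d)
S_hat = S_true + 0.1 * np.ones((d, d))
print(excess_prediction_risk(S_hat, S_true, np.eye(d)))  # approx. 0.25
```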

Key structural assumptions include:

  • Regularity: Source conditions parameterized by $r$ or $\tilde{r}$, i.e., $S^\dagger = J L_C^r$ or $S^\dagger = \tilde{J} L_C^{\tilde{r}}$.
  • Spectral Decay: Eigenvalue decay of the input covariance ($s$) or of the normal operator ($r, \beta$), e.g., polynomial or exponential laws.
  • Moment Conditions: Uniform $L^4$ moments on noise or outputs, ensuring well-behaved sample deviations (Shi et al., 7 Feb 2024, Zhang et al., 27 Feb 2025, Adcock et al., 19 Dec 2025).

2. Minimax Lower Bounds: Fundamental Limits

Minimax lower bounds establish that, uniformly over all estimators and all operators in a regularity class, the statistical risk cannot decay faster than a problem-dependent rate. These results rely on probabilistic packing and information-theoretic arguments (typically Fano or Assouad methods):

$$\liminf_{T \to \infty} \inf_{\widehat S} \sup_{S^\dagger \in \mathcal{C}} \mathbb{P}\left( \mathcal{E}(\widehat S) - \mathcal{E}(S^\dagger) \geq \gamma\, T^{-\alpha} \right) > 0$$

with exponents $\alpha$ governed by regularity and spectral decay (Shi et al., 7 Feb 2024, Jin et al., 2022).

Typical sharp lower bounds:

  • SGD for linear operators (weak regularity):

$$T^{-\frac{1+2r-s}{2r+1}}$$

for $S^\dagger = J L_C^r$ and $s$-decay (Shi et al., 7 Feb 2024).

  • Operator kernel regression (adaptive Sobolev):

$$M^{-\frac{2\beta r}{2\beta r + 2r + 1}}$$

under polynomial decay, and $M^{-\beta/(\beta+1)}$ under exponential decay (Zhang et al., 27 Feb 2025).

  • RKHS operator learning (Sobolev–Hilbert–Schmidt):

$$n^{-\min\left\{\frac{\max\{\alpha,\beta\}-\beta'}{\max\{\alpha,\beta\}+p},\ \frac{\gamma' - \gamma}{1 - \gamma}\right\}}$$

with respect to jointly regularized Sobolev norms (Jin et al., 2022).

  • Generic Lipschitz operators (curse of dimensionality):

The minimax risk decays only sub-algebraically; indeed,

$$\limsup_{n\to\infty} M_n(\mathcal{F}_{B,L}) \cdot n^q = \infty$$

for all $q>0$ (Adcock et al., 19 Dec 2025).

Lower-bound machinery typically combines Varshamov–Gilbert code constructions for packing, a balance between function separation and KL divergence (via Fano's lemma), and explicit mode truncation that leverages the eigenvalue decay.
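To make these exponents concrete, the short sketch below simply transcribes the displayed rate formulas into Python and evaluates them; the parameter values are arbitrary illustrations, not values taken from the cited papers.

```python
# Rate exponents from the displayed lower bounds; parameter values are illustrative.

def sgd_exponent(r: float, s: float) -> float:
    """Exponent alpha in T^{-alpha} for SGD under source regularity r and s-decay."""
    return (1 + 2 * r - s) / (2 * r + 1)

def tlse_poly_exponent(r: float, beta: float) -> float:
    """Exponent in M^{-...} for operator kernel regression under polynomial decay."""
    return 2 * beta * r / (2 * beta * r + 2 * r + 1)

def tlse_exp_exponent(beta: float) -> float:
    """Exponent in M^{-...} under exponential spectral decay."""
    return beta / (beta + 1)

if __name__ == "__main__":
    # Larger source regularity r yields a larger (hence faster) exponent.
    for r in (0.5, 1.0, 2.0):
        print(f"SGD,  s=0.5,  r={r}: T^(-{sgd_exponent(r, 0.5):.3f})")
        print(f"tLSE, beta=2, r={r}: M^(-{tlse_poly_exponent(r, 2.0):.3f})")
    print(f"tLSE, exponential decay, beta=2: M^(-{tlse_exp_exponent(2.0):.3f})")
```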

3. Minimax Upper Bounds: Achievable Rates

Upper bounds are derived by analyzing explicit estimators—most prominently, stochastic gradient descent (SGD), regularized least squares (including tamed and multilevel variants), or histogram/RKHS-based procedures. Achieved rates directly reflect the trade-off between bias (due to regularization or spectral cutoffs) and statistical variance (due to noise amplification in small-eigenvalue directions).

  • SGD for operator regression: With step size $\eta_t = \eta_1 t^{-\theta}$, the expected prediction error obeys

$$\mathbb{E}\big[\mathcal{E}(S_{T+1})-\mathcal{E}(S^\dagger)\big] \leq C\, (T+1)^{-\theta} (\log T)^{\mathbb{1}_{\{s=1\}}}$$

for $\theta$ determined by $(r,s)$ (Shi et al., 7 Feb 2024).
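A minimal simulation of such an SGD scheme on a discretized problem might look as follows; the dimensions, step-size constants, noise level, and spectral decay profile are assumptions chosen for illustration, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, T = 20, 10, 5000
eta1, theta = 0.5, 0.6                    # step size eta_t = eta1 * t^{-theta}

# Ground-truth operator and an input covariance with polynomially decaying spectrum.
S_true = rng.standard_normal((d_out, d_in)) / d_in
cov_sqrt = np.diag(np.arange(1, d_in + 1, dtype=float) ** -1.0)

S = np.zeros((d_out, d_in))               # SGD iterate S_t
for t in range(1, T + 1):
    x = cov_sqrt @ rng.standard_normal(d_in)               # x_t ~ D
    y = S_true @ x + 0.1 * rng.standard_normal(d_out)      # y_t = S^dagger x_t + eps_t
    eta_t = eta1 * t ** (-theta)
    # Stochastic gradient step for the squared loss 0.5 * ||y_t - S x_t||^2.
    S += eta_t * np.outer(y - S @ x, x)

print("relative HS error:", np.linalg.norm(S - S_true) / np.linalg.norm(S_true))
```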

  • Kernel operator estimation (tLSE): Tamed least-squares estimators threshold the empirical normal matrix to avoid ill-posed directions. Sharp rates match the lower bounds, e.g.,

$$R_M(\beta) \lesssim M^{-\frac{2\beta r}{2\beta r+2r+1}}$$

with a phase transition to $M^{-\beta/(\beta+1)}$ under exponential decay. The proof balances bias (spectral cutoff) against variance (controlled via SVD or PAC-Bayesian arguments) (Zhang et al., 27 Feb 2025).
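The taming step itself, inverting the empirical normal matrix only on eigen-directions above a threshold, can be sketched schematically as below; the threshold value and the toy design are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def tamed_lse(A, f, tau):
    """Tamed least squares: solve the normal equations only on eigen-directions
    of the empirical normal matrix whose eigenvalues exceed the threshold tau.

    A   : (N, p) stacked design matrix of the linear map phi -> R_phi[u^m]
    f   : (N,)   stacked noisy observations f^m
    tau : float  spectral threshold (to be tuned with the sample size)
    """
    N = A.shape[0]
    normal = A.T @ A / N                   # empirical normal matrix
    rhs = A.T @ f / N
    evals, evecs = np.linalg.eigh(normal)
    keep = evals > tau                     # discard ill-posed directions
    inv = evecs[:, keep] @ np.diag(1.0 / evals[keep]) @ evecs[:, keep].T
    return inv @ rhs

# Toy usage with a mildly ill-conditioned design (illustrative only).
rng = np.random.default_rng(1)
p, N = 30, 500
col_decay = np.arange(1, p + 1, dtype=float) ** -1.5
A = rng.standard_normal((N, p)) * col_decay          # columns decay in energy
phi_true = rng.standard_normal(p)
f = A @ phi_true + 0.05 * rng.standard_normal(N)
phi_hat = tamed_lse(A, f, tau=1e-3)
print("relative error:", np.linalg.norm(phi_hat - phi_true) / np.linalg.norm(phi_true))
```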

  • Multilevel spectral regularization: Operators are reconstructed by layering ridge regression subproblems at selected input/output regularization levels, achieving

$$\|\widehat A_{\mathrm{ml}} - A_0\|^2 \lesssim (n/\log n)^{-\theta} \cdot \mathrm{poly}(\log n)$$

with the rate $\theta$ determined by regularity and capacity parameters (Jin et al., 2022).

  • Piecewise-constant estimator for Lipschitz classes: Partitioning principal eigenmodes yields matching upper rates up to constants, with risk controlled via bias–variance and trimming errors (Adcock et al., 19 Dec 2025).
  • Koopman operator learning (EDMD/RRR): The operator-norm errors of Reduced Rank Regression (RRR) and Extended DMD (EDMD) both attain the minimax exponent but differ in bias,

$$\| A_\pi S - S \widehat{G}_{\mathrm{RRR}} \| \leq \sigma_{r+1}(A_\pi S) + C\, n^{-\frac{\alpha}{2(\alpha+\beta)}}.$$

RRR is minimax-optimal in rank-restricted settings (Kostic et al., 2023).
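For a schematic comparison, the sketch below fits both estimators on snapshot data in a fixed feature dictionary: EDMD as (ridge-stabilized) least squares, and RRR via the classical rank-constrained construction that projects the least-squares fit onto its leading singular directions. The feature model, rank, and regularization are illustrative assumptions, not necessarily the exact estimators analyzed in the paper.

```python
import numpy as np

def edmd(Phi_X, Phi_Y, reg=1e-8):
    """Extended DMD: ridge-stabilized least squares G = (Phi_X' Phi_X)^{-1} Phi_X' Phi_Y."""
    p = Phi_X.shape[1]
    return np.linalg.solve(Phi_X.T @ Phi_X + reg * np.eye(p), Phi_X.T @ Phi_Y)

def rrr(Phi_X, Phi_Y, rank, reg=1e-8):
    """Reduced Rank Regression: project the least-squares fit onto the top `rank`
    right singular directions of the fitted values (rank-constrained least squares)."""
    G_ols = edmd(Phi_X, Phi_Y, reg)
    fitted = Phi_X @ G_ols
    _, _, Vt = np.linalg.svd(fitted, full_matrices=False)
    V_r = Vt[:rank].T
    return G_ols @ V_r @ V_r.T

# Toy snapshot data from a linear map observed through features (illustrative only).
rng = np.random.default_rng(2)
n, p, r = 2000, 15, 3
A = 0.9 * np.diag(np.sort(rng.uniform(0.5, 1.0, p))[::-1])   # full-rank "dynamics"
Phi_X = rng.standard_normal((n, p))
Phi_Y = Phi_X @ A + 0.1 * rng.standard_normal((n, p))

G_edmd, G_rrr = edmd(Phi_X, Phi_Y), rrr(Phi_X, Phi_Y, rank=r)
# With a full-rank target, the rank-r constraint incurs the sigma_{r+1} bias
# term appearing in the bound above; EDMD carries no such rank bias.
print("EDMD op-norm error:", np.linalg.norm(G_edmd - A, 2))
print("RRR  op-norm error:", np.linalg.norm(G_rrr - A, 2))
```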

4. Regularity Classes and Spectral Constraints

Sharp minimax rates hinge on the regularity assumptions imposed on the target operator and the spectral decay (capacity) of the input covariance or associated normal operator. Two prototypical regimes are:

  • Source regularity: $S^\dagger = J L_C^r$ or $S^\dagger = \tilde{J} L_C^{\tilde{r}}$, with $J$ bounded or Hilbert–Schmidt (Shi et al., 7 Feb 2024).
  • Spectral decay:
    • Polynomial: $\lambda_k \asymp k^{-2r}$ induces ill-posedness; minimax rates contain factors in $r$.
    • Exponential: $\lambda_k \asymp e^{-rk}$ leads to faster (but still sub-algebraic) rates, with exponents saturating in $\beta$ (Zhang et al., 27 Feb 2025, Adcock et al., 19 Dec 2025).
    • Double-exponential: Nearly algebraic decay, but sub-algebraic minimax risk for generic Lipschitz classes (Adcock et al., 19 Dec 2025).

These parameters define natural "Sobolev-type" function spaces or RKHS norms adaptively tailored to the inverse problem structure (Zhang et al., 27 Feb 2025, Jin et al., 2022).

5. Statistical-Computational Trade-offs and Estimator Design

The interplay between computational tractability and statistical optimality arises in the design of estimators:

  • Tamed least-squares estimators threshold empirical spectral components, discarding directions where ill-posedness would otherwise dominate statistical error (Zhang et al., 27 Feb 2025).
  • Multilevel kernel operator learning (see table) applies a hierarchy of regularizations, covering the "spectral block" structure in the bias–variance trade-off. This attains minimax rates adaptively while maintaining polynomial computational complexity via ridge solvers per level (Jin et al., 2022).
| Approach | Regularity Used | Statistical Rate |
| --- | --- | --- |
| SGD (Hilbert–Schmidt) | Weak/strong $(r, s)$ | $T^{-\kappa}$ (Shi et al., 7 Feb 2024) |
| Tamed LSE (Kernel) | Sobolev $\beta$, $r$ | $M^{-2\beta r/(2\beta r + 2r + 1)}$ (Zhang et al., 27 Feb 2025) |
| Multilevel Ridge | Joint input/output | $(n/\log n)^{-\theta}$ (Jin et al., 2022) |

A notable implication is that, for certain function classes (e.g., bounded Lipschitz), no algebraic rate is achievable for the minimax risk regardless of spectral decay, reflecting the curse of infinite-dimensionality (Adcock et al., 19 Dec 2025).

6. Extensions: Nonlinear Operators, Neural and Koopman Operators

Many minimax results transfer to nonlinear operator learning: when the estimator is linear but the response is nonlinear, the SGD-based scheme converges to the best linear approximation, inheriting the same minimax rates due to $L^2$ orthogonality (Shi et al., 7 Feb 2024). The theory extends to operator learning scenarios with vector-valued or real-valued RKHSs, encompassing multi-output regression, functional data analysis, and specializations such as functional linear regression (Shi et al., 7 Feb 2024).

Recent advances cover learning of nonlinear dynamical (Koopman) operators, where minimax rates for operator-norm and spectral error are established for data-driven low-rank approximations, especially Reduced Rank Regression (RRR), which is minimax-optimal with respect to the principal-subspace bias–variance trade-off (Kostic et al., 2023).

7. Open Directions and Implications

The minimax framework reveals precise phase transitions in statistical difficulty as a function of operator regularity, eigenvalue decay, and problem ill-posedness. Adaptive Sobolev spaces and multilevel regularization strategies allow unification of classical RKHS and direct spectral approaches (Zhang et al., 27 Feb 2025, Jin et al., 2022). Persisting challenges include:

  • Extensions to non-Gaussian, heteroskedastic, or dependent noise models.
  • Fully data-driven regularity and spectral parameter estimation.
  • Scaling up multilevel or tamed estimators for very high-dimensional settings.
  • Characterization of lower bounds under additional functional or geometric constraints, and the practical implications for nonlinear neural operator learning and inverse problems (Jin et al., 2022, Adcock et al., 19 Dec 2025).

The minimax theory thus provides the statistical foundation for principled operator learning, identifying both fundamental limitations and concrete pathways to optimal or near-optimal algorithmic performance across a spectrum of linear and nonlinear, well-posed and ill-posed, operator-valued learning problems.
