
Physics-Informed Empirical Risk Minimization

Updated 6 October 2025
  • Empirical risk minimization with physics-informed regularization combines data-driven loss and physical priors, enforcing PDE constraints to improve model fidelity.
  • The approach employs convex penalties based on differential operators to ensure robust concentration of excess risk and stability under noise.
  • This framework enables uncertainty quantification and efficient prediction in scenarios with sparse or noisy data by leveraging theoretical guarantees and controlled model complexity.

Empirical risk minimization (ERM) with physics-informed regularization refers to the family of statistical learning methodologies in which a standard empirical risk objective—characterizing fit to observed data—is augmented by a penalty term that enforces (exactly or approximately) the satisfaction of known physical laws. The regularization term is typically constructed so that it penalizes violation of governing equations, usually specified as partial differential equations (PDEs) or other operator equations, and it plays a central role in a diverse set of modern machine learning paradigms including physics-informed neural networks (PINNs), physics-informed kernel methods, and variational inference schemes. The subsequent sections detail the mathematical formulation, theoretical properties, common concentration and convergence results, and implications for uncertainty quantification and inductive bias.

1. Mathematical Formulation and Regularization Structure

Let $n$ i.i.d. or dependent observations $(x_i, y_i)$ be drawn from an unknown distribution, and let $\mathcal{F}$ be a class of candidate models (functions, typically parameterized as neural networks or as elements of a Sobolev space). The canonical objective for ERM with physics-informed regularization is

$$\widehat{f} \in \arg\min_{f \in \mathcal{F}} \left\{ \frac{1}{n} \sum_{i=1}^n \ell(f(x_i), y_i) + \lambda\, R_{\text{physics}}(f) \right\},$$

where $\ell(f(x_i), y_i)$ denotes the primary loss function (e.g., squared error, negative log-likelihood) and $R_{\text{physics}}(f)$ is a regularization term quantifying the deviation from known physics. For instance, if $\mathcal{N}$ is a linear or nonlinear differential operator encoding the physical law (e.g., $\mathcal{N}(f) = f_t + \mathcal{D}_x f$ in a transport equation), a common form is $R_{\text{physics}}(f) = \|\mathcal{N}(f)\|^2$ (generally in $L^2(\Omega)$). The regularization parameter $\lambda > 0$ balances data fit and fidelity to physics.

This setup encompasses scenarios where $R_{\text{physics}}$ is a convex (or convexifiable) penalty, allowing the use of convex analysis and tools from empirical process theory. The overall effect is to steer the estimator toward models that are both empirically sound and physically consistent (Geer et al., 2015).
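
As an illustrative instance of this objective (not drawn from the cited source), the sketch below fits a polynomial model to noisy samples of an exponential decay while penalizing the residual of an assumed linear ODE $\mathcal{N}(f) = f' + k f$ at collocation points; the decay rate $k$, the polynomial basis, the grid, and $\lambda$ are arbitrary illustrative choices. Because the model is linear in its coefficients and the penalty is quadratic, the objective is convex and has a closed-form minimizer.

```python
import numpy as np

# Minimal sketch (illustrative assumptions): squared-error data loss plus a
# quadratic physics penalty enforcing the linear ODE N(f) = f' + k*f = 0
# at collocation points, for a model that is linear in its coefficients.
rng = np.random.default_rng(0)
k = 1.0
x_data = rng.uniform(0.0, 2.0, size=30)
y_data = np.exp(-k * x_data) + 0.05 * rng.normal(size=30)  # noisy decay data
x_coll = np.linspace(0.0, 2.0, 50)                         # collocation grid

degree, lam = 6, 1e-2

def design(x):        # polynomial features [1, x, x^2, ...]
    return np.vander(x, degree + 1, increasing=True)

def design_deriv(x):  # d/dx of each feature
    cols = [np.zeros_like(x)] + [j * x ** (j - 1) for j in range(1, degree + 1)]
    return np.stack(cols, axis=1)

A = design(x_data)                              # data-fit matrix
B = design_deriv(x_coll) + k * design(x_coll)   # residual operator N(f) on the grid

# Convex objective (1/n)||A c - y||^2 + lam * (1/m)||B c||^2, minimized in closed form.
n, m = len(x_data), len(x_coll)
c_hat = np.linalg.solve(A.T @ A / n + lam * (B.T @ B) / m, A.T @ y_data / n)
print("fitted coefficients:", np.round(c_hat, 3))
```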

2. Excess Risk Concentration and Theoretical Guarantees

A central result established in convex penalized ERM is the concentration of the excess risk, typically defined as

$$\tau^2(f) := P\{f - f^0\} + \operatorname{pen}(f),$$

where $P\{\cdot\}$ denotes expectation with respect to the true data-generating distribution, $f^0$ is the ground-truth (unknown) function, and $\operatorname{pen}(f)$ incorporates the regularizer. Under suitable convexity, boundedness, and curvature conditions—as is standard for least-squares with convex or strongly convex penalties—the risk of the empirical minimizer $\hat{f}$ concentrates sharply around a deterministic benchmark $s_0$, typically the minimizer of a theoretical functional (such as $s \mapsto s^2 - \mathbf{E}(s)$ for an upper bound $\mathbf{E}$ on the supremum of the empirical process):

$$\mathbb{P}\left( |\tau(\hat{f}) - s_0| > \delta(t) \right) \le 2\exp(-t),$$

with

$$\delta(t) \asymp (s_0 + r_0) \sqrt{ \frac{t + \log\left(1 + \sqrt{n \tau_{\max}^2}\right)}{n} }.$$

This exponential tail behavior is a form of non-asymptotic concentration (Geer et al., 2015). In the normal sequence model, direct arguments (using Borell's theorem) establish concentration of the empirical error norm with deviations vanishing as $O(1/\sqrt{n})$.

For general composite convex losses (e.g., negative log-likelihood for exponential families plus strictly convex regularization), the same type of inequalities holds, subject to modifications involving the covering numbers and entropy integrals for the function class. The essential requirement is that the regularization preserves strong convexity or a margin condition near the minimum.

When incorporating physics-informed terms, provided RphysicsR_{\text{physics}} is convex and the associated function class is controlled (e.g., via bounding the envelope or by architecture design constraints in neural networks), the same concentration results apply, quantifying the estimator's stability, deviation from the deterministic benchmark, and susceptibility to noise.
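
The following Monte Carlo sketch illustrates (but does not prove) this concentration phenomenon for a strongly convex penalized least-squares (ridge) estimator: as $n$ grows, the fluctuation of an excess-risk proxy around its mean shrinks, while the mean settles near a deterministic, penalty-induced benchmark. The dimension, noise level, and penalty weight are arbitrary illustrative values.

```python
import numpy as np

# Monte Carlo illustration: the excess-risk proxy of a ridge (strongly convex
# penalized least-squares) estimator concentrates around a deterministic value,
# with fluctuations shrinking as the sample size n grows.
rng = np.random.default_rng(1)
d, lam, sigma = 5, 0.1, 0.5
theta0 = rng.normal(size=d)  # ground-truth parameter

def excess_risk_stats(n, reps=500):
    risks = []
    for _ in range(reps):
        X = rng.normal(size=(n, d))
        y = X @ theta0 + sigma * rng.normal(size=n)
        # ridge estimator: argmin_t (1/n)||X t - y||^2 + lam ||t||^2
        theta_hat = np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)
        risks.append(np.sum((theta_hat - theta0) ** 2))  # parameter-error proxy
    return np.mean(risks), np.std(risks)

for n in (50, 200, 800):
    mu, sd = excess_risk_stats(n)
    print(f"n={n:4d}  mean={mu:.4f}  std={sd:.4f}")
```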

3. Key Technical Ingredients: Borell's Theorem and Empirical Process Bounds

The sharp concentration results for ERM with convex regularization depend crucially on two ingredients:

  1. Borell's Theorem: For functionals $F(\epsilon)$ of a Gaussian vector $\epsilon$, if $F$ is $L$-Lipschitz, then

$$\mathbb{P}\left( |F(\epsilon) - \mathbb{E}[F(\epsilon)]| \geq t \right) \leq 2\exp\left(-\frac{t^2}{2L^2}\right).$$

In the context of penalized regression, the estimator's deviation from the truth as a function of the noise is typically $1/\sqrt{n}$-Lipschitz, so the empirical estimator concentrates tightly around its mean. This argument holds for both classical and physics-informed settings provided the overall problem remains convex and noise enters linearly (as in least squares) or in a controlled fashion. A brief numerical check of this Gaussian concentration bound is sketched after this list.

  2. Empirical Process Concentration: Concentration inequalities for the supremum of the empirical process (e.g., via the Klein–Rio theorem) allow control of the fluctuation of the empirical risk around its expectation, over subsets of the function class bounded in penalized norm. Pruning, truncation, and envelope bounding, as made explicit in truncated empirical processes (see Lemma KleinRio-truncated), ensure uniform deviation control (Geer et al., 2015).

These mechanisms extend to the physics-informed case, where the empirical process may involve residuals evaluated at additional collocation points or physical domains.
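
As a quick numerical sanity check of the Gaussian concentration inequality above (an illustration, not part of the cited analysis), the coordinate-wise maximum of a standard Gaussian vector is $1$-Lipschitz in the Euclidean norm, so its deviations from the mean should stay below the bound $2\exp(-t^2/2)$. The dimension and repetition counts are arbitrary.

```python
import numpy as np

# Empirical check of Borell-type concentration for F(eps) = max_i eps_i,
# which is 1-Lipschitz, so P(|F - E F| >= t) <= 2 exp(-t^2 / 2).
rng = np.random.default_rng(2)
dim, reps = 200, 20000
F = rng.normal(size=(reps, dim)).max(axis=1)  # one functional value per Gaussian draw
F_mean = F.mean()
for t in (0.5, 1.0, 1.5, 2.0):
    empirical = np.mean(np.abs(F - F_mean) >= t)
    bound = 2.0 * np.exp(-t ** 2 / 2.0)
    print(f"t={t:.1f}  empirical tail={empirical:.4f}  bound={bound:.4f}")
```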

4. Regularization, Convexity, and Physical Priors

Physics-informed regularization requires that $R_{\text{physics}}$ be convex to preserve the mathematical tractability of the problem. In most applications—PINNs, kernel-based methods, or PDE-constrained regression—the physical penalty is given in quadratic form, either as $\int |\mathcal{N}(f)(x)|^2 \, dx$ for a (possibly linear) differential operator $\mathcal{N}$, or as sums of squares over a finite set of collocation points. Even when the governing equations are nonlinear or the operator is only approximately known, convexification or bounding arguments are used to maintain convexity.
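
A common discretization of such a quadratic penalty, sketched below under illustrative assumptions, approximates the residual of a linear transport law $\mathcal{N}(f) = f_t + c\, f_x$ by finite differences on a space-time grid and averages its square over the collocation points; the transport speed, grid, and test functions are hypothetical choices for the example.

```python
import numpy as np

# Collocation-style physics penalty for a generic prediction f(t, x):
# R_physics(f) ~ mean over grid points of |f_t + c * f_x|^2, with partial
# derivatives approximated by finite differences.
def transport_residual_penalty(f, c=1.0, nt=60, nx=60):
    t = np.linspace(0.0, 1.0, nt)
    x = np.linspace(0.0, 1.0, nx)
    T, X = np.meshgrid(t, x, indexing="ij")
    U = f(T, X)                        # model values on the grid
    f_t = np.gradient(U, t, axis=0)    # d f / d t
    f_x = np.gradient(U, x, axis=1)    # d f / d x
    residual = f_t + c * f_x           # N(f) for the transport equation
    return np.mean(residual ** 2)      # discretized squared L2 norm

# A function satisfying f_t + c f_x = 0 incurs (numerically) negligible penalty,
# while one violating the law is penalized.
print(transport_residual_penalty(lambda t, x: np.sin(x - t), c=1.0))  # ~ 0
print(transport_residual_penalty(lambda t, x: np.sin(x + t), c=1.0))  # > 0
```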

The penalty is typically scaled quadratically, appearing as $R_{\text{physics}}(f)$ or $\lambda^2 R_{\text{physics}}(f)$, with $\lambda$ chosen via cross-validation, model selection, or principled criteria such as minimizing an unbiased estimator of predictive risk in inverse problems (Li et al., 2017). This unifies classical Tikhonov regularization, structural risk minimization (SRM), and modern physics-informed learning.

When combined with function spaces that are either finite-dimensional or bounded in norm, and when the overall objective is strongly convex or satisfies a margin condition, the effect is to restrict estimation to an effective low-dimensional "manifold" induced by the physics prior, typically sharply reducing model complexity and sample requirements (Scampicchio et al., 29 Sep 2025).

5. Implications for Uncertainty Quantification and Robustness

The high-probability concentration of excess risk has direct implications for the quantification of uncertainty and generalization properties in physics-informed regimes:

  • Uncertainty Quantification: Since the empirical minimizer's risk is guaranteed to concentrate around a deterministic value, formal confidence envelopes for prediction error and model deviation from the physical law can be provided. Extensions to deep generative models and evidential learning scenarios (e.g., evidential PINNs) combine risk concentration with uncertainty modeling over unknown PDE parameters or noise levels, supporting principled propagation of uncertainty through physical systems (Yang et al., 2018, Tan et al., 27 Jan 2025).
  • Robustness and Stability: The deviation inequalities (e.g., exponential bounds) ensure that estimators are robust to both stochasticity in the data and to possible model mismatch in the physical constraint. This is crucial for scientific and engineering domains where data may be sparse, noisy, or heterogeneous, yet strong physical priors exist.
  • Inductive Bias: Physics-informed regularization introduces a strong inductive bias, anchoring solutions near the physics-constrained subspace. As shown in analyses of regularization by f-divergences and relative entropy, the support of the estimator’s law is forced to coincide with that of the reference (prior) measure, dominating the evidence provided by the data and ensuring physical admissibility (Daunas et al., 2023, Daunas et al., 2 Oct 2024, Daunas et al., 1 Feb 2024).

6. Extensions: Non-i.i.d. Data and Learning Rate Acceleration

Recent research has extended the theoretical analysis of physics-informed ERM to account for dependencies in the data (e.g., temporal or spatial mixing). The inclusion of physics-based regularizers in dependent-data ERM can lead, under knowledge alignment (i.e., when the target function nearly satisfies the physical law), to a transition from the slow Sobolev minimax rate $O(n^{-2s/(2s+d)})$ to the parametric or optimal i.i.d. rate $O(1/n)$, with no effective sample-size deflation even under dependencies such as stationarity or weak mixing (Scampicchio et al., 29 Sep 2025). This is justified by a reduction in the effective complexity of the hypothesis space, as the regularizer "shrinks" the admissible set to a low-dimensional manifold. The key ingredients in these results are chaining arguments controlled by martingale offset complexity and small-ball/hypercontractivity conditions that substitute for classical i.i.d. sub-Gaussian assumptions.

7. Practical Considerations and Deployment

For practical implementation of ERM with physics-informed regularization, several points must be addressed:

  • Check convexity and boundedness of the physics-informed penalty. If not strictly convex, convexification or careful regularizer design is required.
  • Control model complexity either through architectural constraints (in neural networks) or explicit norm bounds (in kernel methods or Sobolev spaces), to ensure that covering number assumptions are satisfied and empirical process concentration applies.
  • Tune regularization parameters such as $\lambda$ and weights for the physical penalty, often via risk estimation, cross-validation, or Bayesian model selection to balance data and physics fidelity (a cross-validation sketch follows this list).
  • Incorporate uncertainty explicitly when required, for example by adopting evidential or Bayesian frameworks that provide predictive distributions over outputs and parameters.
  • Handle data dependencies via theoretical frameworks that support mixing processes, and avoid sample-size deflation in dependent contexts by leveraging martingale concentration and persistence conditions.
  • Interpret and validate uncertainty and physical compliance using concentration bounds, empirical coverage probabilities, and domain-specific physical metrics for applications such as PDE inversion, transport, and fluid dynamics.
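
As a hedged illustration of the tuning step noted above, the helper below (hypothetical name `kfold_cv_lambda`, with `fit` and `predict` standing in for any penalized-ERM solver and its prediction routine) selects $\lambda$ by K-fold cross-validation over a grid of candidate values.

```python
import numpy as np

# K-fold cross-validation over candidate lambda values for a generic
# penalized-ERM estimator. `fit(X, y, lam)` and `predict(model, X)` are
# placeholders for whatever solver is used (e.g., the closed-form sketch above).
def kfold_cv_lambda(X, y, fit, predict, lambdas, k=5, seed=0):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for lam in lambdas:
        errs = []
        for i in range(k):
            val = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            model = fit(X[train], y[train], lam)
            errs.append(np.mean((predict(model, X[val]) - y[val]) ** 2))
        scores.append(np.mean(errs))
    best = lambdas[int(np.argmin(scores))]
    return best, scores

# Example usage (assumed estimator):
# best_lam, cv_scores = kfold_cv_lambda(x_data, y_data, fit, predict, np.logspace(-4, 1, 12))
```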

The integration of physics-informed regularization into ERM permits leveraging domain knowledge to improve both statistical efficiency and physical interpretability, with rigorous non-asymptotic guarantees on excess risk, stability, and uncertainty—a key advantage in modern scientific machine learning.
