
Drift-Feedback Generalization Bound

Updated 22 December 2025
  • The paper introduces a framework that quantifies nonstationary generalization via the cumulative Fisher–Rao path length, known as the reproducibility budget.
  • It integrates exogenous drift and learner-induced feedback to yield a minimax-optimal error rate of Θ(T⁻¹/² + C_T/T) and establishes a reproducibility speed limit.
  • The model unifies stationary, drift, adaptive data analysis, and performative regimes, offering a rigorous geometric measure for algorithmic performance under dynamic data distributions.

The drift-feedback generalization bound characterizes statistical learning under nonstationary distributions where the environment law evolves over time as a coupled function of both exogenous change and learner-induced feedback. Classical generalization theory collapses in this setting, necessitating new geometric machinery. The central primitive is the reproducibility budget $C_T$, defined as the cumulative Fisher–Rao path length traversed by the underlying data-generating process over $T$ rounds. This intrinsic metric quantifies the joint effect of exogenous drift and adaptive feedback, governing a minimax-optimal generalization error rate of $\Theta(T^{-1/2} + C_T/T)$. The framework unifies stationary, drift, adaptive data analysis, and performative regimes under a single information-geometric structure, establishing a reproducibility speed limit whereby no algorithm can achieve worst-case error below the average drift rate.

1. Information-Geometric Drift and the Reproducibility Budget

Let $\{p_\theta(x,y)\}_{\theta\in\Theta\subset\mathbb{R}^d}$ be a smooth parametric model equipped with its Fisher–Rao Riemannian metric $g_\theta$. For any trajectory $\theta_0\to\theta_1\to\cdots\to\theta_T$ induced by the interaction of learner and environment, the Fisher–Rao distance between consecutive states is

$$d_F(\theta_t, \theta_{t+1}) = \inf_{\gamma} \int_0^1 \|\dot\gamma(s)\|_{g_{\gamma(s)}}\, ds$$

where $\gamma$ ranges over piecewise-$C^1$ curves joining $\theta_t$ and $\theta_{t+1}$, and $\|v\|^2_{g_\theta} = v^\top g(\theta)\, v$.

The reproducibility budget is defined as the total accumulated Fisher–Rao path length:

$$C_T = \sum_{t=0}^{T-1} d_F(\theta_t, \theta_{t+1}) \approx \int_0^T \|\dot\theta_s\|_{g_{\theta_s}}\, ds$$

This measures the intrinsic information-geometric motion of the data-generating law as the environment evolves, encompassing both unpredictable exogenous changes and endogenous learner feedback (Zaichyk, 15 Dec 2025).
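As a concrete illustration (mine, not from the paper), consider a one-dimensional Gaussian mean family $N(\theta, \sigma^2)$ with fixed $\sigma$: the Fisher metric is $g(\theta) = 1/\sigma^2$, the Fisher–Rao distance reduces to $|\theta' - \theta|/\sigma$, and the budget is simply the scaled total variation of the mean trajectory:

```python
import numpy as np

def reproducibility_budget(thetas, sigma=1.0):
    """C_T for a 1-D Gaussian mean family N(theta, sigma^2):
    g(theta) = 1/sigma^2, so d_F(a, b) = |b - a| / sigma and the
    budget is the scaled total variation of the trajectory."""
    thetas = np.asarray(thetas, dtype=float)
    return float(np.sum(np.abs(np.diff(thetas))) / sigma)

# A linear drift theta_t = 0.01 * t over T = 100 rounds:
trajectory = 0.01 * np.arange(101)
C_T = reproducibility_budget(trajectory)  # 100 steps of size 0.01
```

Note that a zigzag trajectory that returns to its start still accumulates budget; $C_T$ measures motion, not displacement.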

2. Regularity and Decomposition of Drift

Regularity assumptions ensure analytical control:

  • The Fisher information $g(\theta)$ is positive-definite and twice continuously differentiable; the score $s_\theta = \nabla_\theta \log p_\theta$ has bounded second moments.
  • The loss $\ell(f(x),y)$ is $\sigma$-sub-Gaussian under each $p_\theta$.
  • Environment updates $\theta_{t+1} = F(\theta_t, u_t, \eta_t)$ are smooth in the control $u_t$, with bounded energy $\mathbb{E}[\|u_t\|^2] \le B$. Exogenous noise $\eta_t$ is independent of $u_t$.

One-step drift is decomposed as

$$\Delta\theta_t = [F(\theta_t, 0, \eta_t) - \theta_t] + J_u F(\theta_t, 0, \eta_t)\, u_t + r_t$$

with

$$d_t := \mathbb{E}\left[\|F(\theta_t, 0, \eta_t) - \theta_t\|_{g_{\theta_t}} \mid \theta_t\right] \qquad \text{(exogenous drift)}$$

$$\kappa_t := \mathbb{E}\left[\|J_u F(\theta_t, 0, \eta_t)\, u_t\|_{g_{\theta_t}} \mid \theta_t\right] \qquad \text{(endogenous/feedback drift)}$$

$$\mathbb{E}[\|\Delta\theta_t\|_{g_{\theta_t}}] \le d_t + \kappa_t + \varepsilon_t$$

where $\varepsilon_t$ bounds the remainder term $r_t$.

Summing over rounds yields $C_T \approx \sum_{t=0}^{T-1}(d_t + \kappa_t)$ (Zaichyk, 15 Dec 2025).
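The decomposition can be sketched numerically for a hypothetical linear environment update $F(\theta, u, \eta) = \theta + \eta + a\,u$ under the Euclidean metric ($g = I$), where the exogenous and feedback components separate exactly:

```python
import numpy as np

def one_step_drift_terms(u, eta, a=0.5):
    """Drift terms for the hypothetical update F(theta,u,eta) = theta + eta + a*u
    with Euclidean metric g = I:
      exogenous part:  F(theta, 0, eta) - theta = eta   -> d_t
      feedback part:   J_u F @ u = a*u                  -> kappa_t
    The remainder r_t vanishes because F is linear in u."""
    d_t = float(np.linalg.norm(eta))
    kappa_t = float(np.linalg.norm(a * np.asarray(u, dtype=float)))
    return d_t, kappa_t

d_t, kappa_t = one_step_drift_terms(u=[0.0, 0.2], eta=[0.3, 0.4])
```

Here the learner's control contributes $\kappa_t$ while the environment noise alone contributes $d_t$; both add into the same budget.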

3. The Drift-Feedback Generalization Bound

At each round $t$, the learner produces $f_t$ and receives $(x_t, y_t) \sim p_{\theta_t}$. Define

$$\hat{R}_T = \frac{1}{T} \sum_{t=1}^T \ell(f_t(x_t), y_t) \qquad R_T = \frac{1}{T} \sum_{t=1}^T \mathbb{E}_{\theta_t}\left[\ell(f_t(x), y)\right]$$

Lemma (add–subtract):

$$|\hat{R}_T - R_T| \leq \frac{1}{T} \left| \sum_{t=1}^T Z_t \right| + \frac{1}{T} \sum_{t=1}^{T-1} |R(\theta_{t+1}, f_t) - R(\theta_t, f_t)|$$

where $Z_t$ is a martingale difference sequence.
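A toy Monte Carlo check of this gap (my own illustration, not the paper's): with squared loss, the constant predictor $f_t \equiv 0$, and $y_t \sim N(\theta_t, 1)$ drifting linearly, both risks can be computed directly:

```python
import numpy as np

def risk_gap(T, drift_step=0.01, seed=0):
    """|R_hat_T - R_T| for y_t ~ N(theta_t, 1) with theta_t = drift_step * t,
    squared loss, and the constant predictor f_t = 0, so the per-round
    population risk is E[y_t^2] = theta_t^2 + 1 (a hypothetical toy setting)."""
    rng = np.random.default_rng(seed)
    thetas = drift_step * np.arange(1, T + 1)
    y = rng.normal(thetas, 1.0)
    R_hat = float(np.mean(y ** 2))            # empirical risk over observed losses
    R_T = float(np.mean(thetas ** 2 + 1.0))   # average population risk
    return abs(R_hat - R_T)
```

Since the predictor is fixed, the drift term of the lemma vanishes here and the gap is pure sampling noise, shrinking at the $T^{-1/2}$ rate.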

Under sub-Gaussianity and smoothness,

  • Sampling term: $\mathbb{E}\left|\frac{1}{T}\sum_t Z_t\right| \leq C_0\, T^{-1/2}$
  • Drift term: $\mathbb{E}\left[|R(\theta_{t+1}, f_t) - R(\theta_t, f_t)|\right] \leq L_p\, (d_t + \kappa_t)$

Aggregated bound:

$$\mathbb{E}|\hat{R}_T - R_T| \leq C_0\, T^{-1/2} + \frac{L_p}{T} \sum_{t=0}^{T-1} (d_t + \kappa_t) = O(T^{-1/2} + C_T/T)$$
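Plugging in illustrative per-round drift sequences makes the two terms of the bound tangible ($C_0$ and $L_p$ below are placeholder constants, not values from the paper):

```python
def generalization_bound(T, d, kappa, C0=1.0, Lp=1.0):
    """Evaluate C0 * T^{-1/2} + (Lp / T) * sum_t (d_t + kappa_t)
    for given per-round drift sequences (C0, Lp are placeholders)."""
    C_T = sum(d) + sum(kappa)
    return C0 * T ** -0.5 + Lp * C_T / T

T = 100
# Exogenous drift 0.02 and feedback drift 0.01 per round -> C_T = 3.0
bound = generalization_bound(T, d=[0.02] * T, kappa=[0.01] * T)
```

Here the sampling term contributes $0.1$ and the drift term $C_T/T = 0.03$; as $T$ grows with drift rate held fixed, the drift term becomes the floor.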

4. Minimax Lower Bound and Speed Limit

No estimator can outperform the average drift rate. An explicit Fano-type construction in a Fisher-arclength parameterization yields

$$\inf_{\hat{R}_T}\ \sup_{\text{process}:\ C_T \leq C}\ \mathbb{E}|\hat{R}_T - R_T| \geq c_1 \max\left(T^{-1/2},\ C/T\right)$$

The proof partitions time into blocks with step size $\delta$, employing codewords and a KL/Fano argument, establishing minimax optimality. In the stationary case ($C = 0$), the classical $T^{-1/2}$ rate is recovered, while for maximal drift the $C_T/T$ term dominates.
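The two regimes of the lower bound can be read off directly (with $c_1$ as a placeholder constant):

```python
def minimax_lower_bound(T, C, c1=1.0):
    """c1 * max(T^{-1/2}, C/T): the stationary-rate floor versus the
    drift-rate floor; whichever is larger binds."""
    return c1 * max(T ** -0.5, C / T)

stationary = minimax_lower_bound(T=100, C=0)   # classical 1/sqrt(T) floor
high_drift = minimax_lower_bound(T=100, C=50)  # drift term C/T dominates
```

The crossover occurs at $C \approx \sqrt{T}$: below it, sampling noise is the bottleneck; above it, no amount of data overcomes the motion of the environment.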

The reproducibility speed limit: no algorithm can achieve a worst-case generalization error below the average Fisher–Rao drift rate $C_T/T$ (Zaichyk, 15 Dec 2025).

5. Connections to Existing Drift and Variation-Budget Frameworks

The drift-feedback structure recovers well-established bounds:

  • Stationary i.i.d.: $d_t \equiv \kappa_t \equiv 0 \implies O(T^{-1/2})$
  • Pure drift: $O\left(T^{-1/2} + (1/T) \sum_t d_t\right)$
  • Adaptive data analysis: $O\left(T^{-1/2} + (1/T) \sum_t \kappa_t\right)$
  • Performative equilibrium: $d_t, \kappa_t \to 0 \implies O(T^{-1/2})$

Related work on learning under concept drift (Hanneke et al., 2015) quantifies drift by a sequence $\Delta_t$ and delivers window-adaptive error bounds:

$$\text{error}_T(\hat{h}_T) \le O\left(\min_{1 \le m \le T-1} \left\{ \frac{1}{m}\sum_{i=T-m}^{T-1}\sum_{j=i+1}^{T} \Delta_j + \frac{d \ln(m/d) + \ln(1/\delta)}{m} \right\}\right)$$

This structure cleanly splits the error into drift and complexity terms, paralleling the decomposition in the drift-feedback regime.
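The window trade-off can be sketched as a direct scan over window sizes (constants dropped, so this reproduces only the shape of the Hanneke et al. bound, not its exact value; `deltas[j-1]` stands in for $\Delta_j$):

```python
import math

def window_adaptive_bound(deltas, d=2, delta=0.05):
    """Minimize over window size m the drift term
    (1/m) * sum_{i=T-m}^{T-1} sum_{j=i+1}^{T} Delta_j plus the
    complexity term (d*ln(m/d) + ln(1/delta)) / m (constants dropped)."""
    T = len(deltas)
    best = float("inf")
    for m in range(d, T):  # m >= d keeps ln(m/d) nonnegative
        drift = sum(sum(deltas[j - 1] for j in range(i + 1, T + 1))
                    for i in range(T - m, T))
        best = min(best,
                   drift / m + (d * math.log(m / d) + math.log(1 / delta)) / m)
    return best
```

Larger windows amortize estimation error but accumulate drift; the minimization picks the window at which the two penalties balance.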

In generalized linear bandits under parameter drift (Faury et al., 2021), the variation budget $B_T$ similarly controls regret rates:

  • Orthogonal action sets: $O\left(d^{2/3} B_T^{1/3} T^{2/3}\right)$
  • General action sets: $O\left(d^{9/10} B_T^{1/5} T^{4/5}\right)$

The projection step feeds the observed drift back into the confidence set, analogous to the drift-feedback framework.
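For reference, the two rates can be encoded as simple order-of-magnitude functions (constants omitted, so these compare growth orders only):

```python
def glb_regret_rate(d, B_T, T, orthogonal=True):
    """Order of dynamic regret from Faury et al. (2021), constants omitted:
    d^{2/3} B_T^{1/3} T^{2/3} for orthogonal action sets,
    d^{9/10} B_T^{1/5} T^{4/5} for general action sets."""
    if orthogonal:
        return d ** (2 / 3) * B_T ** (1 / 3) * T ** (2 / 3)
    return d ** (9 / 10) * B_T ** (1 / 5) * T ** (4 / 5)
```

The general-set rate is slower in $T$ ($T^{4/5}$ versus $T^{2/3}$) but less sensitive to the variation budget ($B_T^{1/5}$ versus $B_T^{1/3}$).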

6. Unified Geometric Characterization and Model Adaptivity

The drift-feedback generalization theory unifies exogenous drift, adaptive analysis, and performative prediction by projecting complexity and stability penalties onto the intrinsic Fisher–Rao path length $C_T$. The information-geometric approach enables precise characterization and algorithm-independent lower bounds. Settings previously analyzed via extrinsic variation budgets or stability coefficients are subcases determined by the projection of $C_T$ onto the principal mode of environmental change.

A plausible implication is that the geometric drift metric provides a canonical unifying resource for quantifying nonstationarity across statistical learning, bandit optimization, and adaptive decision processes. The reproducibility budget thus subsumes and refines previously used drift and variation notions, providing a rigorous speed limit for adaptive generalization (Zaichyk, 15 Dec 2025; Hanneke et al., 2015; Faury et al., 2021).
