
Drift-Feedback Generalization Bound

Updated 22 December 2025
  • The paper introduces a framework that quantifies nonstationary generalization via the cumulative Fisher–Rao path length, known as the reproducibility budget.
  • It integrates exogenous drift and learner-induced feedback to yield a minimax-optimal error rate of $\Theta(T^{-1/2} + C_T/T)$ and establishes a reproducibility speed limit.
  • The model unifies stationary, drift, adaptive data analysis, and performative regimes, offering a rigorous geometric measure for algorithmic performance under dynamic data distributions.

The drift-feedback generalization bound characterizes statistical learning under nonstationary distributions where the environment law evolves over time as a coupled function of both exogenous change and learner-induced feedback. Classical generalization theory collapses in this setting, necessitating new geometric machinery. The central primitive is the reproducibility budget $C_T$, defined as the cumulative Fisher–Rao path length traversed by the underlying data-generating process over $T$ rounds. This intrinsic metric quantifies the joint effect of exogenous drift and adaptive feedback, governing a minimax-optimal generalization error rate of $\Theta(T^{-1/2} + C_T/T)$. The framework unifies stationary, drift, adaptive data analysis, and performative regimes under a single information-geometric structure, establishing a reproducibility speed limit: no algorithm can drive worst-case generalization error below the average drift rate $C_T/T$.

1. Information-Geometric Drift and the Reproducibility Budget

Let $\{p_\theta(x,y)\}_{\theta\in\Theta\subset\mathbb{R}^d}$ be a smooth parametric model equipped with its Fisher–Rao Riemannian metric $g_\theta$. For any trajectory $\theta_0\to\theta_1\to\cdots\to\theta_T$ induced by the interaction of learner and environment, the Fisher–Rao distance between consecutive states is

$$d_F(\theta_t, \theta_{t+1}) = \inf_{\gamma} \int_0^1 \|\dot\gamma(s)\|_{g_{\gamma(s)}}\, ds,$$

where $\gamma$ runs over piecewise-$C^1$ curves joining $\theta_t$ and $\theta_{t+1}$, and $\|v\|^2_{g_\theta} = v^\top g(\theta)\, v$.

The reproducibility budget is defined as the total accumulated Fisher–Rao path length

$$C_T = \sum_{t=0}^{T-1} d_F(\theta_t, \theta_{t+1}) \approx \int_0^T \|\dot\theta_s\|_{g_{\theta_s}}\, ds.$$

This measures the intrinsic information-geometric motion of the data-generating law as the environment evolves, encompassing both unpredictable exogenous changes and endogenous learner feedback (Zaichyk, 15 Dec 2025).
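As a concrete instance, here is a minimal numerical sketch computing $C_T$ for a univariate Gaussian location family $N(\theta, \sigma^2)$ with fixed $\sigma$, where the Fisher information is $1/\sigma^2$ and hence $d_F(\theta_t, \theta_{t+1}) = |\theta_{t+1} - \theta_t|/\sigma$. The function name and the trajectory below are illustrative, not taken from the paper.

```python
import numpy as np

def reproducibility_budget(thetas, sigma=1.0):
    """C_T for a N(theta, sigma^2) family with fixed sigma, where the Fisher-Rao
    distance between consecutive means is |theta_{t+1} - theta_t| / sigma."""
    steps = np.abs(np.diff(np.asarray(thetas, dtype=float))) / sigma
    return steps.sum()

# Illustrative trajectory: slow exogenous random-walk drift plus periodic feedback shifts.
rng = np.random.default_rng(0)
T = 1000
theta = np.cumsum(0.01 * rng.standard_normal(T))   # exogenous drift
theta += 0.05 * (np.arange(T) // 100)              # learner-induced shifts every 100 rounds
C_T = reproducibility_budget(theta)
print(f"C_T = {C_T:.2f}, average drift rate C_T/T = {C_T / T:.4f}")
```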

2. Regularity and Decomposition of Drift

Regularity assumptions ensure analytical control:

  • The Fisher information $g(\theta)$ is positive-definite and twice continuously differentiable; the score $s_\theta = \nabla_\theta \log p_\theta$ has bounded second moments.
  • The loss $\ell(f(x),y)$ is $\sigma$-sub-Gaussian under each $p_\theta$.
  • Environment updates $\theta_{t+1} = F(\theta_t, u_t, \eta_t)$ are smooth in the control $u_t$, with bounded energy $\mathbb{E}[\|u_t\|^2] \le B$. The exogenous noise $\eta_t$ is independent of $u_t$.

One-step drift is decomposed as

$$\Delta\theta_t = [F(\theta_t, 0, \eta_t) - \theta_t] + J_u F(\theta_t, 0, \eta_t)\, u_t + r_t,$$

with

$$d_t := \mathbb{E}\big[\|F(\theta_t, 0, \eta_t) - \theta_t\|_{g_{\theta_t}} \,\big|\, \theta_t\big] \qquad \text{(exogenous drift)},$$

$$\kappa_t := \mathbb{E}\big[\|J_u F(\theta_t, 0, \eta_t)\, u_t\|_{g_{\theta_t}} \,\big|\, \theta_t\big] \qquad \text{(endogenous/feedback drift)},$$

and

$$\mathbb{E}\big[\|\Delta\theta_t\|_{g_{\theta_t}}\big] \le d_t + \kappa_t + \varepsilon_t.$$

Summing over rounds yields $C_T \approx \sum_{t=0}^{T-1}(d_t + \kappa_t)$ (Zaichyk, 15 Dec 2025).
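To make the decomposition concrete, here is a minimal Monte-Carlo sketch assuming a hypothetical linear environment update $F(\theta, u, \eta) = \theta + A u + \eta$ (so $J_u F \equiv A$), with the Euclidean norm standing in for the Fisher–Rao norm (a locally flat metric). The names `F`, `A`, and the noise scale are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3
A = 0.1 * rng.standard_normal((d, d))            # hypothetical feedback sensitivity J_u F

def F(theta, u, eta):
    """Illustrative environment update: exogenous noise plus linear learner feedback."""
    return theta + A @ u + eta

theta_t = np.zeros(d)
u_t = rng.standard_normal(d)                     # the learner's control at round t
etas = 0.02 * rng.standard_normal((10_000, d))   # samples of the exogenous noise eta_t

# d_t: expected norm of the exogenous move F(theta_t, 0, eta_t) - theta_t.
d_t = np.mean([np.linalg.norm(F(theta_t, np.zeros(d), eta) - theta_t) for eta in etas])
# kappa_t: norm of the feedback component J_u F(theta_t, 0, eta_t) u_t (here J_u F = A).
kappa_t = np.linalg.norm(A @ u_t)
print(f"d_t ~= {d_t:.4f}, kappa_t ~= {kappa_t:.4f}, one-step drift bound ~= {d_t + kappa_t:.4f}")
```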

3. The Drift-Feedback Generalization Bound

At each round $t$, the learner produces $f_t$ and receives $(x_t, y_t) \sim p_{\theta_t}$. Define

$$\hat{R}_T = \frac{1}{T} \sum_{t=1}^T \ell(f_t(x_t), y_t), \qquad R_T = \frac{1}{T} \sum_{t=1}^T \mathbb{E}_{\theta_t}\big[\ell(f_t(x), y)\big].$$

Lemma (add–subtract):

$$|\hat{R}_T - R_T| \leq \frac{1}{T} \left| \sum_{t=1}^T Z_t \right| + \frac{1}{T} \sum_{t=1}^{T-1} |R(\theta_{t+1}, f_t) - R(\theta_t, f_t)|,$$

with $Z_t$ a martingale difference.

Under sub-Gaussianity and smoothness,

  • Sampling term: $\mathbb{E}\left| \frac{1}{T}\sum Z_t \right| \leq C_0 T^{-1/2}$
  • Drift term: $\mathbb{E}\big[|R(\theta_{t+1}, f_t) - R(\theta_t, f_t)|\big] \leq L_p (d_t + \kappa_t)$

Aggregated bound:

$$\mathbb{E} |\hat{R}_T - R_T| \leq C_0 T^{-1/2} + \frac{L_p}{T} \sum_{t=0}^{T-1} (d_t + \kappa_t) = O(T^{-1/2} + C_T / T).$$
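The following Monte-Carlo sketch illustrates the two quantities in the bound, assuming a drifting Gaussian mean $\theta_t$, squared loss, and a naive learner $f_t$ that predicts the running mean of past labels. The function names and drift schedule are illustrative, and squared loss is used only to show the trend (larger $C_T/T$ yields a larger gap), not the constants of the bound.

```python
import numpy as np

rng = np.random.default_rng(2)

def risk_gap(T, drift_rate):
    """Return |hat R_T - R_T| for a running-mean learner when theta_t drifts linearly."""
    theta = drift_rate * np.arange(T)          # data law p_{theta_t} = N(theta_t, 1); C_T ~ drift_rate * T
    f, emp, pop = 0.0, 0.0, 0.0
    for t in range(T):
        y = theta[t] + rng.standard_normal()   # observe y_t ~ p_{theta_t}
        emp += (f - y) ** 2                    # empirical loss of the current predictor f_t
        pop += (f - theta[t]) ** 2 + 1.0       # population risk E_{theta_t}[(f_t - y)^2]
        f += (y - f) / (t + 1)                 # running-mean update (the "learner")
    return abs(emp - pop) / T

for drift_rate in [0.0, 0.001, 0.01]:          # larger drift -> larger C_T/T -> larger gap
    gaps = [risk_gap(T=2000, drift_rate=drift_rate) for _ in range(20)]
    print(f"drift rate {drift_rate}: mean |R_hat - R| = {np.mean(gaps):.3f}")
```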

4. Minimax Lower Bound and Speed Limit

No estimator can outperform the average drift rate. An explicit Fano-type construction in a Fisher-arclength parameterization yields

$$\inf_{\hat{R}_T} \sup_{\text{process: } C_T \leq C} \mathbb{E} |\hat{R}_T - R_T| \geq c_1 \max\big(T^{-1/2},\, C/T\big).$$

The proof partitions time into blocks with step size $\delta$ and applies a codeword-based KL/Fano argument, matching the upper bound and establishing minimax optimality. In the stationary case ($C=0$), the classical $T^{-1/2}$ rate is recovered, while for maximal drift the $C_T/T$ term dominates.
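The two terms in the lower bound trade off at $C \approx \sqrt{T}$: below that budget the sampling floor $T^{-1/2}$ dominates, above it the drift term $C/T$ does. A tiny sketch (constants omitted) makes the crossover explicit:

```python
# Crossover of the lower bound max(T^{-1/2}, C/T): the drift term takes over once C > sqrt(T).
for T in [10**3, 10**4, 10**5]:
    print(f"T = {T:>6}: sampling floor T^-1/2 = {T ** -0.5:.4f}, "
          f"drift term C/T matches it at C ~= {T ** 0.5:.0f}")
```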

The reproducibility speed limit: no algorithm can achieve a worst-case generalization error below the average Fisher–Rao drift rate $C_T/T$ (Zaichyk, 15 Dec 2025).

5. Connections to Existing Drift and Variation-Budget Frameworks

The drift-feedback structure recovers well-established bounds:

  • Stationary i.i.d.: $d_t \equiv \kappa_t \equiv 0 \implies O(T^{-1/2})$
  • Pure drift: $O(T^{-1/2} + (1/T) \sum d_t)$
  • Adaptive data analysis: $O(T^{-1/2} + (1/T) \sum \kappa_t)$
  • Performative equilibrium: $d_t, \kappa_t \to 0 \implies O(T^{-1/2})$

Related work in concept drift learning (Hanneke et al., 2015) quantifies the drift by a sequence $\Delta_t$ and delivers window-adaptive error bounds:

$$\text{error}_T(\hat{h}_T) \le O\left(\min_{1 \le m \le T-1} \left\{ \frac{1}{m}\sum_{i=T-m}^{T-1}\sum_{j=i+1}^T \Delta_j + \frac{d \ln(m/d) + \ln(1/\delta)}{m} \right\}\right).$$

This structure cleanly splits error into drift and complexity, paralleling the decomposition in the drift-feedback regime.
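A sketch of how such a bound could be evaluated: the code below computes the bracketed expression (up to constants) for a given drift sequence $\Delta_t$ and picks the minimizing window size $m$. The drift sequence, VC dimension, and confidence level are illustrative choices, not values from the cited work.

```python
import numpy as np

def window_adaptive_bound(deltas, d, conf=0.05):
    """Evaluate (up to constants) the window-adaptive bound
    min_m { (1/m) sum_{i=T-m}^{T-1} sum_{j=i+1}^{T} Delta_j + (d ln(m/d) + ln(1/conf)) / m }."""
    deltas = np.asarray(deltas, dtype=float)    # deltas[j-1] holds Delta_j, j = 1..T
    T = len(deltas)
    tail = np.cumsum(deltas[::-1])[::-1]        # tail[i] = sum_{j >= i+1} Delta_j (0-indexed)
    best = (None, np.inf)
    for m in range(1, T):
        drift = tail[T - m:].sum() / m          # accumulated drift over the trailing window
        # Clamp m/d at 1 to avoid a negative log when m < d (a convention, not in the bound as quoted).
        complexity = (d * np.log(max(m / d, 1.0)) + np.log(1.0 / conf)) / m
        if drift + complexity < best[1]:
            best = (m, drift + complexity)
    return best

# Example: drift concentrated early, so a moderately long recent window is optimal.
deltas = np.where(np.arange(500) < 100, 0.01, 0.0005)
m_star, value = window_adaptive_bound(deltas, d=10)
print(f"chosen window m* = {m_star}, bound value ~ {value:.3f}")
```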

In generalized linear bandits under parameter drift (Faury et al., 2021), the variation budget $B_T$ similarly controls regret rates:

  • Orthogonal action sets: $O(d^{2/3} B_T^{1/3} T^{2/3})$
  • General action sets: $O(d^{9/10} B_T^{1/5} T^{4/5})$

The projection step feeds the observed drift back into the confidence set, analogous to the drift-feedback framework.
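For a sense of scale, a minimal sketch plugging illustrative values of $d$, $B_T$, and $T$ into the two regret rates above; constants and logarithmic factors are omitted, and the specific values are arbitrary.

```python
# Regret scalings under a variation budget B_T, up to constants and log factors.
def regret_orthogonal(d, B_T, T):
    return d ** (2 / 3) * B_T ** (1 / 3) * T ** (2 / 3)

def regret_general(d, B_T, T):
    return d ** (9 / 10) * B_T ** (1 / 5) * T ** (4 / 5)

for T in [10**4, 10**6]:
    print(f"T = {T:>7}: orthogonal {regret_orthogonal(10, 5.0, T):,.0f}, "
          f"general {regret_general(10, 5.0, T):,.0f}")
```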

6. Unified Geometric Characterization and Model Adaptivity

The drift-feedback generalization theory unifies exogenous drift, adaptive analysis, and performative prediction by projecting complexity and stability penalties onto the intrinsic Fisher–Rao path length $C_T$. The information-geometric approach enables precise characterization and algorithm-independent lower bounds. Settings previously analyzed via extrinsic variation budgets or stability coefficients are subcases determined by the projection of $C_T$ onto the principal mode of environmental change.

A plausible implication is that the geometric drift metric provides a canonical unifying resource for quantifying nonstationarity across statistical learning, bandit optimization, and adaptive decision processes. The reproducibility budget thus subsumes and refines previously used drift and variation notions, providing a rigorous speed limit for adaptive generalization (Zaichyk, 15 Dec 2025; Hanneke et al., 2015; Faury et al., 2021).
