- The paper introduces non-asymptotic moment bounds, proving that the SA iterates maintain stability with an O(α) stationary bias even under decision-dependent noise.
- It establishes a localized Poisson–Gateaux regularity condition to relax global smoothness assumptions, thereby enabling analysis of non-smooth kernels such as Metropolis–Hastings.
- It proves geometric ergodicity and a functional central limit theorem, paving the way for rigorous statistical inference and effective bias reduction strategies in stochastic approximation.
Revisiting Constant Stepsize Stochastic Approximation with Decision-Dependent Markovian Noise
Problem Formulation and Context
Stochastic approximation (SA) is foundational for iterative root-finding in nonlinear systems, especially in reinforcement learning and empirical risk minimization. The study focuses on SA algorithms with constant stepsize operating under decision-dependent Markovian noise, where the distribution of the Markov noise is coupled to the current decision variable. Formally, at each iteration, the update is:
$$\theta_{k+1} = \theta_k + \alpha \bigl( g(\theta_k, X_{k+1}) + \xi_{k+1}(\theta_k) \bigr)$$
with $X_{k+1} \sim P_{\theta_k}(X_k, \cdot)$, $\alpha$ the constant stepsize, $g$ the update function, and $\xi_{k+1}$ an i.i.d. noise field. The goal is to approximate the root $\theta^*$ of the mean-field equation $\bar{g}(\theta) = 0$, where $\bar{g}(\theta) = \mathbb{E}_{X \sim \pi_\theta}[g(\theta, X)]$ and $\pi_\theta$ is the invariant distribution of the controlled Markov kernel $P_\theta$.
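The recursion above can be made concrete with a small simulation. The following is a minimal sketch on a hypothetical toy model, not the paper's setting: the AR(1) kernel, the linear update function, the parameters `rho` and `b`, and the noise scales are all illustrative assumptions, chosen so that the root of the mean-field equation is known in closed form.

```python
import random


def run_sa(alpha, n_steps, seed=0, rho=0.8, b=0.5):
    """Constant-stepsize SA on a toy model; returns the time-averaged iterate.

    Toy assumptions (illustrative only, not taken from the paper):
      kernel P_theta : X' = rho * X + (1 - rho) * b * theta + 0.1 * N(0, 1)
      update g       : g(theta, x) = x - theta + 1
      mean field     : g_bar(theta) = (b - 1) * theta + 1, with root 1 / (1 - b)
    """
    rng = random.Random(seed)
    theta, x = 0.0, 0.0
    total = 0.0
    for _ in range(n_steps):
        # Draw X_{k+1} ~ P_{theta_k}(X_k, .): the noise law depends on theta_k.
        x = rho * x + (1.0 - rho) * b * theta + 0.1 * rng.gauss(0.0, 1.0)
        # i.i.d. perturbation xi_{k+1}; taken independent of theta for simplicity.
        xi = 0.1 * rng.gauss(0.0, 1.0)
        # theta_{k+1} = theta_k + alpha * (g(theta_k, X_{k+1}) + xi_{k+1})
        theta = theta + alpha * ((x - theta + 1.0) + xi)
        total += theta
    return total / n_steps


if __name__ == "__main__":
    # With b = 0.5 the root of the mean-field equation is 2.0.
    print(run_sa(alpha=0.05, n_steps=20000))
```

Because this toy model is linear, its stationary mean coincides with the root, so it illustrates the coupling between iterate and noise rather than the stationary bias itself.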
Previous analyses, particularly in the decision-independent regime, established $O(\alpha)$ stationary bias and weak convergence properties under strong smoothness and stability assumptions. However, decision-dependent noise introduces analytical challenges due to the feedback between the algorithm's state and its Markovian environment, especially when the kernel $P_\theta$ lacks global smoothness.
Theoretical Contributions
High-Order Moment Bounds and Stability
The authors deliver non-asymptotic high-order moment bounds for the SA iterates in the decision-dependent regime. The bounds hold in stationarity and tightly characterize the stability of the iterates and their confinement within a neighborhood of $\theta^*$ whose size is controlled by the stepsize $\alpha$. These bounds generalize the stability narrative beyond the i.i.d. noise setting to Markovian environments with decision-dependence.
Relaxed Regularity via Poisson–Gateaux Differentiability
A principal innovation is the introduction of a localized Poisson–Gateaux regularity condition on the solution to the Poisson equation associated with $g$ and $P_\theta$. This condition requires differentiability of a map built from the Poisson solution only at $\theta^*$, in contrast to previous works that demand global smoothness in $\theta$. The authors demonstrate that under this condition and mild invertibility requirements, the stationary bias remains $O(\alpha)$, even for non-smooth kernels such as those arising from projection or clipping operations and acceptance-rejection mechanisms.
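For orientation, the Poisson equation referred to here can be written in its standard form from the SA literature. The symbol $\hat{g}$ for the Poisson solution is an assumption of this sketch, and the paper's own notation and exact regularity condition may differ.

```latex
% Poisson equation for a fixed \theta (standard form; notation assumed)
\hat{g}(\theta, x) - \bigl(P_\theta \hat{g}(\theta, \cdot)\bigr)(x)
  = g(\theta, x) - \bar{g}(\theta)

% Evaluating at x = X_{k+1}, then adding and subtracting the same term at X_k,
% splits the Markovian noise into a martingale difference plus a remainder:
g(\theta_k, X_{k+1}) - \bar{g}(\theta_k)
  = \underbrace{\hat{g}(\theta_k, X_{k+1})
      - \bigl(P_{\theta_k} \hat{g}(\theta_k, \cdot)\bigr)(X_k)}_{\text{martingale difference}}
  + \underbrace{\bigl(P_{\theta_k} \hat{g}(\theta_k, \cdot)\bigr)(X_k)
      - \bigl(P_{\theta_k} \hat{g}(\theta_k, \cdot)\bigr)(X_{k+1})}_{\text{remainder}}
```

The first bracket has zero conditional mean given the past because $X_{k+1} \sim P_{\theta_k}(X_k, \cdot)$. The remainder is where the dependence of the kernel on $\theta$ enters, which is why regularity of the Poisson solution in $\theta$ matters for the bias.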
Geometric Ergodicity and Weak Convergence
A geometric weak convergence guarantee is established for the joint process $(\theta_k, X_k)$ toward a unique stationary distribution, measured in Wasserstein distance, provided that the contraction of the kernel dominates its sensitivity to the decision variable. Formally, the Wasserstein distance between the law of $(\theta_k, X_k)$ and the stationary distribution decays geometrically in $k$ for all initial distributions with finite second moment.
Functional Central Limit Theorem
For Lipschitz observables, a CLT is proven for time-averaged iterates, with an explicit asymptotic covariance matrix derived via a Poisson martingale decomposition.
Bias Characterization in Stationary Regime
The stationary bias of the iterates relative to $\theta^*$ is analyzed through a Taylor expansion and a system of balance equations. The authors show that in the decision-dependent regime, non-trivial bias sources arise from the mismatch in state distributions (via the Poisson equation) and local Jacobian fluctuations. Under the Poisson–Gateaux condition, these terms are linearly controlled in $\alpha$, and the bias admits a decomposition in terms of the expected local response operator and the stationary covariance matrix (under additional assumptions).
The analysis reveals challenges in computing explicit bias formulas for general decision-dependent kernels; the inherent coupling between decision and noise prevents straightforward operator decompositions. Nonetheless, the authors establish robust $O(\alpha)$ scaling, extending prior results to settings of practical importance, including projected Langevin dynamics, clipped state updates, and Metropolis–Hastings kernels.
Applicability to Non-Smooth Markovian Dynamics
The method covers a broad range of kernels beyond globally smooth regimes. Examples include:
- Metropolis–Hastings kernels: Despite non-differentiabilities introduced by acceptance ratios, the local Poisson–Gateaux condition holds for the composite operators relevant to stationary bias, leveraging properties of piecewise-smooth operators.
- Clipped state dynamics: Clipping induces non-smoothness, yet local Gateaux differentiability suffices for the required bias control.
- Projected Langevin dynamics: Projections on convex sets are only Lipschitz; again, local differentiability is inherited by appropriate composite maps.
This substantially extends practical coverage over prior works, which excluded such kernels due to global smoothness requirements.
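To make the source of non-smoothness in the Metropolis–Hastings case concrete, here is a minimal sketch of a random-walk Metropolis–Hastings kernel whose target depends on the decision variable. The Gaussian target $\pi_\theta = N(\theta, 1)$, the proposal scale `step`, and the helper `chain_mean` are illustrative assumptions, not constructions from the paper.

```python
import math
import random


def mh_step(x, theta, rng, step=1.0):
    """One random-walk Metropolis-Hastings move targeting pi_theta = N(theta, 1)."""
    proposal = x + step * rng.gauss(0.0, 1.0)
    # Log of pi_theta(proposal) / pi_theta(x) for the assumed Gaussian target.
    log_ratio = 0.5 * ((x - theta) ** 2 - (proposal - theta) ** 2)
    # Acceptance probability min(1, ratio). The min creates a kink, so the
    # acceptance probability is not differentiable in theta where the ratio
    # crosses 1.
    accept_prob = math.exp(min(0.0, log_ratio))
    return proposal if rng.random() < accept_prob else x


def chain_mean(theta, n_steps, seed=0):
    """Average of the chain run at a fixed theta; approximates the mean of pi_theta."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0
    for _ in range(n_steps):
        x = mh_step(x, theta, rng)
        total += x
    return total / n_steps


if __name__ == "__main__":
    print(chain_mean(theta=3.0, n_steps=20000))  # should be close to 3.0
```

In a decision-dependent SA loop, `theta` would be replaced at every step by the current iterate, so the kink in the acceptance probability moves with the algorithm's state.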
Practical and Theoretical Implications
The results justify the use of constant stepsize SA in settings with decision-dependent Markovian noise, guaranteeing tight bias control and stability even in non-smooth environments. Importantly, bias reduction strategies such as Richardson–Romberg extrapolation can be safely deployed once $O(\alpha)$ bias scaling is established.
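The Richardson–Romberg idea can be sketched in a few lines. This is the standard two-stepsize combination, shown under the assumption that the stationary mean expands as $\theta^* + c\alpha$ plus higher-order terms; the function name and the synthetic numbers are hypothetical, and the paper's own scheme may differ.

```python
def richardson_romberg(avg_alpha, avg_2alpha):
    """Combine averaged iterates obtained at stepsizes alpha and 2 * alpha.

    If the averages behave like theta_star + c * alpha and
    theta_star + c * (2 * alpha), the combination cancels the linear term.
    """
    return 2.0 * avg_alpha - avg_2alpha


if __name__ == "__main__":
    # Synthetic illustration with an exactly linear bias (made-up numbers).
    theta_star, c, alpha = 2.0, 0.7, 0.05
    avg_a = theta_star + c * alpha
    avg_2a = theta_star + c * (2 * alpha)
    print(richardson_romberg(avg_a, avg_2a))
```

In practice the two inputs would be time averages from two SA runs at stepsizes `alpha` and `2 * alpha`, and the extrapolated estimate trades a smaller bias for a somewhat larger variance.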
From a theoretical standpoint, the relaxation to local Poisson–Gateaux conditions aligns with contemporary trends in analysis under weak regularity, enabling finer characterization in systems with projection, clipping, and acceptance-rejection steps—central motifs in modern RL, sampling, and optimization algorithms. The geometric ergodicity and CLT results provide a foundation for rigorous statistical inference in these regimes.
Future Directions
The paper identifies gaps in deriving explicit asymptotic bias formulas in the fully decision-dependent regime and suggests investigating further relaxation of convergence conditions for controlled Markov kernels. Extensions to weaker forms of ergodicity and more general state spaces remain compelling open directions for SA theory.
Conclusion
This work advances the convergence theory of constant stepsize SA under decision-dependent Markovian noise, establishing high-order moment stability, geometric ergodicity, and a functional CLT. By introducing local Poisson–Gateaux regularity, it broadens the scope to non-smooth kernels and rigorously demonstrates that the stationary bias is $O(\alpha)$ under practical conditions. The implications enhance both theoretical understanding and practical robustness of SA algorithms in modern stochastic environments, laying groundwork for further developments in bias control and statistical inference frameworks for decision-dependent systems.