Revisiting the Constant Stepsize Stochastic Approximation with Decision-Dependent Markovian Noise

Published 15 Apr 2026 in math.OC | (2604.13378v1)

Abstract: We revisit the convergence analysis of constant stepsize stochastic approximation (SA) with decision-dependent Markovian noise, with a focus on characterizing the stationary bias against the root of the mean-field equation. We first establish the finite-time $p$-th moment bounds for the SA iterates in a general decision-dependent setting, which serve as a stability foundation for the subsequent analysis. Building on this foundation, and leveraging a local regularity condition termed Poisson--Gateaux differentiability (WD$\ast$) for the solution to Poisson equation induced by the decision-dependent Markov kernel, we show that the stationary bias is of the order $\mathcal{O}(α)$ for a broad class of decision-dependent settings. Additionally, we establish geometric weak convergence of the joint SA process towards a unique stationary distribution, and a functional central limit theorem. Our relaxed regularity condition enables us to cover cases of non-smooth kernels such as acceptance--rejection mechanisms, projected Langevin dynamics, and clipped state dynamics.

Summary

  • The paper introduces non-asymptotic moment bounds, proving that the SA iterates maintain stability with an O(α) stationary bias even under decision-dependent noise.
  • It establishes a localized Poisson–Gateaux regularity condition to relax global smoothness assumptions, thereby enabling analysis of non-smooth kernels such as Metropolis–Hastings.
  • It proves geometric ergodicity and a functional central limit theorem, paving the way for rigorous statistical inference and effective bias reduction strategies in stochastic approximation.

Problem Formulation and Context

Stochastic approximation (SA) is foundational for iterative root-finding in nonlinear systems, especially in reinforcement learning and empirical risk minimization. The study focuses on SA algorithms with constant stepsize operating under decision-dependent Markovian noise, where the distribution of the Markov noise is coupled to the current decision variable. Formally, at each iteration the update is $\theta_{k+1} = \theta_k + \alpha \left( g(\theta_k, X_{k+1}) + \xi_{k+1}(\theta_k) \right)$ with $X_{k+1} \sim P_{\theta_k}(X_k, \cdot)$, where $g$ is the update function and $\xi_{k+1}$ is an i.i.d. noise field. The goal is to approximate the root $\theta^*$ of the mean-field equation $\bar{g}(\theta)=0$, where $\bar{g}(\theta) = \mathbb{E}_{X \sim \pi_\theta}[g(\theta, X)]$ and $\pi_\theta$ is the invariant distribution of the controlled Markov kernel $P_\theta$.
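The recursion above can be sketched on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the AR(1) kernel whose stationary mean is `c * theta`, the update function `g(theta, x) = x - theta + 1`, and all parameter values.

```python
import random


def run_sa(alpha=0.05, rho=0.5, c=0.5, n_steps=200_000, seed=0):
    """Constant-stepsize SA driven by a decision-dependent AR(1) chain (toy model)."""
    rng = random.Random(seed)
    theta, x = 0.0, 0.0
    burn_in = n_steps // 2
    running_sum = 0.0
    for k in range(n_steps):
        # X_{k+1} ~ P_{theta_k}(X_k, .): AR(1) chain whose stationary mean is c * theta_k
        x = rho * x + (1.0 - rho) * c * theta + rng.gauss(0.0, 1.0)
        # i.i.d. noise term xi_{k+1} (assumed Gaussian here)
        xi = rng.gauss(0.0, 0.1)
        # g(theta, x) = x - theta + 1, so g_bar(theta) = 1 - (1 - c) * theta
        theta = theta + alpha * ((x - theta + 1.0) + xi)
        if k >= burn_in:
            running_sum += theta
    # Time average of the iterates after burn-in
    return running_sum / (n_steps - burn_in)


theta_star = 1.0 / (1.0 - 0.5)  # root of g_bar for c = 0.5
theta_bar = run_sa()
```

Because this toy model is linear in both `theta` and `x`, its stationary mean coincides with `theta_star`, so the sketch illustrates the recursion and its stability rather than the stationary bias, which arises in nonlinear settings.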

Previous analyses, particularly in the decision-independent regime, established $\mathcal{O}(\alpha)$ stationary bias and weak convergence properties under strong smoothness and stability assumptions. However, decision-dependent noise introduces analytical challenges due to the feedback between the algorithm's state and its Markovian environment, especially when the kernel $P_\theta$ lacks global smoothness.

Theoretical Contributions

High-Order Moment Bounds and Stability

The authors deliver non-asymptotic $p$-th moment bounds for the SA iterates in the decision-dependent regime. In stationarity, these bounds characterize the stability of the iterates and their confinement within a stepsize-dependent neighborhood of $\theta^*$. They generalize the stability narrative beyond the i.i.d. noise setting to Markovian environments with decision-dependence.

Relaxed Regularity via Poisson–Gateaux Differentiability

A principal innovation is the introduction of a localized Poisson–Gateaux regularity condition, denoted (WD$\ast$), for the solution to the Poisson equation induced by the decision-dependent Markov kernel $P_\theta$. The condition requires only local differentiability of a map built from the Poisson solution, in contrast to previous works that demand global smoothness of the kernel. The authors demonstrate that under (WD$\ast$) and mild invertibility requirements, the stationary bias remains $\mathcal{O}(\alpha)$ even for non-smooth kernels, such as those arising from projection or clipping operations and acceptance-rejection mechanisms.
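For reference, the Poisson equation for a Markov kernel with an invariant distribution is usually written in the following standard form. The symbol $\hat{g}_\theta$ for its solution is notation assumed here for illustration and may differ from the paper's.

```latex
\hat{g}_\theta(x) - (P_\theta \hat{g}_\theta)(x) = g(\theta, x) - \bar{g}(\theta),
\qquad
(P_\theta f)(x) = \int f(y)\, P_\theta(x, \mathrm{d}y).
```

The right-hand side has zero mean under $\pi_\theta$ by the definition of $\bar{g}$, which is the usual solvability requirement for such an equation.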

Geometric Ergodicity and Weak Convergence

A geometric weak convergence guarantee is established for the joint process $(\theta_k, X_k)$ toward a unique stationary distribution, measured in Wasserstein distance, provided that the contraction of the controlled Markov kernel dominates its sensitivity to the decision variable. Formally, the Wasserstein distance between the law of the joint process at iteration $k$ and the stationary distribution decays geometrically in $k$, for all initial distributions with finite second moment.
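One way to build intuition for geometric convergence is a synchronous coupling: two copies of the joint process start from different points and share the same noise, and the gap between them upper-bounds the Wasserstein distance between their laws. The sketch below uses a hypothetical linear toy model (AR(1) kernel with stationary mean `c * theta`, update `g(theta, x) = x - theta + 1`); none of these choices come from the paper.

```python
import random


def step(theta, x, eps, xi, alpha=0.05, rho=0.5, c=0.5):
    """One joint transition (theta_k, X_k) -> (theta_{k+1}, X_{k+1}) of the toy model."""
    x_next = rho * x + (1.0 - rho) * c * theta + eps
    theta_next = theta + alpha * ((x_next - theta + 1.0) + xi)
    return theta_next, x_next


def coupled_gaps(n_steps=500, seed=1):
    """Run two copies with shared noise and record the L1 gap after each step."""
    rng = random.Random(seed)
    a = (10.0, -5.0)   # first copy: (theta, x)
    b = (-3.0, 4.0)    # second copy: (theta, x)
    gaps = []
    for _ in range(n_steps):
        eps, xi = rng.gauss(0.0, 1.0), rng.gauss(0.0, 0.1)
        a = step(a[0], a[1], eps, xi)
        b = step(b[0], b[1], eps, xi)
        gaps.append(abs(a[0] - b[0]) + abs(a[1] - b[1]))
    return gaps


gaps = coupled_gaps()
```

In this linear toy the shared noise cancels in the difference, so the gap shrinks at a deterministic geometric rate; for general kernels the coupling argument is more delicate.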

Functional Central Limit Theorem

For Lipschitz observables, a CLT is proven for time averages along the SA trajectory, with an explicit asymptotic covariance matrix derived via a Poisson martingale decomposition.

Bias Characterization in Stationary Regime

The stationary bias, the gap between the stationary mean of the iterates and the root $\theta^*$, is analyzed through a Taylor expansion and a system of balance equations. The authors show that in the decision-dependent regime, non-trivial bias sources arise from the mismatch in state distributions (via the Poisson equation) and local Jacobian fluctuations. Under (WD$\ast$), these terms are linearly controlled in $\alpha$, and the bias admits a decomposition involving the expected local response operator and, under additional assumptions, the stationary covariance matrix.

The analysis reveals challenges in computing explicit bias formulas for general decision-dependent kernels; the inherent coupling between decision and noise prevents straightforward operator decompositions. Nonetheless, the authors establish robust $\mathcal{O}(\alpha)$ scaling, extending prior results to settings with practical importance, including projected Langevin dynamics, clipped state updates, and Metropolis–Hastings kernels.

Applicability to Non-Smooth Markovian Dynamics

The method covers a broad range of kernels beyond globally smooth regimes. Examples include:

  • Metropolis–Hastings kernels: Despite non-differentiabilities introduced by acceptance ratios, the local (WD$\ast$) condition holds for the composite operators relevant to stationary bias, leveraging properties of piecewise-smooth operators.
  • Clipped state dynamics: Clipping induces non-smoothness, yet local Gateaux differentiability suffices for the required bias control.
  • Projected Langevin dynamics: Projections on convex sets are only Lipschitz; again, local differentiability is inherited by appropriate composite maps.

This substantially extends practical coverage over prior works, which excluded such kernels due to global smoothness requirements.
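To make the acceptance-rejection case concrete, the following is a minimal sketch of a decision-dependent random-walk Metropolis–Hastings kernel. The Gaussian target N(theta, 1), the proposal scale, and the function names are assumptions chosen for illustration, not the paper's construction.

```python
import math
import random


def mh_step(x, theta, rng, step_size=1.0):
    """One random-walk Metropolis-Hastings step targeting N(theta, 1)."""
    y = x + rng.gauss(0.0, step_size)
    # log pi_theta(y) - log pi_theta(x) for the Gaussian target
    log_ratio = -0.5 * (y - theta) ** 2 + 0.5 * (x - theta) ** 2
    # Accept with probability min(1, ratio); the min makes the kernel
    # non-differentiable in theta where the ratio crosses 1.
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return y
    return x


def chain_mean(theta, n_steps=50_000, seed=2):
    """Average of the chain after burn-in, with the decision variable held fixed."""
    rng = random.Random(seed)
    x, total, burn_in = 0.0, 0.0, n_steps // 10
    for k in range(n_steps):
        x = mh_step(x, theta, rng)
        if k >= burn_in:
            total += x
    return total / (n_steps - burn_in)


sample_mean = chain_mean(theta=1.5)
```

With `theta` held fixed, the chain averages approach the target mean; in an SA loop `theta` would instead be updated after every `mh_step` call, which is what makes the noise decision-dependent.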

Practical and Theoretical Implications

The results firmly justify the use of constant stepsize SA in settings with decision-dependent Markovian noise, guaranteeing tight bias control and stability even in non-smooth environments. Importantly, bias reduction strategies such as Richardson–Romberg extrapolation can be safely deployed once $\mathcal{O}(\alpha)$ bias scaling is established.
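Richardson–Romberg extrapolation combines estimates obtained at stepsizes $\alpha$ and $2\alpha$ so that the term linear in $\alpha$ cancels. The sketch below assumes the stationary mean has an expansion of the form $\theta^* + c_1 \alpha + c_2 \alpha^2$, which is stronger than an $\mathcal{O}(\alpha)$ bound alone; the synthetic bias model and its coefficients are hypothetical stand-ins for averaged SA runs.

```python
def richardson_romberg(est_alpha, est_2alpha):
    """Combine estimates at stepsizes alpha and 2*alpha to cancel the linear bias term."""
    return 2.0 * est_alpha - est_2alpha


def synthetic_stationary_mean(alpha, theta_star=2.0, c1=0.7, c2=0.3):
    """Hypothetical bias model: theta_star + c1 * alpha + c2 * alpha**2."""
    return theta_star + c1 * alpha + c2 * alpha ** 2


alpha = 0.1
plain = synthetic_stationary_mean(alpha)
extrapolated = richardson_romberg(plain, synthetic_stationary_mean(2.0 * alpha))

# Remaining error of each estimate relative to theta_star = 2.0
plain_error = abs(plain - 2.0)
extrapolated_error = abs(extrapolated - 2.0)
```

Under this model the extrapolated estimate retains only a second-order error of size $2 c_2 \alpha^2$, whereas the plain estimate keeps the first-order term $c_1 \alpha$.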

From a theoretical standpoint, the relaxation to local Poisson–Gateaux conditions aligns with contemporary trends in analysis under weak regularity, enabling finer characterization in systems with projection, clipping, and acceptance-rejection steps, which are central motifs in modern RL, sampling, and optimization algorithms. The geometric ergodicity and CLT results provide a foundation for rigorous statistical inference in these regimes.

Future Directions

The paper identifies gaps in deriving explicit asymptotic bias formulas in the fully decision-dependent regime and suggests investigating further relaxation of convergence conditions for controlled Markov kernels. Extensions to weaker forms of ergodicity and more general state spaces remain compelling open directions for SA theory.

Conclusion

This work advances the convergence theory of constant stepsize SA under decision-dependent Markovian noise, establishing high-order moment stability, geometric ergodicity, and a functional CLT. By introducing local Poisson–Gateaux regularity, it broadens scope to non-smooth kernels and rigorously demonstrates that stationary bias is $\mathcal{O}(\alpha)$ under practical conditions. The implications enhance both theoretical understanding and practical robustness of SA algorithms in modern stochastic environments, laying groundwork for further developments in bias control and statistical inference frameworks for decision-dependent systems.
