Revisiting the Constant Stepsize Stochastic Approximation with Decision-Dependent Markovian Noise

Published 15 Apr 2026 in math.OC | (2604.13378v1)

Abstract: We revisit the convergence analysis of constant stepsize stochastic approximation (SA) with decision-dependent Markovian noise, with a focus on characterizing the stationary bias against the root of the mean-field equation. We first establish the finite-time $p$-th moment bounds for the SA iterates in a general decision-dependent setting, which serve as a stability foundation for the subsequent analysis. Building on this foundation, and leveraging a local regularity condition termed Poisson--Gateaux differentiability (WD$^\ast$) for the solution to the Poisson equation induced by the decision-dependent Markov kernel, we show that the stationary bias is of the order $\mathcal{O}(\alpha)$ for a broad class of decision-dependent settings. Additionally, we establish geometric weak convergence of the joint SA process towards a unique stationary distribution, and a functional central limit theorem. Our relaxed regularity condition enables us to cover cases of non-smooth kernels such as acceptance--rejection mechanisms, projected Langevin dynamics, and clipped state dynamics.


Authors (4)

Summary

The paper establishes non-asymptotic moment bounds showing that the SA iterates remain stable under decision-dependent noise, and proves that the stationary bias is O(α).
It establishes a localized Poisson–Gateaux regularity condition to relax global smoothness assumptions, thereby enabling analysis of non-smooth kernels such as Metropolis–Hastings.
It proves geometric ergodicity and a functional central limit theorem, paving the way for rigorous statistical inference and effective bias reduction strategies in stochastic approximation.


Problem Formulation and Context

Stochastic approximation (SA) is a foundational tool for iterative root-finding in nonlinear systems, especially in reinforcement learning and empirical risk minimization. The paper studies SA with a constant stepsize under decision-dependent Markovian noise, where the distribution of the Markov noise is coupled to the current decision variable. Formally, each iteration performs the update $\theta_{k+1} = \theta_k + \alpha \left( g(\theta_k, X_{k+1}) + \xi_{k+1}(\theta_k) \right)$ with $X_{k+1} \sim P_{\theta_k}(X_k, \cdot)$, where $g$ is the update function and $\xi_{k+1}$ is an i.i.d. noise field. The goal is to approximate the root $\theta^*$ of the mean-field equation $\bar{g}(\theta)=0$, where $\bar{g}(\theta) = \mathbb{E}_{X \sim \pi_\theta}[g(\theta, X)]$ and $\pi_\theta$ is the invariant distribution of the controlled Markov kernel $P_\theta$.
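To make the recursion concrete, here is a minimal Python sketch under purely illustrative assumptions that are not taken from the paper: the state space is $\{0, 1\}$, the kernel $P_\theta$ keeps the current state with probability 1/2 and otherwise resamples $X \sim \mathrm{Bernoulli}(q(\theta))$ with a hypothetical logistic $q$, the update function is $g(\theta, x) = x - \theta$, and $\xi_{k+1}$ is i.i.d. Gaussian. For this kernel $\pi_\theta(1) = q(\theta)$, so $\bar{g}(\theta) = q(\theta) - \theta$ and $\theta^*$ can be found by bisection. The names `q`, `kernel_step`, `run_sa`, and `mean_field_root` are hypothetical.

```python
import math
import random


def q(theta: float) -> float:
    """Hypothetical decision-dependent resampling probability (logistic)."""
    return 1.0 / (1.0 + math.exp(-theta))


def kernel_step(theta: float, x: int, rng: random.Random, stay: float = 0.5) -> int:
    """Draw X' ~ P_theta(x, .): keep x w.p. `stay`, else resample Bernoulli(q(theta))."""
    if rng.random() < stay:
        return x
    return 1 if rng.random() < q(theta) else 0


def g(theta: float, x: int) -> float:
    """Illustrative update function; its pi_theta-average is q(theta) - theta."""
    return x - theta


def run_sa(alpha, n_steps, theta0=0.0, x0=0, noise_std=0.1, seed=0):
    """Constant-stepsize SA: theta_{k+1} = theta_k + alpha * (g(theta_k, X_{k+1}) + xi_{k+1})."""
    rng = random.Random(seed)
    theta, x = theta0, x0
    path = []
    for _ in range(n_steps):
        x = kernel_step(theta, x, rng)      # X_{k+1} ~ P_{theta_k}(X_k, .)
        xi = rng.gauss(0.0, noise_std)      # i.i.d. noise xi_{k+1}
        theta += alpha * (g(theta, x) + xi)
        path.append(theta)
    return path


def mean_field_root(lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on gbar(theta) = q(theta) - theta, which is decreasing here."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if q(mid) - mid > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


theta_star = mean_field_root()
path = run_sa(alpha=0.05, n_steps=50_000)
tail_mean = sum(path[10_000:]) / len(path[10_000:])
print(f"theta* = {theta_star:.4f}, tail average = {tail_mean:.4f}")
```

The tail average lands near $\theta^*$ but not exactly on it: with a constant stepsize the iterates fluctuate around $\theta^*$ and their stationary mean can carry a small bias.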

Previous analyses, particularly in the decision-independent regime, established $\mathcal{O}(\alpha)$ stationary bias and weak convergence properties under strong smoothness and stability assumptions. Decision-dependent noise, however, creates a feedback loop between the algorithm's state and its Markovian environment, which complicates the analysis, especially when the map $\theta \mapsto P_\theta$ lacks global smoothness.

Theoretical Contributions

High-Order Moment Bounds and Stability

The authors derive non-asymptotic $p$-th moment bounds for the SA iterates in the decision-dependent regime. For each admissible $p$, the moments scale in stationarity as $\mathbb{E}\|\theta_k - \theta^*\|^p = \mathcal{O}(\alpha^{p/2})$, which characterizes the stability of the iterates and confines them, in the $L^p$ sense, to an $\mathcal{O}(\sqrt{\alpha})$ neighborhood of $\theta^*$. These bounds extend the stability analysis beyond the i.i.d. noise setting to Markovian environments with decision dependence.
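A moment scaling of the form $\mathbb{E}|\theta_k - \theta^*|^p = \mathcal{O}(\alpha^{p/2})$ can be probed numerically. The sketch below is an illustration on a hypothetical scalar model, not the paper's setting: a two-state kernel that keeps the current state with probability 1/2 and otherwise resamples $X \sim \mathrm{Bernoulli}(q(\theta))$ with logistic $q$, and $g(\theta, x) = x - \theta$. It estimates the $p$-th moments by long time averages after a burn-in (assuming ergodicity, so time averages stand in for stationary expectations) and divides by $\alpha^{p/2}$; under the stated scaling the normalized values should stay roughly flat as $\alpha$ shrinks.

```python
import math
import random


def q(theta):
    """Hypothetical logistic resampling probability (illustrative only)."""
    return 1.0 / (1.0 + math.exp(-theta))


def theta_star():
    """Root of gbar(theta) = q(theta) - theta on [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if q(mid) > mid else (lo, mid)
    return 0.5 * (lo + hi)


def stationary_moments(alpha, ps=(2, 4), n_steps=200_000, seed=1):
    """Time-average estimates of E|theta - theta*|^p after a burn-in of ~50/alpha steps."""
    rng = random.Random(seed)
    root = theta_star()
    burn_in = int(50 / alpha)
    theta, x = 0.0, 0
    sums = {p: 0.0 for p in ps}
    for k in range(burn_in + n_steps):
        if rng.random() >= 0.5:                    # resample X w.p. 1/2
            x = 1 if rng.random() < q(theta) else 0
        theta += alpha * (x - theta + rng.gauss(0.0, 0.1))
        if k >= burn_in:
            err = abs(theta - root)
            for p in ps:
                sums[p] += err ** p
    return {p: s / n_steps for p, s in sums.items()}


results = {alpha: stationary_moments(alpha) for alpha in (0.1, 0.05, 0.025)}
for alpha, m in results.items():
    normalized = {p: round(v / alpha ** (p / 2), 3) for p, v in m.items()}
    print(f"alpha={alpha}: E|theta - theta*|^p / alpha^(p/2) = {normalized}")
```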

Relaxed Regularity via Poisson–Gateaux Differentiability

A principal innovation is a localized Poisson–Gateaux regularity condition, denoted (WD$^\ast$), imposed on the solution of the Poisson equation associated with the kernel $P_\theta$ and the update function $g$. The condition only requires Gateaux differentiability of the map $\theta \mapsto P_\theta \hat{g}$ at $\theta^*$, where $\hat{g}$ solves the relevant Poisson equation, whereas previous works demanded global smoothness of $\theta \mapsto P_\theta$. The authors show that under (WD$^\ast$) and mild invertibility requirements, the stationary bias remains $\mathcal{O}(\alpha)$, even for non-smooth kernels such as those arising from projection, clipping, or acceptance–rejection mechanisms.
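On a finite state space, both the Poisson equation and a directional derivative of $\theta \mapsto P_\theta \hat{g}$ can be computed directly. The sketch below is a numerical sanity check under hypothetical assumptions, not the paper's construction. It uses a two-state kernel that keeps the state with probability 1/2 and otherwise resamples $X \sim \mathrm{Bernoulli}(q(\theta))$ with logistic $q$, together with $g(\theta, x) = x - \theta$ and the normalization $\pi_\theta(\hat{g}) = 0$. It solves $\hat{g} - P_\theta \hat{g} = g(\theta, \cdot) - \bar{g}(\theta)$ through the fundamental matrix $(I - P_\theta + \mathbf{1}\pi_\theta^\top)^{-1}$, then differentiates $\theta \mapsto P_\theta \hat{g}_{\theta^*}$ at $\theta^*$ by central finite differences with $\hat{g}$ held fixed. The helper names are hypothetical.

```python
import math


def q(theta):
    """Hypothetical logistic resampling probability (illustrative only)."""
    return 1.0 / (1.0 + math.exp(-theta))


def kernel(theta, stay=0.5):
    """Transition matrix of the toy two-state kernel P_theta."""
    p = q(theta)
    return [[stay + (1 - stay) * (1 - p), (1 - stay) * p],
            [(1 - stay) * (1 - p), stay + (1 - stay) * p]]


def solve(A, b):
    """Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[r][n] / M[r][r] for r in range(n)]


def stationary(P):
    """Solve pi (I - P) = 0 with sum(pi) = 1 (last equation replaced by normalization)."""
    n = len(P)
    A = [[(i == j) - P[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    return solve(A, [0.0] * (n - 1) + [1.0])


def poisson_solution(P, f):
    """f_hat = (I - P + 1 pi^T)^{-1} (f - pi(f)), so (I - P) f_hat = f - pi(f) and pi(f_hat) = 0."""
    n = len(P)
    pi = stationary(P)
    mean = sum(pi[i] * f[i] for i in range(n))
    Z = [[(i == j) - P[i][j] + pi[j] for j in range(n)] for i in range(n)]
    return solve(Z, [fi - mean for fi in f]), pi, mean


def apply_kernel(P, h):
    """(P h)(x) = sum_y P(x, y) h(y)."""
    return [sum(P[i][j] * h[j] for j in range(len(h))) for i in range(len(P))]


# theta* of the toy mean field gbar(theta) = q(theta) - theta, by bisection.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if q(mid) > mid else (lo, mid)
theta_star = 0.5 * (lo + hi)

P_star = kernel(theta_star)
g_vals = [x - theta_star for x in (0, 1)]          # g(theta*, x) for x in {0, 1}
g_hat, pi_star, gbar = poisson_solution(P_star, g_vals)

# Directional (Gateaux-type) derivative of theta -> P_theta g_hat at theta*, g_hat held fixed.
h = 1e-5
plus = apply_kernel(kernel(theta_star + h), g_hat)
minus = apply_kernel(kernel(theta_star - h), g_hat)
deriv = [(a - b) / (2 * h) for a, b in zip(plus, minus)]
print("g_hat =", g_hat, "derivative =", deriv)
```

For this kernel the derivative can be checked by hand: $(P_\theta \hat{g})(x) = \tfrac12 \hat{g}(x) + \tfrac12\big(q(\theta)\hat{g}(1) + (1 - q(\theta))\hat{g}(0)\big)$, so the derivative is $\tfrac12 q'(\theta^*)\big(\hat{g}(1) - \hat{g}(0)\big) = q'(\theta^*)$ in both states.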

Geometric Ergodicity and Weak Convergence

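A geometric weak convergence guarantee is established for the joint process $(\theta_k, X_k)_{k \ge 0}$ toward a unique stationary distribution $\mu_\alpha$, measured in a Wasserstein distance, provided that the contraction of the kernel in the state variable dominates its sensitivity to the decision $\theta$. Formally, for a Wasserstein metric $\mathcal{W}$, there exists $\rho \in (0, 1)$ such that $\mathcal{W}\big(\mathcal{L}(\theta_k, X_k), \mu_\alpha\big) \le C(\mu_0)\, \rho^k$ for all initial distributions $\mu_0$ with finite second moment, where $\mathcal{L}(\theta_k, X_k)$ denotes the law of the iterate started from $\mu_0$ and $C(\mu_0) < \infty$.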
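Geometric convergence of this kind is commonly illustrated through a coupling. The sketch below uses a hypothetical toy model, not the paper's construction or assumptions: a two-state kernel that keeps the state with probability 1/2 and otherwise resamples $X \sim \mathrm{Bernoulli}(q(\theta))$ with logistic $q$, and $g(\theta, x) = x - \theta$. Two copies of the joint chain start from different points and are driven by the same random numbers (a synchronous coupling), and the sketch tracks $d = |\theta_k - \theta_k'| + \mathbf{1}\{X_k \ne X_k'\}$ averaged over a few seeds. Because any coupling upper-bounds the Wasserstein-1 distance for the metric $d$, a geometric decay of the coupled distance illustrates geometric decay of $\mathcal{W}_1$ between the two laws.

```python
import math
import random


def q(theta):
    """Hypothetical logistic resampling probability (illustrative only)."""
    return 1.0 / (1.0 + math.exp(-theta))


def coupled_distances(alpha=0.05, n_steps=1500, seed=0, stay=0.5, noise_std=0.1):
    """Run two copies of the toy joint chain (theta_k, X_k) from different starts,
    driven by the same random numbers, and record |theta - theta'| + 1{X != X'}."""
    rng = random.Random(seed)
    t1, x1 = 0.0, 0
    t2, x2 = 1.5, 1
    out = []
    for _ in range(n_steps):
        u_stay, u_flip = rng.random(), rng.random()
        noise = rng.gauss(0.0, noise_std)          # shared xi_{k+1}
        if u_stay >= stay:                         # both copies resample with a shared uniform
            x1 = 1 if u_flip < q(t1) else 0
            x2 = 1 if u_flip < q(t2) else 0
        t1 += alpha * (x1 - t1 + noise)
        t2 += alpha * (x2 - t2 + noise)
        out.append(abs(t1 - t2) + (x1 != x2))
    return out


runs = [coupled_distances(seed=s) for s in range(20)]
avg = [sum(r[k] for r in runs) / len(runs) for k in range(len(runs[0]))]
for k in (0, 100, 500, 1499):
    print(f"step {k}: mean coupled distance {avg[k]:.3e}")
```

Each copy on its own has exactly the marginal dynamics of the toy chain. Only the randomness is shared, which is what makes the coupled distance a valid upper bound on $\mathcal{W}_1$.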

Functional Central Limit Theorem

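For Lipschitz observables $h$, a central limit theorem is proven for time averages along the iterates, $\frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} \big( h(\theta_k) - \mathbb{E}_{\mu_\alpha}[h(\theta)] \big) \xrightarrow{d} \mathcal{N}(0, \Sigma_h)$, where $\mu_\alpha$ is the stationary distribution of the joint process and the asymptotic covariance matrix $\Sigma_h$ is given explicitly via a Poisson martingale decomposition.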
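A CLT of this type is what justifies confidence intervals for stationary averages. A standard practical estimator of an asymptotic variance such as $\Sigma_h$ is the batch-means method. The sketch below applies it as an illustration, not the paper's procedure, to a hypothetical toy model: a two-state kernel that keeps the state with probability 1/2 and otherwise resamples $X \sim \mathrm{Bernoulli}(q(\theta))$ with logistic $q$, $g(\theta, x) = x - \theta$, and observable $h(\theta) = \theta$. For simplicity it uses a normal quantile; with few batches a Student-$t$ quantile is the more careful choice. The interval targets the stationary mean of $h$, which differs from $h(\theta^*)$ by the $\mathcal{O}(\alpha)$ bias.

```python
import math
import random
import statistics


def q(theta):
    """Hypothetical logistic resampling probability (illustrative only)."""
    return 1.0 / (1.0 + math.exp(-theta))


def simulate(alpha, n_steps, burn_in=2000, seed=2):
    """Toy SA chain; returns theta_k for the steps after the burn-in."""
    rng = random.Random(seed)
    theta, x, out = 0.0, 0, []
    for k in range(burn_in + n_steps):
        if rng.random() >= 0.5:                    # resample X w.p. 1/2
            x = 1 if rng.random() < q(theta) else 0
        theta += alpha * (x - theta + rng.gauss(0.0, 0.1))
        if k >= burn_in:
            out.append(theta)
    return out


def batch_means_ci(samples, h=lambda t: t, n_batches=20, level=0.95):
    """Batch-means estimate of the asymptotic variance of n^{-1/2} sum h(theta_k),
    plus a normal-approximation confidence interval for the stationary mean of h."""
    m = len(samples) // n_batches
    vals = [h(s) for s in samples[: m * n_batches]]
    means = [statistics.fmean(vals[i * m:(i + 1) * m]) for i in range(n_batches)]
    center = statistics.fmean(means)
    sigma2 = m * statistics.variance(means)        # estimate of Sigma_h
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(sigma2 / (m * n_batches))
    return center, sigma2, half_width


samples = simulate(alpha=0.05, n_steps=200_000)
center, sigma2, hw = batch_means_ci(samples)
print(f"stationary mean of h ~ {center:.4f} +/- {hw:.4f}, Sigma_h ~ {sigma2:.3f}")
```

The batch length must be much longer than the chain's correlation time, which for constant-stepsize SA grows like $1/\alpha$. Here each batch has 10,000 steps against a correlation time of roughly 20.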

Bias Characterization in Stationary Regime

The stationary bias, $\mathbb{E}[\theta_\infty] - \theta^*$ with $\theta_\infty$ distributed according to the stationary law of the iterates, is analyzed through a Taylor expansion and a system of balance equations. The authors show that in the decision-dependent regime, non-trivial bias sources arise from the mismatch in state distributions (captured by the Poisson equation) and from local Jacobian fluctuations. Under (WD$^\ast$), these terms are controlled linearly in $\alpha$, and, under additional assumptions, the leading-order bias admits a decomposition in terms of the expected local response operator and the stationary covariance matrix.
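The linear-in-$\alpha$ behavior of the stationary bias can be checked by simulation. The sketch below uses a hypothetical model chosen so that the bias is large enough to see, not one of the paper's examples. It pairs a two-state kernel that keeps the state with probability 1/2 and otherwise resamples $X \sim \mathrm{Bernoulli}(q(\theta))$ (logistic $q$) with the nonlinear update $g(\theta, x) = x - \theta^3$, so that $\theta^*$ solves $\theta^3 = q(\theta)$. The bias is estimated as a long time average minus $\theta^*$ for several stepsizes; if it is linear in $\alpha$, halving $\alpha$ should roughly halve it. The stepsizes and run lengths are illustrative choices.

```python
import math
import random


def q(theta):
    """Hypothetical logistic resampling probability (illustrative only)."""
    return 1.0 / (1.0 + math.exp(-theta))


def root(lo=0.0, hi=2.0):
    """theta* solving gbar(theta) = q(theta) - theta**3 = 0 (bisection; one sign change on [0, 2])."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if q(mid) > mid ** 3 else (lo, mid)
    return 0.5 * (lo + hi)


def stationary_mean(alpha, n_steps=300_000, burn_in=2000, seed=3):
    """Time average of theta_k for the toy SA with g(theta, x) = x - theta**3."""
    rng = random.Random(seed)
    theta, x, total = 1.0, 1, 0.0
    for k in range(burn_in + n_steps):
        if rng.random() >= 0.5:                    # resample X w.p. 1/2
            x = 1 if rng.random() < q(theta) else 0
        theta += alpha * (x - theta ** 3 + rng.gauss(0.0, 0.1))
        if k >= burn_in:
            total += theta
    return total / n_steps


theta_star = root()
bias = {a: stationary_mean(a) - theta_star for a in (0.2, 0.1, 0.05)}
for a, b in bias.items():
    print(f"alpha={a}: bias={b:+.4f}, bias/alpha={b / a:+.3f}")
```

In this toy model, stationarity forces $\mathbb{E}[\theta_\infty^3] = \mathbb{E}[q(\theta_\infty)]$. Since $\theta \mapsto \theta^3 - q(\theta)$ is convex near $\theta^*$, the stationary fluctuations (of variance order $\alpha$) push the mean below $\theta^*$, so the estimated bias comes out negative.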

The analysis also shows that explicit bias formulas are hard to obtain for general decision-dependent kernels, because the coupling between decision and noise prevents straightforward operator decompositions. Nonetheless, the authors establish $\mathcal{O}(\alpha)$ scaling, extending prior results to settings of practical importance, including projected Langevin dynamics, clipped state updates, and Metropolis–Hastings kernels.

Applicability to Non-Smooth Markovian Dynamics

The method covers a broad range of kernels beyond globally smooth regimes. Examples include:

Metropolis–Hastings kernels: Despite the non-differentiability introduced by acceptance ratios, the local (WD$^\ast$) condition holds for the composite operators relevant to the stationary bias, leveraging the piecewise-smooth structure of these operators.
Clipped state dynamics: Clipping induces non-smoothness, yet local Gateaux differentiability suffices for the required bias control.
Projected Langevin dynamics: Projections onto convex sets are only Lipschitz; again, local differentiability is inherited by the appropriate composite maps.
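Each of the three kernel types takes only a few lines to write down. The sketch below uses hypothetical one-dimensional choices, not the paper's examples: a random-walk Metropolis–Hastings step targeting $\mathcal{N}(\theta, 1)$, an unadjusted Langevin step for the same target followed by projection onto $[-1, 1]$, and a clipped AR(1) update driven by $\theta$. In each case a non-smooth operation ($\min(1, \cdot)$ in the acceptance probability, a projection, or a clip) introduces kinks into the one-step map, which is exactly what global smoothness assumptions rule out.

```python
import math
import random


def mh_step(theta, x, rng, step=1.0):
    """Random-walk Metropolis-Hastings step targeting N(theta, 1) (hypothetical target).
    The acceptance probability min(1, ratio) has a kink where ratio = 1."""
    y = x + rng.gauss(0.0, step)
    log_ratio = 0.5 * (x - theta) ** 2 - 0.5 * (y - theta) ** 2
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return y
    return x


def projected_langevin_step(theta, x, rng, h=0.1, lo=-1.0, hi=1.0):
    """Unadjusted Langevin step for a density proportional to exp(-(x - theta)^2 / 2),
    then projection onto [lo, hi]; the projection is 1-Lipschitz but has kinks at the boundary."""
    y = x - h * (x - theta) + math.sqrt(2.0 * h) * rng.gauss(0.0, 1.0)
    return min(max(y, lo), hi)


def clipped_step(theta, x, rng, rho=0.5, c=2.0):
    """Clipped AR(1) state update driven by the decision theta (hypothetical)."""
    y = rho * x + theta + rng.gauss(0.0, 1.0)
    return min(max(y, -c), c)


rng = random.Random(4)
theta = 0.7
steps = {"mh": mh_step, "langevin": projected_langevin_step, "clipped": clipped_step}
chains = {name: [0.0] for name in steps}
for _ in range(100_000):
    for name, step in steps.items():
        chains[name].append(step(theta, chains[name][-1], rng))
print({name: round(sum(v) / len(v), 3) for name, v in chains.items()})
```

In an SA loop, $\theta$ would change between calls. Here it is held fixed so the long-run averages can be compared with the target (the Metropolis–Hastings chain should average close to $\theta = 0.7$).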

This substantially extends practical coverage over prior works, which excluded such kernels due to global smoothness requirements.

Practical and Theoretical Implications

The results support the use of constant stepsize SA in settings with decision-dependent Markovian noise, guaranteeing bias control and stability even in non-smooth environments. Bias reduction strategies such as Richardson–Romberg extrapolation can be deployed once the $\mathcal{O}(\alpha)$ bias scaling is established, provided the bias has a leading term linear in $\alpha$.
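Richardson–Romberg extrapolation combines averaged iterates from runs at stepsizes $\alpha$ and $2\alpha$ as $2\bar{\theta}(\alpha) - \bar{\theta}(2\alpha)$. This cancels a bias term that is exactly linear in $\alpha$ and leaves only higher-order terms. The sketch below applies the idea to a hypothetical toy model, not one from the paper: a two-state kernel that keeps the state with probability 1/2 and otherwise resamples $X \sim \mathrm{Bernoulli}(q(\theta))$ with logistic $q$, and $g(\theta, x) = x - \theta^3$. It assumes the bias expands as $c\alpha + o(\alpha)$, which is a stronger requirement than an $\mathcal{O}(\alpha)$ bound alone.

```python
import math
import random


def q(theta):
    """Hypothetical logistic resampling probability (illustrative only)."""
    return 1.0 / (1.0 + math.exp(-theta))


def root(lo=0.0, hi=2.0):
    """theta* solving q(theta) = theta**3 (bisection; one sign change on [0, 2])."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if q(mid) > mid ** 3 else (lo, mid)
    return 0.5 * (lo + hi)


def averaged_iterate(alpha, n_steps=300_000, burn_in=2000, seed=0):
    """Long-run average of theta_k for the toy SA with g(theta, x) = x - theta**3."""
    rng = random.Random(seed)
    theta, x, total = 1.0, 1, 0.0
    for k in range(burn_in + n_steps):
        if rng.random() >= 0.5:                    # resample X w.p. 1/2
            x = 1 if rng.random() < q(theta) else 0
        theta += alpha * (x - theta ** 3 + rng.gauss(0.0, 0.1))
        if k >= burn_in:
            total += theta
    return total / n_steps


def richardson_romberg(alpha, **kwargs):
    """2 * m(alpha) - m(2 * alpha): cancels a bias of the form c * alpha (independent runs)."""
    m_alpha = averaged_iterate(alpha, seed=10, **kwargs)
    m_2alpha = averaged_iterate(2 * alpha, seed=11, **kwargs)
    return 2 * m_alpha - m_2alpha, m_alpha, m_2alpha


theta_star = root()
rr, m1, m2 = richardson_romberg(0.1)
print(f"bias at alpha: {m1 - theta_star:+.4f}, at 2*alpha: {m2 - theta_star:+.4f}, "
      f"after extrapolation: {rr - theta_star:+.4f}")
```

Extrapolation trades bias for variance. Because the two runs are independent, the combined estimate has variance $4\,\mathrm{Var}[\bar\theta(\alpha)] + \mathrm{Var}[\bar\theta(2\alpha)]$, so it pays off only when the runs are long enough for the bias to dominate the Monte Carlo error.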

From a theoretical standpoint, the relaxation to local Poisson–Gateaux conditions aligns with contemporary trends in analysis under weak regularity, enabling finer characterization in systems with projection, clipping, and acceptance-rejection steps—central motifs in modern RL, sampling, and optimization algorithms. The geometric ergodicity and CLT results provide a foundation for rigorous statistical inference in these regimes.

Future Directions

The paper identifies gaps in deriving explicit asymptotic bias formulas in the fully decision-dependent regime and suggests investigating further relaxation of convergence conditions for controlled Markov kernels. Extensions to weaker forms of ergodicity and more general state spaces remain compelling open directions for SA theory.

Conclusion

This work advances the convergence theory of constant stepsize SA under decision-dependent Markovian noise, establishing high-order moment stability, geometric ergodicity, and a functional CLT. By introducing local Poisson–Gateaux regularity, it broadens the scope to non-smooth kernels and rigorously demonstrates that the stationary bias is $\mathcal{O}(\alpha)$ under practical conditions. These results strengthen both the theoretical understanding and the practical robustness of SA algorithms in modern stochastic environments, laying groundwork for further developments in bias control and statistical inference for decision-dependent systems.