Convergence of the Iterates of the Stochastic Proximal Gradient Method

Published 15 Apr 2026 in math.OC | (2604.13388v1)

Abstract: We propose a novel study of the stochastic proximal gradient method for minimizing the sum of two convex functions, one of which is smooth. Under suitable assumptions and without requiring any boundedness or control of the variance of the random variables, we derive the almost sure convergence and the convergence in the mean of the iterates to a solution of the minimization problem. The results are applied to classification and convex feasibility problems.


Authors (1)

Javier I. Madariaga

Summary

The paper establishes almost sure and L1 convergence of the stochastic proximal gradient method under weak integrability and growth conditions.
The paper replaces deterministic oracles with random approximations, enabling scalable and robust optimization for composite convex objectives.
The paper extends classical results by applying its framework to mixed-loss classification and inconsistent feasibility, paving the way for decentralized applications.

Convergence Properties of Stochastic Proximal Gradient Methods

Problem Formulation and Methodological Framework

The paper addresses the minimization of composite convex objectives in a Hilbert space $H$, specifically $\min_{x \in H} f(x) + g(x)$, where $f$ is lower semicontinuous and convex, and $g$ is convex and differentiable with a $\beta$-Lipschitz continuous gradient $\nabla g$. This canonical setting covers a broad class of optimization problems in signal processing, statistical learning, and operations research.

The classical solution approach is the proximal-gradient (forward-backward) scheme. The stochastic variant considered here (Algorithm 1) replaces $f$ and $g$, or their oracles (the proximity operator and the gradient), with random approximations $f_k$ and $g_k$, where the index $k$ is drawn at random at each iteration. This covers randomization of both the gradient and the proximal mapping, which addresses scalability and tractability concerns in large-scale or distributed optimization.
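The forward-backward structure described above can be illustrated with a minimal sketch. It assumes the generic update $x_{n+1} = \operatorname{prox}_{\gamma_n f}(x_n - \gamma_n \nabla g_{k_n}(x_n))$; the paper's Algorithm 1 may differ in details. The test problem (an $\ell_1$-regularized least-squares objective), the step-size rule, and all function names are illustrative assumptions, and only $g$ is randomized here although the paper also allows random approximations of $f$.

```python
import random


def soft_threshold(v, tau):
    """Proximity operator of tau * |.| evaluated at the scalar v."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def stochastic_prox_grad(samples, lam, n_iter, seed=0):
    """Sketch: minimize lam * ||x||_1 + mean_k 0.5 * (a_k . x - b_k)^2.

    One sample (a_k, b_k) is drawn uniformly at random per iteration.
    """
    rng = random.Random(seed)
    x = [0.0] * len(samples[0][0])
    for n in range(n_iter):
        # Illustrative step size: not summable, but square summable.
        gamma = 2.0 / (n + 20)
        a, b = rng.choice(samples)
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        # Forward (gradient) step on the sampled smooth term g_k.
        y = [xi - gamma * residual * ai for xi, ai in zip(x, a)]
        # Backward (proximal) step on f = lam * ||.||_1.
        x = [soft_threshold(yi, gamma * lam) for yi in y]
    return x
```

For the two-sample data set `[([1.0, 0.0], 2.0), ([0.0, 1.0], 0.0)]` with `lam = 0.1`, the objective is $0.1(|x_1| + |x_2|) + 0.25(x_1 - 2)^2 + 0.25 x_2^2$, whose minimizer is $(1.8, 0)$, so the iterates should settle near that point.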

Notably, the analysis does not impose boundedness or strict variance control on the stochastic terms; instead, weaker integrability and growth conditions are adopted. Convergence is established under two key assumptions: (i) the existence of solutions at which the stochastic sub-gradients average to zero and have finite mean squared norm (Assumption 1), and (ii) domain consistency together with at most polynomial growth of the sub-gradient norms (Assumption 2).

Main Results: Convergence Analysis

The paper’s principal theorem demonstrates that the stochastic proximal gradient algorithm attains almost sure and $L^1$ convergence of iterates to a solution of the composite minimization problem. These results are shown for step sizes $(\gamma_n)_n$ satisfying two summability conditions that parallel the classical stochastic approximation setting, where one requires $\sum_n \gamma_n = +\infty$ and $\sum_n \gamma_n^2 < +\infty$.
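As an illustration of step sizes of the classical stochastic approximation type, the following worked example shows a standard polynomial family. That the paper's conditions take exactly the Robbins-Monro form is an assumption here; the constants $c$ and $\alpha$ are illustrative.

```latex
\gamma_n = \frac{c}{(n+1)^{\alpha}}, \quad c > 0,\ \alpha \in \left(\tfrac{1}{2}, 1\right]
\quad\Longrightarrow\quad
\sum_{n \ge 0} \gamma_n = c \sum_{n \ge 0} \frac{1}{(n+1)^{\alpha}} = +\infty
\quad\text{and}\quad
\sum_{n \ge 0} \gamma_n^2 = c^2 \sum_{n \ge 0} \frac{1}{(n+1)^{2\alpha}} < +\infty .
```

The first series diverges because $\alpha \le 1$, and the second converges because $2\alpha > 1$.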

Specifically, under the proposed framework:

The iterates $(x_n)_n$ are bounded almost surely and in the mean.
The objective values $(f+g)(x_n)$ converge in the liminf sense to the minimal value.
The entire sequence $(x_n)_n$ converges almost surely (and in $L^1$ if sub-gradient growth is quadratic) to a random variable supported on the set of minimizers.
The (possibly random) limiting point $\bar{x}$ satisfies $\nabla g(\bar{x}) = \nabla g(x^\star)$ for some solution $x^\star$, i.e., the limiting gradient is that of an optimal solution.
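The statement about the limiting gradient is consistent with a standard fact: in this setting $\nabla g$ takes the same value at every minimizer of $f+g$. A short derivation, given here as a sketch based on Fermat's rule and the Baillon-Haddad theorem rather than as the paper's own argument, with $x^\star$ and $y^\star$ denoting two arbitrary solutions:

```latex
\begin{aligned}
&x^\star, y^\star \in \operatorname{argmin}(f+g)
 \;\Longrightarrow\;
 -\nabla g(x^\star) \in \partial f(x^\star), \quad -\nabla g(y^\star) \in \partial f(y^\star), \\
&\text{monotonicity of } \partial f: \quad
 \langle \nabla g(y^\star) - \nabla g(x^\star),\, x^\star - y^\star \rangle \ge 0, \\
&\text{cocoercivity of } \nabla g: \quad
 \tfrac{1}{\beta}\, \|\nabla g(x^\star) - \nabla g(y^\star)\|^2
 \le \langle \nabla g(x^\star) - \nabla g(y^\star),\, x^\star - y^\star \rangle \le 0, \\
&\text{hence} \quad \nabla g(x^\star) = \nabla g(y^\star).
\end{aligned}
```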

This extends and generalizes earlier stochastic proximal analyses by removing uniform boundedness and stricter variance requirements, and by covering random approximations of both $f$ and $g$.

Additional corollaries establish almost sure convergence for the stochastic proximal point and stochastic gradient methods for single-function scenarios.
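The two single-function cases can be sketched as the forward-only and backward-only halves of the same update. The example below is a minimal illustration on a randomly sampled least-squares term $0.5\,(a \cdot x - b)^2$, for which the proximity operator has a closed form; the problem, the step-size rule $\gamma_n = (n+1)^{-0.6}$, and the function names are assumptions made for illustration, not taken from the paper.

```python
import random


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def sgd_step(x, a, b, gamma):
    """Stochastic gradient step (case f = 0) on 0.5 * (a . x - b)^2."""
    r = dot(a, x) - b
    return [xi - gamma * r * ai for xi, ai in zip(x, a)]


def prox_point_step(x, a, b, gamma):
    """Stochastic proximal point step (case g = 0).

    Closed-form prox of gamma * 0.5 * (a . u - b)^2 evaluated at x.
    """
    r = (dot(a, x) - b) / (1.0 + gamma * dot(a, a))
    return [xi - gamma * r * ai for xi, ai in zip(x, a)]


def run(step, samples, n_iter, seed=0):
    """Apply `step` to one uniformly sampled (a, b) pair per iteration."""
    rng = random.Random(seed)
    x = [0.0] * len(samples[0][0])
    for n in range(n_iter):
        gamma = (n + 1) ** -0.6  # illustrative decaying step size
        a, b = rng.choice(samples)
        x = step(x, a, b, gamma)
    return x
```

On a consistent system such as `[([1.0, 0.0], 1.0), ([0.0, 1.0], -2.0), ([1.0, 1.0], -1.0)]`, both variants should approach the common solution $(1, -2)$. The proximal point step is implicit, so it remains stable for any positive `gamma`, whereas the gradient step needs `gamma` small relative to $\|a\|^2$.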

Applications: Classification and Convex Feasibility

Two domains exemplify the theoretical results:

Mixed-Loss Classification

A binary classification problem involving hinge and logistic losses over two data subsets is addressed. The proximity operator of the hinge loss and the gradient of the logistic loss are stated explicitly, and convergence is guaranteed for the stochastic composite algorithm even when the losses are mixed over random minibatches. Uniform boundedness of the gradients (guaranteed by the finiteness of the data) and the existence of solutions ensure that all conditions of the main theorem are satisfied.
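The two oracles involved can be sketched for a single sample with feature vector $a$ and label $y \in \{-1, +1\}$. The formulas below are the standard closed forms for the hinge loss $\max(0, 1 - y\,a \cdot x)$ and the logistic loss $\log(1 + e^{-y\,a \cdot x})$; the paper's exact expressions and minibatch handling are not reproduced here, and the function names are illustrative assumptions.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def prox_hinge(x, a, y, gamma):
    """Prox of gamma * max(0, 1 - y * <a, x>) for a label y in {-1, +1}."""
    norm_sq = dot(a, a)
    if norm_sq == 0.0:
        return list(x)  # the loss is constant in x, so the prox is the identity
    # Move along y * a until the margin reaches 1, but by at most gamma.
    t = min(gamma, max(0.0, (1.0 - y * dot(a, x)) / norm_sq))
    return [xi + t * y * ai for xi, ai in zip(x, a)]


def grad_logistic(x, a, y):
    """Gradient of log(1 + exp(-y * <a, x>)) with respect to x."""
    m = y * dot(a, x)
    # s = 1 / (1 + exp(m)), computed without overflow for large |m|.
    if m >= 0:
        e = math.exp(-m)
        s = e / (1.0 + e)
    else:
        s = 1.0 / (1.0 + math.exp(m))
    return [-y * s * ai for ai in a]


def mixed_step(x, hinge_sample, logistic_sample, gamma):
    """One forward step on a logistic sample, then one backward step on a hinge sample."""
    a_l, y_l = logistic_sample
    a_h, y_h = hinge_sample
    grad = grad_logistic(x, a_l, y_l)
    forward = [xi - gamma * gi for xi, gi in zip(x, grad)]
    return prox_hinge(forward, a_h, y_h, gamma)
```

Each gradient has norm at most $\|a\|$, so a finite data set gives uniformly bounded gradients, in line with the boundedness remark above.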

Inconsistent Convex Feasibility

The algorithm is applied to inconsistent convex feasibility, minimizing the expected squared distance to randomly selected convex sets subject to a global constraint. The stochastic proximal gradient algorithm, using random projections, provably converges almost surely and in $L^1$ to a minimizer of the mean-squared distance, covering inconsistent feasibility situations not tackled by classical projection methods. The analysis does not require all sets to be activated at every iteration, which keeps the per-iteration cost low in large-scale and distributed settings.
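A one-dimensional sketch makes the random-projection iteration concrete. It assumes $f$ is the indicator of the global constraint (so the proximal step is a projection) and $g_k = \tfrac{1}{2} d_{C_k}^2$, whose gradient is $x - P_{C_k}(x)$. The intervals, the step size $\gamma_n = 1/(n+1)$, and the function names are illustrative assumptions rather than the paper's setup.

```python
import random


def project_interval(x, lo, hi):
    """Projection of the scalar x onto the interval [lo, hi]."""
    return min(max(x, lo), hi)


def random_projection_method(sets, constraint, x0, n_iter, seed=0):
    """Sketch: minimize the mean of 0.5 * dist(x, C_k)^2 over the constraint set.

    `sets` is a list of intervals (lo, hi); one is activated per iteration.
    """
    rng = random.Random(seed)
    x = x0
    for n in range(n_iter):
        gamma = 1.0 / (n + 1)  # illustrative decaying step size
        lo, hi = rng.choice(sets)
        # Gradient of 0.5 * dist(x, C_k)^2 is x - P_{C_k}(x).
        grad = x - project_interval(x, lo, hi)
        # Forward step, then projection onto the global constraint.
        x = project_interval(x - gamma * grad, *constraint)
    return x
```

With the disjoint sets `[(2.0, 3.0), (-3.0, -2.0)]` and the constraint `(0.5, 10.0)`, no point lies in both sets, and the constrained minimizer of the mean squared distance is $0.5$, so the iterate returned after a few thousand steps should be close to that value.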

Implications and Future Directions

The results substantially broaden the theoretical understanding of stochastic optimization methods for composite convex objectives. They permit unbounded but integrable stochasticity and variable sub-gradient growth, which matters for realistic models in machine learning, signal processing, and variational analysis, where exact or deterministic oracles are impractical.

Practically, the findings pave the way for robust, scalable algorithmic designs suitable for decentralized and federated environments, where randomness arises both from data partitioning and function approximation. The theoretical guarantees under weak conditions enhance reliability of iterative methods in noisy, large-scale, or heterogeneous settings.

Theoretically, the work suggests paths forward for relaxing assumptions in other stochastic approximations (e.g., monotone inclusions, operator splitting), and potentially extending the analysis to nonconvex and non-Euclidean domains, such as Riemannian optimization or variational inequalities in Banach spaces. The geometric tools and stochastic Fejér monotonicity concepts may be useful for future analyses involving more complex stochastic dynamics.

Conclusion

This paper rigorously establishes almost sure and mean convergence for the stochastic proximal gradient method under weaker assumptions than previously studied. The work extends the reach of stochastic iterative algorithms for convex optimization, emphasizing practicality, generality, and robustness. The results encompass important applications including classification with mixed losses and inconsistent feasibility, marking a significant advance in stochastic convex optimization theory (2604.13388).