Convergence of the Iterates of the Stochastic Proximal Gradient Method

Published 15 Apr 2026 in math.OC | (2604.13388v1)

Abstract: We propose a novel study of the stochastic proximal gradient method for minimizing the sum of two convex functions, one of which is smooth. Under suitable assumptions and without requiring any boundedness or control of the variance of the random variables, we derive the almost sure convergence and the convergence in the mean of the iterates to a solution of the minimization problem. The results are applied to classification and convex feasibility problems.

Authors (1)

Summary

  • The paper establishes almost sure and L1 convergence of the stochastic proximal gradient method under weak integrability and growth conditions.
  • The paper replaces deterministic oracles with random approximations, enabling scalable and robust optimization for composite convex objectives.
  • The paper extends classical results by applying its framework to mixed-loss classification and inconsistent feasibility, paving the way for decentralized applications.

Convergence Properties of Stochastic Proximal Gradient Methods

Problem Formulation and Methodological Framework

The paper addresses the minimization of composite convex objectives in Hilbert spaces, specifically $\min_{x \in H} f(x) + g(x)$, where $f$ is lower semicontinuous and convex, and $g$ is convex and differentiable with a $\beta$-Lipschitz continuous gradient $\nabla g$. This canonical setting encompasses a vast class of optimization problems in signal processing, statistical learning, and operations research.

The classical solution approach employs the proximal-gradient (forward-backward) scheme. The stochastic variant considered here (Algorithm 1) replaces $f, g$ (or their oracles, the proximity operator and the gradient) with random approximations $f_k, g_k$, using a random index $k$ at each iteration. This covers randomization of both the gradient and the proximal mapping, addressing practical concerns related to scalability and tractability in large-scale or distributed optimization.
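A minimal sketch of such an iteration is given here, assuming Algorithm 1 takes the standard forward-backward form $x_{n+1} = \operatorname{prox}_{\gamma_n f_k}(x_n - \gamma_n \nabla g_k(x_n))$ with a fresh random index $k$ at each step. The toy data, the helper names (`soft_threshold`, `stochastic_prox_grad`) and the step size $1/(n+1)$ are illustrative assumptions, not taken from the paper.

```python
import random

def soft_threshold(v, t):
    """Proximity operator of t * |.| on the real line."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def stochastic_prox_grad(x0, sample, prox, grad, step, n_iter, rng):
    """Iterate x <- prox_{gamma f_k}(x - gamma * grad g_k(x)) with a random k per step."""
    x = x0
    for n in range(n_iter):
        k = sample(rng)      # random index drawn at iteration n
        gamma = step(n)      # step size gamma_n
        x = prox(k, x - gamma * grad(k, x), gamma)
    return x

# Hypothetical toy problem: g_k(x) = 0.5 * (x - b_k)^2 and f_k(x) = lam * |x|.
# The averaged objective 0.5 * E[(x - b_k)^2] + lam * |x| is minimized at
# soft_threshold(mean(b), lam) = 1.5.
b = [1.0, 3.0]
lam = 0.5

x_final = stochastic_prox_grad(
    x0=0.0,
    sample=lambda rng: rng.randrange(len(b)),
    prox=lambda k, v, gamma: soft_threshold(v, gamma * lam),
    grad=lambda k, x: x - b[k],
    step=lambda n: 1.0 / (n + 1),
    n_iter=20000,
    rng=random.Random(0),
)
print(x_final)  # should land near the minimizer 1.5
```

The proximal and gradient oracles both receive the index `k`, so the same loop also covers cases where only one of the two functions is randomized.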

Notably, the analysis does not impose boundedness or strict variance control on the stochastic terms; weaker integrability and growth conditions are adopted instead. Convergence is established under two key assumptions: (i) the existence of solutions at which the stochastic sub-gradients average to zero and have finite mean squared norm (Assumption 1), and (ii) domain consistency together with at most polynomial growth of the sub-gradient norms (Assumption 2).

Main Results: Convergence Analysis

The paper's principal theorem shows that the iterates of the stochastic proximal gradient algorithm converge almost surely and in $L^1$ to a solution of the composite minimization problem. These results hold for step sizes $(\gamma_n)_n$ satisfying two conditions that parallel classical stochastic approximation settings.
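For orientation, the classical stochastic approximation (Robbins-Monro) step-size conditions are written out here. That the paper's two conditions coincide exactly with these is an assumption; the constants $c$ and $a$ are illustrative.

```latex
\sum_{n \in \mathbb{N}} \gamma_n = +\infty,
\qquad
\sum_{n \in \mathbb{N}} \gamma_n^2 < +\infty,
\qquad
\text{e.g. } \gamma_n = \frac{c}{(n+1)^{a}}
\quad \text{with } c > 0,\ a \in \left(\tfrac{1}{2}, 1\right].
```

The first condition lets the iterates travel arbitrarily far from the starting point, while the second makes the accumulated noise summable.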

Specifically, under the proposed framework:

  • The iterates are bounded almost surely and in mean.
  • The liminf of the objective values along the iterates equals the minimal value.
  • The entire sequence of iterates converges almost surely (and in $L^1$ if sub-gradient growth is quadratic) to a random variable supported on the set of minimizers.
  • At the (possibly random) limiting point, the gradient of $g$ coincides with the gradient of $g$ at some solution, i.e., the limiting gradient is that of an optimal solution.

This extends and generalizes earlier stochastic proximal analyses by removing uniform boundedness and stricter variance requirements, and by covering random approximations of both $f$ and $g$.

Additional corollaries establish almost sure convergence for the stochastic proximal point and stochastic gradient methods for single-function scenarios.
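The two single-function cases can be read off from the generic step, assuming it has the standard forward-backward form; the index notation $k_n$ for the random index drawn at iteration $n$ is introduced here for illustration only.

```latex
\begin{aligned}
\text{general step:} \quad
  & x_{n+1} = \operatorname{prox}_{\gamma_n f_{k_n}}\bigl(x_n - \gamma_n \nabla g_{k_n}(x_n)\bigr), \\
g = 0 \ \text{(stochastic proximal point):} \quad
  & x_{n+1} = \operatorname{prox}_{\gamma_n f_{k_n}}(x_n), \\
f = 0 \ \text{(stochastic gradient):} \quad
  & x_{n+1} = x_n - \gamma_n \nabla g_{k_n}(x_n).
\end{aligned}
```

The last line uses the fact that the proximity operator of the zero function is the identity.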

Applications: Classification and Convex Feasibility

Two domains exemplify the theoretical results:

Mixed-Loss Classification

A binary classification problem combining hinge loss and logistic loss over two data subsets is addressed. The proximity operator of the hinge loss and the gradient of the logistic loss are stated explicitly, and convergence is guaranteed for the stochastic composite algorithm even when the losses are mixed over random minibatches. Uniform boundedness of the gradients (guaranteed by the finiteness of the data) and the existence of solutions ensure that all conditions of the main theorem hold.
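The two oracles mentioned above have well-known closed forms for a single sample. The sketch below assumes a feature vector `a`, a label `y` in {-1, +1}, and unnormalized losses; the paper's exact scaling and minibatch handling may differ, and the function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def prox_hinge(x, a, y, gamma):
    """Proximity operator of gamma * max(0, 1 - y * <a, x>) for y in {-1, +1}."""
    margin = y * dot(a, x)
    norm_sq = dot(a, a)
    if margin >= 1.0 or norm_sq == 0.0:
        return list(x)  # the loss is already zero (or constant): nothing to do
    # Move along y * a until the margin reaches 1, but by at most gamma.
    t = min(gamma, (1.0 - margin) / norm_sq)
    return [xi + t * y * ai for xi, ai in zip(x, a)]

def grad_logistic(x, a, y):
    """Gradient of log(1 + exp(-y * <a, x>)) with respect to x."""
    margin = y * dot(a, x)
    # Numerically stable evaluation of -y / (1 + exp(margin)).
    if margin > 0.0:
        e = math.exp(-margin)
        weight = -y * e / (1.0 + e)
    else:
        weight = -y / (1.0 + math.exp(margin))
    return [weight * ai for ai in a]

x = [0.0, 0.0]
a = [1.0, 0.0]
print(prox_hinge(x, a, 1, 0.5))   # partial step of length gamma
print(prox_hinge(x, a, 1, 2.0))   # step capped where the margin hits 1
print(grad_logistic(x, a, 1))
```

A stochastic proximal gradient step would then draw one hinge sample and one logistic sample, apply `grad_logistic` as the forward step, and apply `prox_hinge` as the backward step.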

Inconsistent Convex Feasibility

The algorithm is applied to inconsistent convex feasibility: minimizing the expected squared distance to randomly selected convex sets subject to a global constraint. Using random projections, the stochastic proximal gradient algorithm provably converges almost surely and in $L^1$ to a minimizer of the mean squared distance, which covers inconsistent feasibility situations not handled by classical projection methods. The analysis does not require every set to be activated at each iteration, which points to efficiency gains in large-scale and distributed settings.
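A toy illustration of this setting on the real line is sketched here. It relies on the standard fact that the gradient of $\tfrac12 d_C^2$ is $x - P_C(x)$; the two disjoint intervals, the global constraint, and the step size $1/(n+1)$ are all hypothetical choices made for the example.

```python
import random

def project_interval(x, lo, hi):
    """Projection of a real number onto the interval [lo, hi]."""
    return min(max(x, lo), hi)

# Hypothetical inconsistent instance: two disjoint sets, each selected with
# probability 1/2, plus a global constraint set.
sets = [(-2.0, -1.0), (1.0, 2.0)]
constraint = (-5.0, 5.0)

rng = random.Random(0)
x = 0.0
for n in range(20000):
    lo, hi = sets[rng.randrange(len(sets))]   # activate one random set only
    gamma = 1.0 / (n + 1)
    # Forward step: the gradient of 0.5 * dist(x, C_k)^2 is x - P_{C_k}(x).
    x = x - gamma * (x - project_interval(x, lo, hi))
    # Backward step: projection onto the global constraint set.
    x = project_interval(x, *constraint)

print(x)  # should be near 0, the minimizer of the mean squared distance
```

The sets do not intersect, so no feasible point exists; the iterate instead settles near the point that minimizes the average squared distance to both intervals.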

Implications and Future Directions

The results substantially broaden the theoretical understanding of stochastic optimization methods for composite convex objectives. They permit unbounded but integrable stochasticity and variable sub-gradient growth, which matters for realistic models in machine learning, signal processing, and variational analysis, where exact or deterministic oracles are impractical.

Practically, the findings pave the way for robust, scalable algorithmic designs suitable for decentralized and federated environments, where randomness arises both from data partitioning and function approximation. The theoretical guarantees under weak conditions enhance reliability of iterative methods in noisy, large-scale, or heterogeneous settings.

Theoretically, the work suggests paths forward for relaxing assumptions in other stochastic approximations (e.g., monotone inclusions, operator splitting), and potentially extending the analysis to nonconvex and non-Euclidean domains, such as Riemannian optimization or variational inequalities in Banach spaces. The geometric tools and stochastic Fejér monotonicity concepts may be useful for future analyses involving more complex stochastic dynamics.

Conclusion

This paper rigorously establishes almost sure and mean convergence for the stochastic proximal gradient method under weaker assumptions than previously studied. The work extends the reach of stochastic iterative algorithms for convex optimization, emphasizing practicality, generality, and robustness. The results encompass important applications including classification with mixed losses and inconsistent feasibility, marking a significant advance in stochastic convex optimization theory (2604.13388).
