- The paper establishes almost sure and L1 convergence of the stochastic proximal gradient method under weak integrability and growth conditions.
- The paper replaces deterministic oracles with random approximations, enabling scalable and robust optimization for composite convex objectives.
- The paper illustrates its framework on mixed-loss classification and inconsistent convex feasibility, extending classical results and pointing toward decentralized applications.
Convergence Properties of Stochastic Proximal Gradient Methods
The paper addresses the minimization of composite convex objectives in a Hilbert space $H$, specifically $\min_{x\in H} f(x)+g(x)$, where $f$ is lower semicontinuous and convex, and $g$ is convex and differentiable with a $\beta$-Lipschitz continuous gradient $\nabla g$. This canonical setting covers a broad class of optimization problems in signal processing, statistical learning, and operations research.
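As a reference point, the standard optimality condition behind forward-backward splitting is shown below. It is a textbook fact for proper, lower semicontinuous, convex $f$ and convex $g$ differentiable on all of $H$, not a result specific to the paper:

```latex
x^\star \in \operatorname*{argmin}_{x \in H}\, f(x) + g(x)
\iff 0 \in \partial f(x^\star) + \nabla g(x^\star)
\iff x^\star = \operatorname{prox}_{\gamma f}\big(x^\star - \gamma \nabla g(x^\star)\big)
\quad \text{for any } \gamma > 0
```

The second equivalence uses $p = \operatorname{prox}_{\gamma f}(v) \iff v - p \in \gamma\,\partial f(p)$. The proximal-gradient scheme iterates the map on the right-hand side.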
The classical solution approach is the proximal-gradient (forward-backward) scheme. The stochastic variant considered here (Algorithm 1) replaces $f$ and $g$, or their oracles (the proximity operator and the gradient), with random approximations $f_k$ and $g_k$ indexed by a random $k$ drawn at each iteration. This covers randomization of both the gradient and the proximal mapping, and addresses practical concerns about scalability and tractability in large-scale or distributed optimization.
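The iteration described above can be sketched as a generic loop. This is a minimal sketch under stated assumptions, not the paper's Algorithm 1. The oracle callables `sample`, `prox_f` and `grad_g`, the polynomial step schedule, and the toy problem are all hypothetical choices made for illustration.

```python
import numpy as np


def stochastic_prox_grad(x0, sample, prox_f, grad_g, n_iter, gamma0=1.0, alpha=0.75, seed=0):
    """Illustrative stochastic proximal-gradient loop.

    At step n a random index k is drawn and
        x_{n+1} = prox_{gamma_n f_k}(x_n - gamma_n * grad g_k(x_n)).
    sample(rng) -> k, grad_g(k, x) and prox_f(k, v, gamma) are user-supplied
    (hypothetical) oracles for the random approximations f_k, g_k.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for n in range(n_iter):
        # gamma_n = gamma0 / (n+1)^alpha: not summable, square-summable for 1/2 < alpha <= 1
        gamma = gamma0 / (n + 1) ** alpha
        k = sample(rng)
        # forward (gradient) step on g_k, then backward (proximal) step on f_k
        x = prox_f(k, x - gamma * grad_g(k, x), gamma)
    return x


# Hypothetical toy instance: g_k(x) = 0.5 * (x - b_k)^2 with b_k drawn uniformly
# from {1, 3}, and a deterministic f(x) = lam * |x| (the same for every k).
b = np.array([1.0, 3.0])
lam = 0.5


def soft_threshold(v, t):
    """Proximity operator of t * |x|."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


x_hat = stochastic_prox_grad(
    x0=0.0,
    sample=lambda rng: rng.integers(len(b)),
    prox_f=lambda k, v, gamma: soft_threshold(v, gamma * lam),
    grad_g=lambda k, x: x - b[k],
    n_iter=20_000,
)
print(float(x_hat))  # should be close to 1.5
```

In the toy instance the averaged smooth part is $0.5(x-2)^2 + 0.5$, so the composite minimizer is the soft-thresholded point $2 - 0.5 = 1.5$. Only one of the two data terms is touched per iteration.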
Notably, the analysis does not impose boundedness or strict variance control on the stochastic terms. It relies instead on weaker integrability and growth conditions. Convergence is established under two key assumptions: (i) there exist solutions at which the stochastic subgradients average to zero and have finite mean squared norm (Assumption 1), and (ii) the domains of the random approximations are consistent and the subgradient norms grow at most polynomially (Assumption 2).
Main Results: Convergence Analysis
The paper’s principal theorem shows that the iterates of the stochastic proximal gradient algorithm converge almost surely and in $L^1$ to a solution of the composite minimization problem. The result holds for step sizes $(\gamma_n)_n$ satisfying $\sum_n \gamma_n = +\infty$ and $\sum_n \gamma_n^2 < +\infty$, as in classical stochastic approximation.
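Assume the step-size conditions are the classical ones, $\sum_n \gamma_n = +\infty$ and $\sum_n \gamma_n^2 < +\infty$. A standard family satisfying both is polynomially decaying steps. This is an illustration, not necessarily the schedule used in the paper:

```latex
\gamma_n = \gamma_0 (n+1)^{-\alpha}, \quad \gamma_0 > 0,\ \tfrac12 < \alpha \le 1:
\qquad \sum_{n \ge 0} \gamma_n = \gamma_0 \sum_{m \ge 1} m^{-\alpha} = +\infty \quad (\alpha \le 1),
\qquad \sum_{n \ge 0} \gamma_n^2 = \gamma_0^2 \sum_{m \ge 1} m^{-2\alpha} < +\infty \quad (2\alpha > 1).
```

Both facts follow from the $p$-series test: $\sum_m m^{-p}$ converges if and only if $p > 1$.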
Specifically, under the proposed framework:
- The iterates $(x_n)_n$ are bounded almost surely and in $L^1$.
- The objective values satisfy $\liminf_{n\to\infty} (f+g)(x_n) = \min_{x\in H} (f+g)(x)$, i.e., they approach the minimal value along a subsequence.
- The entire sequence $(x_n)_n$ converges almost surely (and in $L^1$ if the subgradient growth is quadratic) to a random variable taking values in the set of minimizers.
- The (possibly random) limit $x_\infty$ satisfies $\nabla g(x_\infty) = \nabla g(x^\star)$ for some solution $x^\star$, i.e., the limiting gradient coincides with the gradient at an optimal solution.
This extends and generalizes earlier stochastic proximal analyses by removing uniform boundedness and stricter variance requirements, and by covering random approximations of both $f$ and $g$.
Corollaries specialize the result to a single function, giving almost sure convergence of the stochastic proximal point method (only $f$ present) and of the stochastic gradient method (only $g$ present).
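The stochastic proximal point special case can be sketched on a hypothetical toy problem: minimizing the average of $f_k(x) = |x - a_k|$ over a small data set, whose minimizer is the median. The data, the step schedule $\gamma_n = (n+1)^{-0.6}$, and the helper names are assumptions made for illustration.

```python
import numpy as np


def prox_abs_shift(v, a, t):
    """Proximity operator of t * |x - a| at v: move v toward a by at most t."""
    d = v - a
    return a + np.sign(d) * np.maximum(np.abs(d) - t, 0.0)


def stochastic_prox_point(x0, data, n_iter, gamma0=1.0, alpha=0.6, seed=0):
    """x_{n+1} = prox_{gamma_n f_k}(x_n) with f_k(x) = |x - a_k|, k drawn uniformly."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    for n in range(n_iter):
        gamma = gamma0 / (n + 1) ** alpha       # non-summable, square-summable steps
        a = data[rng.integers(len(data))]       # activate one random data point
        x = prox_abs_shift(x, a, gamma)         # backward step only (no gradient part)
    return x


data = np.array([0.0, 1.0, 2.0, 3.0, 100.0])   # hypothetical data; median is 2
x_hat = stochastic_prox_point(x0=10.0, data=data, n_iter=20_000)
print(x_hat)  # should approach 2.0
```

Each step moves the iterate toward the sampled point by at most $\gamma_n$. This is the closed-form proximity operator of $\gamma_n |x - a_k|$, so no gradient of the nonsmooth term is ever needed.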
Applications: Classification and Convex Feasibility
Two domains exemplify the theoretical results:
Mixed-Loss Classification
The paper treats a binary classification problem in which hinge loss is applied to one subset of the data and logistic loss to the other. The proximity operator of the hinge loss and the gradient of the logistic loss are stated explicitly. Convergence of the stochastic composite algorithm is guaranteed even when the losses are sampled over random minibatches. Because the data set is finite, the gradients are uniformly bounded; together with the existence of solutions, this ensures that all conditions of the main theorem hold.
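A minimal sketch of the mixed-loss setup is given below. It uses the hinge loss in the backward (proximal) step and the logistic loss in the forward (gradient) step. Several choices here are ours, not the paper's: synthetic two-dimensional data, no bias term, one hinge sample and one logistic sample per iteration, and steps $\gamma_n = (n+1)^{-3/4}$. The hinge prox is the standard closed form for $t \max(0, 1 - y\langle a, w\rangle)$. The helper names `prox_hinge` and `grad_logistic` are hypothetical.

```python
import numpy as np


def prox_hinge(v, a, y, t):
    """Prox of t * max(0, 1 - y * <a, w>) evaluated at v (closed form)."""
    s = y * (a @ v)
    sq = a @ a
    if s >= 1.0:
        return v                            # margin already met: hinge inactive
    if s <= 1.0 - t * sq:
        return v + t * y * a                # full step along y * a
    return v + ((1.0 - s) / sq) * y * a     # land exactly on the margin y * <a, w> = 1


def grad_logistic(w, a, y):
    """Gradient of log(1 + exp(-y * <a, w>)); 1/(1+e^z) computed via tanh for stability."""
    z = y * (a @ w)
    return -y * a * 0.5 * (1.0 - np.tanh(0.5 * z))


rng = np.random.default_rng(0)
n = 200
y = np.where(rng.random(n) < 0.5, 1.0, -1.0)
A = y[:, None] * np.array([2.0, 2.0]) + 0.5 * rng.standard_normal((n, 2))
hinge_idx = np.arange(n // 2)        # subset 1: hinge loss (prox step)
logistic_idx = np.arange(n // 2, n)  # subset 2: logistic loss (gradient step)

w = np.zeros(2)
for k in range(5000):
    gamma = 1.0 / (k + 1) ** 0.75
    i = rng.choice(hinge_idx)
    j = rng.choice(logistic_idx)
    v = w - gamma * grad_logistic(w, A[j], y[j])   # forward step on a random logistic term
    w = prox_hinge(v, A[i], y[i], gamma)           # backward step on a random hinge term

accuracy = np.mean(np.sign(A @ w) == y)
print(w, accuracy)
```

Using the prox for the hinge term avoids choosing a subgradient at its kink. The logistic term is smooth with a Lipschitz gradient, so a plain gradient step suffices for it.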
Inconsistent Convex Feasibility
The algorithm is applied to inconsistent convex feasibility: minimizing the expected squared distance to randomly selected convex sets, subject to a global constraint. Using random projections, the stochastic proximal gradient algorithm provably converges almost surely and in $L^1$ to a minimizer of the mean squared distance. This covers inconsistent feasibility problems that classical projection methods do not handle. The analysis does not require activating all sets at every iteration, which matters for efficiency in large-scale and distributed settings.
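The random-projection iteration can be sketched on a hypothetical one-dimensional instance in which the sets do not intersect. The instance is $C_1 = (-\infty, -1]$, $C_2 = [1, \infty)$, with global constraint $C_0 = [0.5, 5]$. The sketch uses two standard facts: the gradient of $\tfrac12 d_{C}(x)^2$ is $x - P_C(x)$, and the prox of the indicator of $C_0$ is the projection onto $C_0$. The instance and the step schedule are assumptions for illustration, not taken from the paper.

```python
import numpy as np


def project_box(x, lo, hi):
    """Euclidean projection onto the box {lo <= x <= hi}; bounds may be +-inf."""
    return np.clip(x, lo, hi)


# Hypothetical inconsistent instance: C_0 ∩ C_1 ∩ C_2 is empty.
sets = [(-np.inf, -1.0), (1.0, np.inf)]   # randomly activated sets C_1, C_2
C0 = (0.5, 5.0)                           # global constraint handled by the prox step

rng = np.random.default_rng(0)
x = np.array([4.0])
for n in range(5000):
    gamma = 1.0 / (n + 1) ** 0.75
    lo, hi = sets[rng.integers(len(sets))]   # activate a single random set per iteration
    grad = x - project_box(x, lo, hi)        # gradient of 0.5 * d_{C_k}(x)^2
    x = project_box(x - gamma * grad, *C0)   # prox of the indicator of C_0 = projection
print(x)  # should settle near 0.5
```

The mean squared distance $\tfrac14\big(d_{C_1}(x)^2 + d_{C_2}(x)^2\big)$ is minimized over $\mathbb{R}$ at $0$. Restricted to $C_0$, its minimizer is $0.5$, which the iterates approach even though only one set is projected onto per step.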
Implications and Future Directions
The results substantially broaden the theoretical understanding of stochastic optimization methods for composite convex objectives. They allow stochastic terms that are unbounded but integrable, with polynomially growing subgradients. This matters for realistic modeling in machine learning, signal processing, and variational analysis, where exact or deterministic oracles are impractical.
Practically, the findings pave the way for robust, scalable algorithmic designs suitable for decentralized and federated environments, where randomness arises both from data partitioning and function approximation. The theoretical guarantees under weak conditions enhance reliability of iterative methods in noisy, large-scale, or heterogeneous settings.
Theoretically, the work suggests paths forward for relaxing assumptions in other stochastic approximations (e.g., monotone inclusions, operator splitting), and potentially extending the analysis to nonconvex and non-Euclidean domains, such as Riemannian optimization or variational inequalities in Banach spaces. The geometric tools and stochastic Fejér monotonicity concepts may be useful for future analyses involving more complex stochastic dynamics.
Conclusion
This paper rigorously establishes almost sure and mean convergence for the stochastic proximal gradient method under weaker assumptions than previously studied. The work extends the reach of stochastic iterative algorithms for convex optimization, emphasizing practicality, generality, and robustness. The results encompass important applications including classification with mixed losses and inconsistent feasibility, marking a significant advance in stochastic convex optimization theory (2604.13388).