Convergence Rates Analysis of The Quadratic Penalty Method and Its Applications to Decentralized Distributed Optimization (1711.10802v1)

Published 29 Nov 2017 in math.NA

Abstract: In this paper, we study a variant of the quadratic penalty method for linearly constrained convex problems, which is already widely used but lacks theoretical justification. Namely, the penalty parameter steadily increases and the penalized objective function is minimized inexactly rather than exactly, e.g., with only one step of proximal gradient descent. For such a variant of the quadratic penalty method, we give counterexamples to show that it may not give a solution to the original constrained problem. By choosing special penalty parameters, we ensure convergence and further establish convergence rates of $O\left(\frac{1}{\sqrt{K}}\right)$ for generally convex problems and $O\left(\frac{1}{K}\right)$ for strongly convex ones, where $K$ is the number of iterations. Furthermore, by adopting Nesterov's extrapolation we show that the convergence rates can be improved to $O\left(\frac{1}{K}\right)$ for generally convex problems and $O\left(\frac{1}{K^2}\right)$ for strongly convex ones. When applied to decentralized distributed optimization, the penalty methods studied in this paper become the widely used distributed gradient method and the fast distributed gradient method. However, owing to a totally different analysis framework, we can improve their $O\left(\frac{\log K}{\sqrt{K}}\right)$ and $O\left(\frac{\log K}{K}\right)$ convergence rates to $O\left(\frac{1}{\sqrt{K}}\right)$ and $O\left(\frac{1}{K}\right)$ with fewer assumptions on the network topology for general convex problems. Using our analysis framework, we also extend the fast distributed gradient method to a communication-efficient version, i.e., one that finds an $\varepsilon$-accurate solution in $O\left(\frac{1}{\varepsilon}\right)$ communications and $O\left(\frac{1}{\varepsilon^{2+\delta}}\right)$ computations for non-smooth problems, where $\delta$ is a small constant.
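As a concrete illustration of the scheme the abstract describes, here is a minimal NumPy sketch. It assumes a smooth convex $f$ given by a gradient oracle and the constraint $Ax = b$; the function names, the momentum weight $\theta_k$, and the $\beta_k = \sqrt{k+1}$ schedule in the demo are illustrative assumptions, not the specific parameter choices for which the paper proves its rates.

```python
import numpy as np

def quadratic_penalty_onestep(grad_f, A, b, x0, betas, steps):
    """Inexact quadratic penalty method (sketch).

    At iteration k the penalized objective
        F_k(x) = f(x) + (beta_k / 2) * ||A x - b||^2
    is not minimized to completion: only one gradient step is taken,
    matching the variant the abstract analyzes (the proximal step
    reduces to a plain gradient step when f is smooth).
    """
    x = x0.copy()
    for beta, eta in zip(betas, steps):
        g = grad_f(x) + beta * A.T @ (A @ x - b)  # gradient of F_k at x
        x = x - eta * g                           # a single step only
    return x

def quadratic_penalty_nesterov(grad_f, A, b, x0, betas, steps):
    """The same scheme with Nesterov's extrapolation added (sketch)."""
    x_prev, x = x0.copy(), x0.copy()
    for k, (beta, eta) in enumerate(zip(betas, steps)):
        theta = max(0.0, (k - 1) / (k + 2))   # a standard momentum weight;
        y = x + theta * (x - x_prev)          # the paper's choice may differ
        g = grad_f(y) + beta * A.T @ (A @ y - b)
        x_prev, x = x, y - eta * g
    return x

if __name__ == "__main__":
    # Demo: min ||x||^2 / 2 subject to A x = b (hypothetical instance).
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((3, 5)), rng.standard_normal(3)
    L, nA2, K = 1.0, np.linalg.norm(A, 2) ** 2, 2000
    betas = [np.sqrt(k + 1) for k in range(K)]      # illustrative schedule
    steps = [1.0 / (L + bk * nA2) for bk in betas]  # 1 / Lipschitz(grad F_k)
    x = quadratic_penalty_onestep(lambda x: x, A, b, np.zeros(5), betas, steps)
    print(np.linalg.norm(A @ x - b))  # feasibility error shrinks as beta grows
```

The step size shrinks as $\beta_k$ grows because the gradient of $F_k$ has Lipschitz constant $L_f + \beta_k\|A\|_2^2$; this coupling between the penalty and step-size schedules is exactly where, per the abstract, a careless choice can fail to solve the constrained problem.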
