
Generalized Accelerated Primal-Dual (GAPD) Algorithm

Updated 16 October 2025
  • The GAPD algorithm is a primal–dual method for solving convex–concave saddle-point problems using momentum terms and Bregman distances.
  • It leverages two-sided quadratic growth conditions (QFG/QGG) to guarantee linear convergence without requiring strong convexity–concavity.
  • GAPD unifies and extends methods like OGDA, APD, and Mirror-Prox, making it applicable to robust optimization, constrained learning, and large-scale games.

The Generalized Accelerated Primal–Dual (GAPD) algorithm refers to a class of optimization methods designed to solve convex–concave saddle-point problems, $\min_{x\in X} \max_{y\in Y} f(x, y)$, with improved convergence guarantees under relaxed growth assumptions. GAPD algorithms extend classical primal–dual and accelerated primal–dual updates by incorporating momentum terms and Bregman distances, and can attain linear convergence rates even in the absence of strong convexity–concavity. They adaptively blend gradient information, allow for non-Euclidean geometries, and subsume many previous schemes as special cases. The methods are applicable to a broad range of structured saddle-point problems, including those in robust optimization, constrained learning, and large-scale games.

1. Formulation and Algorithmic Structure

The GAPD algorithm is designed for problems of the form

\min_{x\in X} \max_{y\in Y} \; f(x, y),

where f is convex in x for fixed y and concave in y for fixed x. At each iteration, momentum terms are constructed for both primal (x) and dual (y) variables, and updates are performed using Bregman distances D_X and D_Y, allowing for flexible (possibly non-Euclidean) geometry:

  • Compute gradient differences (momentum increments):
    • Dual: $q_k^y = \nabla_y f(x_k, y_k) - \nabla_y f(x_{k-1}, y_{k-1})$
    • Primal: $q_k^x = \nabla_x f(x_k, y_k) - \nabla_x f(x_{k-1}, y_{k-1})$
  • Update y via (generalized) Bregman proximal step:

y_{k+1} = \arg\min_{y\in Y} \left\{ -\langle \nabla_y f(x_k, y_k) + \alpha_k q_k^y,\, y\rangle + \frac{1}{\sigma_k} D_Y(y, y_k) \right\}

  • Form an aggregate/momentum gradient for x:

s_k = \theta_k \nabla_x f(x_k, y_{k+1}) + (1-\theta_k)\, \nabla_x f(x_k, y_k) + \beta_k q_k^x

  • Update x via (generalized) Bregman step:

x_{k+1} = \arg\min_{x\in X} \left\{ \langle s_k, x \rangle + \frac{1}{\tau_k} D_X(x, x_k) \right\}

The parameter choices $\theta_k$, $\alpha_k$, and $\beta_k$ provide flexibility and allow unification of several methods: with $\theta_k = 0$, GAPD reduces to the optimistic gradient descent–ascent (OGDA) method, and with $\theta_k = 1$ and $\beta_k = 0$ it recovers the accelerated primal–dual (APD) algorithm (Melcher et al., 13 Oct 2025).
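
The following is a minimal NumPy sketch of one iteration of these updates, assuming Euclidean Bregman distances $D_X(x, x') = \tfrac12\|x - x'\|^2$ and $D_Y(y, y') = \tfrac12\|y - y'\|^2$, so that each Bregman proximal step reduces to a (projected) gradient step; the function and argument names (gapd_step, project_X, project_Y) are illustrative and not taken from the cited paper.

```python
import numpy as np

def gapd_step(grad_x, grad_y, state, tau, sigma, theta, alpha, beta,
              project_X=lambda v: v, project_Y=lambda v: v):
    """One GAPD-style iteration with Euclidean Bregman distances (a sketch).

    grad_x(x, y) and grad_y(x, y) return the partial gradients of f.
    state = (x_k, y_k, x_prev, y_prev); the projections default to the
    identity (unconstrained X and Y).
    """
    x, y, x_prev, y_prev = state

    # Momentum increments (gradient differences).
    q_y = grad_y(x, y) - grad_y(x_prev, y_prev)
    q_x = grad_x(x, y) - grad_x(x_prev, y_prev)

    # Dual step: with D_Y the squared Euclidean distance, the Bregman prox
    # becomes a projected gradient *ascent* step on y.
    y_new = project_Y(y + sigma * (grad_y(x, y) + alpha * q_y))

    # Aggregate/momentum gradient for x, mixing old and new dual iterates.
    s = theta * grad_x(x, y_new) + (1.0 - theta) * grad_x(x, y) + beta * q_x

    # Primal step: projected gradient descent on x.
    x_new = project_X(x - tau * s)

    return (x_new, y_new, x, y)
```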

2. Two-Sided Quadratic Functional and Gradient Growth

Instead of requiring strong convexity–concavity, GAPD exploits two-sided quadratic growth properties, significantly relaxing classical assumptions. The quadratic gradient growth (QGG) condition requires

\langle F(z) - F(\bar{z}),\, z - \bar{z}\rangle \geq 2\, D_Z^M(z, \bar{z}),

where $F(z) = [\nabla_x f(x, y),\, -\nabla_y f(x, y)]^T$, $D_Z^M$ is a (possibly weighted) Bregman distance, and $\bar{z} = (\bar{x}, \bar{y})$ is a saddle point. The quadratic functional growth (QFG) condition requires

f(x, \bar{y}) - f(\bar{x}, y) \geq D_Z^M(z, \bar{z}).

These conditions guarantee that the objective or the gradient mapping grows at least quadratically as one moves away from any saddle-point solution—measured against the Bregman distance. Thus, global strong convexity–concavity is not required for linear convergence of the iterates; it suffices that the function or its gradient satisfies the two-sided QFG or QGG locally or globally (Melcher et al., 13 Oct 2025).
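
As a concrete sanity check (not taken from the cited paper), consider the strongly convex–concave quadratic $f(x, y) = \tfrac{\mu_x}{2}\|x\|^2 + x^T A y - \tfrac{\mu_y}{2}\|y\|^2$. Here $F$ is affine, and the QGG inequality holds (in fact with equality) when $D_Z^M$ is the weighted squared Euclidean distance $\tfrac{\mu_x}{2}\|x-\bar x\|^2 + \tfrac{\mu_y}{2}\|y-\bar y\|^2$; the snippet below verifies this numerically at random points.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, mu_x, mu_y = 4, 3, 0.7, 0.5
A = rng.standard_normal((n, m))

def F(x, y):
    """Monotone operator F(z) = [grad_x f, -grad_y f] for the quadratic f."""
    return np.concatenate([mu_x * x + A @ y, -(A.T @ x - mu_y * y)])

def D_M(x, xb, y, yb):
    """Weighted squared Euclidean distance used as D_Z^M in the QGG check."""
    return 0.5 * mu_x * np.sum((x - xb) ** 2) + 0.5 * mu_y * np.sum((y - yb) ** 2)

for _ in range(5):
    x, xb = rng.standard_normal(n), rng.standard_normal(n)
    y, yb = rng.standard_normal(m), rng.standard_normal(m)
    lhs = np.dot(F(x, y) - F(xb, yb), np.concatenate([x - xb, y - yb]))
    rhs = 2.0 * D_M(x, xb, y, yb)
    assert lhs >= rhs - 1e-9  # holds with equality for this quadratic f
```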

3. Convergence Theory

The GAPD algorithm enjoys a rigorous convergence proof framework based on one-step descent lemmas and telescoping potential functions.

  • The central result is that, under the two-sided QFG or QGG conditions and with step sizes chosen per specified rules, the sequence $\{z_k\}$ generated by GAPD satisfies

D_Z^{A_K - \Gamma B_K}(\bar{z}_K, z_K) \leq \frac{t_0}{t_K}\, D_Z^{A_0}(\bar{z}_0, z_0)

where the $D_Z^{A_k}$ are block-diagonal weighted Bregman distances, $t_k$ is a (multiplicative) sequence of contraction factors, and $\bar{z}_k$ is the projection onto the saddle-point set at iteration $k$.

  • The contraction of $t_k$ with $k$ is geometric, yielding a linear rate:

D_Z^{A_K - \Gamma B_K}(z^*, z_K) \leq C \rho^K

for constants $C > 0$ and $0 < \rho < 1$. This extends much of the prior linear-rate theory, since strong convexity–concavity is not strictly necessary.

  • The analysis handles Bregman distances (not only squares of norms), enabling non-Euclidean settings.
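
As an illustrative numerical experiment (assuming the hypothetical gapd_step sketch from Section 1 is in scope), the snippet below runs GAPD-style iterations on the strongly convex–concave quadratic from the previous check, whose unique saddle point is the origin, and tracks the Euclidean distance to it. The step sizes and momentum parameters are ad hoc rather than the calibrated choices of the cited analysis, but the ratio between successive distances settles below one, consistent with a linear rate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, mu_x, mu_y = 4, 3, 0.7, 0.5
A = rng.standard_normal((n, m))

grad_x = lambda x, y: mu_x * x + A @ y       # partial gradient in x
grad_y = lambda x, y: A.T @ x - mu_y * y     # partial gradient in y

L = np.linalg.norm(A, 2)                     # coupling strength
tau = sigma = 0.5 / (L + max(mu_x, mu_y))    # conservative, ad hoc step sizes
theta, alpha, beta = 1.0, 1.0, 0.0           # APD-like parameter setting

x = x_prev = rng.standard_normal(n)
y = y_prev = rng.standard_normal(m)

dists = []
for k in range(200):
    x, y, x_prev, y_prev = gapd_step(grad_x, grad_y, (x, y, x_prev, y_prev),
                                     tau, sigma, theta, alpha, beta)
    dists.append(np.sqrt(np.sum(x ** 2) + np.sum(y ** 2)))

# The ratio of successive distances stays below 1, i.e. geometric decay.
print(dists[-1] / dists[-2], dists[-1])
```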

4. Connections with Existing Methods

GAPD is a strict generalization and unification of a range of existing algorithms:

  • OGDA: Setting $\theta_k = 0$ reduces GAPD to optimistic gradient descent–ascent.
  • APD: Setting $\theta_k = 1$, $\beta_k = 0$ yields the Chen–Ouyang–Lan APD algorithm.
  • Mirror-Prox: For appropriate choices of Bregman generators, the updates recover extragradient and mirror–prox-type steps.
  • The methodology accommodates both block-coordinate methods and full composite steps, depending on the block structure of $(x, y)$ and the Bregman generators.
Method | $\theta_k$ | $\beta_k$  | Growth Required
GAPD   | arbitrary  | arbitrary  | Two-sided QFG/QGG
OGDA   | 0          | arbitrary  | Weak monotonicity (possibly QGG)
APD    | 1          | 0          | Strong convexity–concavity or QFG/QGG

GAPD thereby subsumes and extends most of the well-known methods as special cases.
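
In terms of the hypothetical gapd_step sketch from Section 1, the special cases in the table correspond to simple parameter settings, for example:

```python
# (grad_x, grad_y, state, tau, sigma as set up in the earlier snippets)

# OGDA-like step: theta = 0, with a nonzero primal momentum weight beta.
state = gapd_step(grad_x, grad_y, state, tau, sigma,
                  theta=0.0, alpha=1.0, beta=1.0)

# APD-like step: theta = 1 and beta = 0 (no primal gradient-difference term).
state = gapd_step(grad_x, grad_y, state, tau, sigma,
                  theta=1.0, alpha=1.0, beta=0.0)
```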

5. Structured Problem Classes and Applications

The practicality of GAPD is established by demonstrating its applicability to structured saddle-point problems beyond those satisfying strong convexity–concavity. A principal example is

\min_{x\in X} \max_{y\in Y} \; h(C_1 x) + \langle A x, y \rangle - g(C_2 y)

where $h, g$ are strongly convex functions and $C_1, C_2, A$ are matrices. Suitable spectral "domination" conditions and error bounds (based on properties such as the Hoffman constant) ensure that these problems satisfy two-sided QFG/QGG (Melcher et al., 13 Oct 2025). Such examples include constrained linear–quadratic games, multi-stage resource allocation, and robust learning formulations.
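
A minimal sketch of one such structured instance, assuming $h(u) = \tfrac12\|u\|^2$ and $g(v) = \tfrac12\|v\|^2$, so that $f(x, y) = \tfrac12\|C_1 x\|^2 + \langle A x, y\rangle - \tfrac12\|C_2 y\|^2$. With rank-deficient $C_1$ and $C_2$, the problem is not strongly convex–concave in $(x, y)$, which is exactly the regime the QFG/QGG conditions are meant to cover (whether they hold for this toy data is not checked here).

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 6, 5
C1 = rng.standard_normal((3, n))   # rank-deficient: f is not strongly convex in x
C2 = rng.standard_normal((2, m))   # rank-deficient: f is not strongly concave in y
A = rng.standard_normal((m, n))    # coupling term <A x, y>

# Partial gradients of f(x, y) = 0.5*||C1 x||^2 + <A x, y> - 0.5*||C2 y||^2.
grad_x = lambda x, y: C1.T @ (C1 @ x) + A.T @ y
grad_y = lambda x, y: A @ x - C2.T @ (C2 @ y)

# These gradients can be plugged directly into the gapd_step sketch above.
```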

6. Algorithmic Parameters and Implementation Details

  • Momentum and Extrapolation: The parameters $\theta_k$, $\alpha_k$, $\beta_k$ control the momentum and the balance between current and previous gradients. Proper choices ensure the descent property and geometric convergence.
  • Bregman Geometry: The method accommodates Bregman distances (generated by a strictly convex $\phi$), enabling coordinate-friendly updates and non-Euclidean regularization.
  • Block Structure: If $(x, y)$ comprises blocks, block-wise momentum and Bregman distances can be employed.
  • Initialization and Step Sizes: The linear convergence proof relies on carefully chosen initial step sizes and weight matrices that exploit problem conditioning and the QFG/QGG constants.
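
As one concrete non-Euclidean choice (a generic illustration, not specific to the cited paper), if $X$ is the probability simplex and $D_X$ is the KL divergence generated by negative entropy, the primal Bregman step from Section 1 has a closed-form multiplicative update:

```python
import numpy as np

def entropy_bregman_step(x_k, s, tau):
    """Bregman step over the probability simplex with D_X = KL divergence.

    Solves  argmin_{x in simplex} <s, x> + (1/tau) * KL(x, x_k),
    which has the closed-form multiplicative update below.
    """
    w = x_k * np.exp(-tau * s)
    # Subtracting max(-tau * s) inside the exponential would improve
    # numerical stability; omitted here for clarity.
    return w / w.sum()

# Example: a uniform starting point is pushed away from coordinates with large s.
x0 = np.full(4, 0.25)
print(entropy_bregman_step(x0, np.array([1.0, 0.0, -1.0, 0.0]), tau=0.5))
```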

7. Impact and Extensions

GAPD broadens the class of saddle–point problems for which provably fast primal–dual algorithms with linear convergence exist. Key implications:

  • Many practical problems in machine learning and robust optimization, which are not globally strongly convex–concave but satisfy quadratic growth locally, become amenable to fast primal–dual solution with GAPD.
  • Using Bregman geometry enables adaptation to problem structure, further accelerating convergence.
  • The parameter settings and analytic machinery clarify the implicit trade-offs between extrapolation, regularization, and convergence speed, providing guidance for real-world implementation.

In summary, the GAPD framework delivers linear convergence for a generalized class of saddle-point problems under relaxed but precisely characterized functional or gradient growth conditions. It unifies and extends the accelerated primal–dual methodology, supporting broad application domains and advancing the state of the art in saddle-point optimization (Melcher et al., 13 Oct 2025).
