DCA: Difference of Convex Functions Algorithm
- DCA is a foundational method that expresses a nonconvex objective as the difference of two convex functions and minimizes it iteratively.
- It minimizes convex surrogates with explicit descent properties, guaranteeing monotone decrease and convergence to critical points.
- Accelerated variants such as BDCA add a line search along the DCA direction, achieving faster convergence rates established via Łojasiewicz analysis.
The Difference of Convex Functions Algorithm (DCA) is a foundational framework for the local minimization of functions expressible as the difference of two convex (DC) functions. DCA is particularly central for structured nonconvex optimization in applications requiring precise criticality guarantees, efficient per-iteration majorization, and provable global convergence. The method operates by iteratively constructing and minimizing convex surrogates, leveraging convex analytical properties of the constituent terms. In smooth and strongly convex regimes, DCA admits rigorous descent analysis, sharp convergence rates under various geometric conditions, and serves as a basis for a spectrum of acceleration schemes with practical and theoretical impact (Artacho et al., 2015).
1. Mathematical Formulation and Classical DCA Iteration
Consider the unconstrained minimization of a DC function
$$\min_{x \in \mathbb{R}^n} \ \phi(x) = g(x) - h(x),$$
where $g, h : \mathbb{R}^n \to \mathbb{R}$ are closed, convex, and at least $\mathcal{C}^1$ (in the smooth setting). If $\phi$ is bounded below, one may assume $g$ and $h$ are strongly convex; this is always achievable by adding and subtracting a quadratic term, i.e., rewriting $\phi = (g + \tfrac{\rho}{2}\|\cdot\|^2) - (h + \tfrac{\rho}{2}\|\cdot\|^2)$. At iteration $k$, the classical DCA forms the affine majorant of $-h$ at $x_k$ and solves the strongly convex subproblem
$$y_k = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \ g(x) - \langle \nabla h(x_k), x \rangle,$$
followed by the update $x_{k+1} = y_k$. The DCA step direction is $d_k = y_k - x_k$. The descent property is explicit: one has
$$\phi(y_k) \le \phi(x_k) - \rho\,\|d_k\|^2$$
for $\rho > 0$ denoting the strong convexity modulus of $g$ and $h$ (Artacho et al., 2015).
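As a concrete illustration, the following minimal Python sketch runs the classical DCA iteration on a toy DC decomposition with a quadratic $g$, so that the subproblem reduces to the linear system $\nabla g(y_k) = \nabla h(x_k)$. The decomposition, dimensions, and stopping tolerance here are illustrative choices, not taken from the cited paper.

```python
import numpy as np

# Toy DC decomposition (illustrative, not from the cited paper):
#   phi(x) = g(x) - h(x),
#   g(x) = 0.5 x^T A x with A > 0      (strongly convex),
#   h(x) = 0.5*mu*||x||^2 + sum(log cosh(x_i))  (smooth, convex).
rng = np.random.default_rng(0)
n = 5
B = rng.standard_normal((n, n))
A = B @ B.T + 2.0 * np.eye(n)          # symmetric positive definite
mu = 0.5

def grad_h(x):
    # gradient of h(x) = 0.5*mu*||x||^2 + sum(log(cosh(x)))
    return mu * x + np.tanh(x)

def phi(x):
    g = 0.5 * x @ A @ x
    h = 0.5 * mu * x @ x + np.sum(np.log(np.cosh(x)))
    return g - h

x = rng.standard_normal(n)
for k in range(200):
    # DCA subproblem: y_k = argmin_z g(z) - <grad h(x_k), z>;
    # with quadratic g this is the linear system A y = grad_h(x_k).
    y = np.linalg.solve(A, grad_h(x))
    d = y - x                           # DCA step direction d_k
    x = y                               # classical update x_{k+1} = y_k
    if np.linalg.norm(d) < 1e-10:       # stop once the step vanishes
        break

print(k, phi(x))
```

By the descent property above, each pass of this loop decreases $\phi$ by at least $\rho\,\|d_k\|^2$, so the objective values form a monotone sequence.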
2. Accelerated Variants: Boosted DC Algorithms
The DCA step $d_k = y_k - x_k$ is a strict descent direction for $\phi$ evaluated at $y_k$:
$$\langle \nabla \phi(y_k), d_k \rangle \le -\rho\,\|d_k\|^2 < 0 \quad \text{whenever } d_k \neq 0.$$
This motivates "Boosted DCA" (BDCA) accelerations that perform a line search along $d_k$ to maximize decrease:
- BDCA-Backtracking initializes $\lambda_k = \bar{\lambda} > 0$ and backtracks ($\lambda_k \leftarrow \beta \lambda_k$ with $\beta \in (0,1)$) until the Armijo-type condition
$$\phi(y_k + \lambda_k d_k) \le \phi(y_k) - \alpha \lambda_k^2 \|d_k\|^2$$
holds for a fixed $\alpha > 0$.
- BDCA-Quadratic fits a quadratic model $q(\lambda)$ through $\phi(y_k)$, the directional derivative $\langle \nabla \phi(y_k), d_k \rangle$ at $\lambda = 0$, and $\phi(y_k + \bar{\lambda} d_k)$, then minimizes this interpolation to select $\lambda_k$.
Both variants then update $x_{k+1} = y_k + \lambda_k d_k$. These BDCA schemes inherit the global convergence properties of DCA but empirically and theoretically exhibit substantially faster rates (Artacho et al., 2015).
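A minimal sketch of one BDCA iteration with the backtracking variant is given below; the function names, parameter defaults, and the 1-D demo problem are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def bdca_backtracking_step(x, phi, dca_subproblem,
                           lam_bar=2.0, alpha=0.1, beta=0.5, max_bt=30):
    """One BDCA iteration with an Armijo-type backtracking line search.

    dca_subproblem(x) must return y = argmin_z g(z) - <grad h(x), z>.
    Parameter defaults are illustrative, not taken from the paper.
    """
    y = dca_subproblem(x)                 # classical DCA point y_k
    d = y - x                             # descent direction for phi at y_k
    lam, dd = lam_bar, np.dot(d, d)
    for _ in range(max_bt):
        # Armijo-type test: phi(y + lam d) <= phi(y) - alpha*lam^2*||d||^2
        if phi(y + lam * d) <= phi(y) - alpha * lam**2 * dd:
            return y + lam * d            # boosted update x_{k+1} = y_k + lam_k d_k
        lam *= beta                       # shrink the trial step and retry
    return y                              # fall back to the plain DCA update

# Demo on a 1-D toy DC problem: g(x) = x^2, h(x) = log(cosh(2x)), both convex.
phi = lambda x: x**2 - np.log(np.cosh(2 * x))
sub = lambda x: np.tanh(2 * x)            # solves grad g(y) = 2y = grad h(x)
x = 3.0
for _ in range(20):
    x = bdca_backtracking_step(x, phi, sub)
print(x, phi(x))
```

Because the line search only ever accepts points with $\phi(y_k + \lambda_k d_k) \le \phi(y_k)$, each boosted step is at least as good as the plain DCA update it starts from.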
3. Convergence Guarantees and Łojasiewicz Analysis
Let $\phi$ possess the Łojasiewicz property at every cluster point $x^*$ (automatic for real-analytic $\phi$):
$$|\phi(x) - \phi(x^*)|^{\theta} \le M\,\|\nabla \phi(x)\|$$
for some $\theta \in [0,1)$ and $M > 0$, locally near $x^*$. Under mild regularity (local Lipschitzness, boundedness below), BDCA generates a sequence $\{x_k\}$ whose objective values $\{\phi(x_k)\}$ decrease monotonically to $\phi(x^*)$. Moreover:
- $x_k \to x^*$, a stationary point: $\nabla \phi(x^*) = 0$.
- The convergence rate is determined by the Łojasiewicz exponent $\theta$:
  - $\theta = 0$: convergence in finitely many steps;
  - $\theta \in (0, \tfrac{1}{2}]$: linear convergence;
  - $\theta \in (\tfrac{1}{2}, 1)$: sublinear convergence, with explicit polynomial rates
$$\|x_k - x^*\| \le \eta\, k^{-\frac{1-\theta}{2\theta - 1}} \quad \text{for some } \eta > 0.$$
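To make the trichotomy concrete, the following sketch records the standard Łojasiewicz-type rate argument in outline; the constants $c, c' > 0$ are generic placeholders, and the derivation is paraphrased rather than quoted from the paper.

```latex
% Standard Lojasiewicz rate argument (sketch; generic constants c, c' > 0).
% Let e_k := \phi(x_k) - \phi(x^*). Sufficient decrease plus the
% Lojasiewicz inequality |\phi(x) - \phi(x^*)|^\theta \le M \|\nabla\phi(x)\|
% combine into a one-step recursion:
\begin{align*}
  e_k - e_{k+1} \;\ge\; c\,\|\nabla \phi(x_k)\|^2
                \;\ge\; \frac{c}{M^2}\, e_k^{2\theta}
  \quad\Longrightarrow\quad
  e_{k+1} \;\le\; e_k - c'\, e_k^{2\theta}.
\end{align*}
% For \theta \in (0, 1/2] one gets e_{k+1} \le (1 - c') e_k once e_k \le 1
% (linear rate); for \theta \in (1/2, 1) the recursion integrates to
% e_k = O(k^{-1/(2\theta - 1)}), which yields the polynomial iterate rate above.
```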
Proof techniques