Difference-of-Convex Algorithms (DCA)
- DCA is an optimization method that expresses a nonconvex function as the difference of two convex functions, enabling systematic descent and convergence guarantees.
- Advanced DCA variants, such as boosted, inertial, and stochastic methods, enhance performance and accelerate convergence through techniques like line-search and momentum.
- DCA’s geometric and continuous-time perspectives offer rigorous analysis and broad applicability in fields such as machine learning, signal processing, and deep learning.
A Difference-of-Convex Algorithm (DCA) is an iterative optimization framework for nonconvex problems where the objective is expressed as the difference of two convex functions. The DCA has established itself as a central methodology for broad classes of structured nonconvex, nonsmooth, and composite programs, offering a unifying perspective for analyzing convergence, complexity, and algorithmic variants.
1. Problem Structure and Classical DCA Iteration
A typical DC program involves minimizing a function of the form
$$\min_{x \in \mathbb{R}^n} f(x) = g(x) - h(x),$$
where $g$ and $h$ are proper closed convex functions (Niu, 2022, Artacho et al., 2015). DCA uses an iterative scheme that at each iteration linearizes the concave part ($-h$) at the current point $x^k$ and solves the resulting convex surrogate:
$$x^{k+1} \in \arg\min_{x} \{\, g(x) - \langle y^k, x \rangle \,\},$$
where $y^k \in \partial h(x^k)$. This subproblem remains convex under very mild conditions and, when solved exactly, yields global descent in the original nonconvex objective.
The basic properties are as follows:
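- The sequence $\{f(x^k)\}$ is nonincreasing.
- Any cluster point $x^*$ is a critical point, i.e., $\partial g(x^*) \cap \partial h(x^*) \neq \emptyset$ (Niu, 2022).
- When $g$ and $h$ are strongly convex, the differences $\|x^{k+1} - x^k\|$ vanish asymptotically.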
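The iteration and its descent property can be illustrated on a toy problem. The sketch below is not taken from the cited papers: it assumes the separable objective $f(x) = \sum_i (x_i^4/4 - x_i^2/2)$ with the hypothetical decomposition $g(x) = \sum_i x_i^4/4$ and $h(x) = \sum_i x_i^2/2$, chosen because the convex subproblem then has the closed-form solution $x^{k+1} = \sqrt[3]{y^k}$ with $y^k = \nabla h(x^k) = x^k$.

```python
import numpy as np

def f(x):
    # Toy objective f = g - h with g(x) = sum(x^4)/4 and h(x) = sum(x^2)/2.
    return np.sum(x**4) / 4.0 - np.sum(x**2) / 2.0

def dca(x0, max_iter=200, tol=1e-12):
    """Classical DCA for the toy decomposition above (illustrative only)."""
    x = np.asarray(x0, dtype=float)
    values = [f(x)]
    for _ in range(max_iter):
        y = x                  # y^k = grad h(x^k) = x^k
        x_new = np.cbrt(y)     # argmin_x g(x) - <y, x>  <=>  x^3 = y
        values.append(f(x_new))
        step = np.linalg.norm(x_new - x)
        x = x_new
        if step < tol:
            break
    return x, np.array(values)

x_star, values = dca([0.5, -2.0, 0.1])
print(x_star)   # each coordinate approaches +1 or -1, the minimizers of t^4/4 - t^2/2
```

Starting exactly at a coordinate value of 0 would leave that coordinate at 0, which is a critical point but a local maximizer of the scalar function; this is the kind of behavior that motivates the stronger stationarity notions discussed later in the article.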
2. Convergence Theory and Rates
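DCA convergence analysis rests on strong convexity, subgradient inequalities, and the Kurdyka-Łojasiewicz (KL) property. Suppose $g$ and $h$ are strongly convex with moduli $\rho_g$ and $\rho_h$, so that $\rho_g + \rho_h > 0$:
- The sufficient decrease property holds: $f(x^k) - f(x^{k+1}) \geq \frac{\rho_g + \rho_h}{2} \|x^{k+1} - x^k\|^2$, and the sum $\sum_{k \geq 0} \|x^{k+1} - x^k\|^2$ is finite (Niu, 2022).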
- Under the classical KL property on $f$, global convergence of the entire sequence is obtained. The actual rate is dictated by the desingularizing function in the KL inequality; with $\varphi(s) = c\,s^{1-\theta}$ and KL exponent $\theta \in [0, 1)$:
  - Linear convergence if the KL exponent $\theta \in (0, 1/2]$;
  - Sublinear if $\theta \in (1/2, 1)$;
  - Finite-step convergence if $\theta = 0$ (Niu, 2022, You et al., 2021).
- Explicit non-asymptotic rates have been established in tight O(1/N) or O(1/√N) form for the residual norm under additional curvature or smoothness assumptions (Abbaszadehpeivasti et al., 2021, Rotaru et al., 2024, Rotaru et al., 6 Mar 2025).
- DCA achieves global linear convergence under an extended Polyak–Łojasiewicz condition, even for some bounded constrained problems (Yao et al., 2023).
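The sufficient decrease inequality and a linear rate can be checked numerically. The following is a minimal sketch under stated assumptions, not an example from the cited works: it uses the hypothetical decomposition $g(x) = \|x\|^2$ (modulus $\rho_g = 2$) and $h(x) = \tfrac{1}{2}\|x\|^2 + 2\sum_i \sqrt{1 + x_i^2}$ (modulus $\rho_h = 1$), for which $f = g - h$ is nonconvex and the DCA step is available in closed form.

```python
import numpy as np

RHO_G, RHO_H = 2.0, 1.0   # strong convexity moduli of the assumed g and h

def f(x):
    # f = g - h with g(x) = ||x||^2 and h(x) = ||x||^2 / 2 + 2 * sum(sqrt(1 + x_i^2))
    return 0.5 * np.sum(x**2) - 2.0 * np.sum(np.sqrt(1.0 + x**2))

def dca_step(x):
    y = x + 2.0 * x / np.sqrt(1.0 + x**2)   # y^k = grad h(x^k)
    return y / 2.0                           # solves grad g(x) = 2 x = y^k

x = np.array([0.5, -0.2, 4.0])
decreases, sq_steps, errors = [], [], []
for _ in range(60):
    x_new = dca_step(x)
    decreases.append(f(x) - f(x_new))               # f(x^k) - f(x^{k+1})
    sq_steps.append(np.sum((x_new - x) ** 2))       # ||x^{k+1} - x^k||^2
    errors.append(np.linalg.norm(np.abs(x_new) - np.sqrt(3.0)))
    x = x_new
decreases, sq_steps, errors = map(np.array, (decreases, sq_steps, errors))

# Sufficient decrease: f(x^k) - f(x^{k+1}) >= (rho_g + rho_h)/2 * ||x^{k+1} - x^k||^2
print(np.all(decreases >= 0.5 * (RHO_G + RHO_H) * sq_steps - 1e-12))
# Ratio of successive distances to the limit (coordinates tend to +-sqrt(3))
print(errors[30] / errors[29])
```

For this particular function the nonzero critical points are nondegenerate minimizers at $\pm\sqrt{3}$ in each coordinate, so the observed error ratio settles near a constant below one, which is the linear regime described above.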
3. Algorithmic Variants and Acceleration
Numerous DCA variants have been developed for distinct problem structures:
a) Boosted DCA (BDCA):
- Exploits the fact that the DCA step is a descent direction for the objective at the DCA point. An Armijo-type line search extrapolates beyond that point along this direction, which speeds up convergence considerably in reported experiments while retaining global convergence guarantees and, for quadratic objectives, yields R-linear convergence (Artacho et al., 2015, Artacho et al., 2019, Artacho et al., 2019, Abbaszadehpeivasti et al., 18 Oct 2025).
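A minimal sketch of the boosting idea, under assumptions that are not from the source: the same toy decomposition $g(x) = \|x\|^2$, $h(x) = \tfrac{1}{2}\|x\|^2 + 2\sum_i \sqrt{1 + x_i^2}$ is used, and the backtracking test $f(y + \lambda d) \le f(y) - \alpha \lambda^2 \|d\|^2$ is one commonly used Armijo-type form. The parameter names `alpha`, `beta`, and `lam_bar` are illustrative.

```python
import numpy as np

def f(x):
    # Toy DC objective: g(x) = ||x||^2, h(x) = ||x||^2 / 2 + 2 * sum(sqrt(1 + x_i^2))
    return 0.5 * np.sum(x**2) - 2.0 * np.sum(np.sqrt(1.0 + x**2))

def dca_point(x):
    # Solves grad g(y) = 2 y = grad h(x)
    return 0.5 * (x + 2.0 * x / np.sqrt(1.0 + x**2))

def run(x0, boosted, alpha=0.1, beta=0.5, lam_bar=2.0, tol=1e-10, max_iter=500):
    """Return (final point, iterations). boosted=False gives plain DCA."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        y = dca_point(x)
        d = y - x                       # DCA direction
        if np.linalg.norm(d) < tol:
            return y, k
        lam = lam_bar if boosted else 0.0
        # Backtrack until the Armijo-type decrease holds at y + lam * d
        while lam > 1e-12 and f(y + lam * d) > f(y) - alpha * lam**2 * (d @ d):
            lam *= beta
        if lam <= 1e-12:
            lam = 0.0                   # fall back to the plain DCA point
        x = y + lam * d
    return x, max_iter

x0 = [0.5, -0.2, 4.0]
x_dca, it_dca = run(x0, boosted=False)
x_bdca, it_bdca = run(x0, boosted=True)
print(it_dca, it_bdca)   # the boosted run is expected to need fewer outer iterations here
```

Each boosted iteration costs extra function evaluations for the line search, so the comparison that matters in practice is total work rather than iteration count alone.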
b) Inertial and Momentum DCA:
- InDCA and its refined version (RInDCA) incorporate heavy-ball or Nesterov-type inertial forces, extending the allowable inertial range by exploiting both convex components. These methods demonstrate significantly reduced iteration counts and CPU time in both synthetic and application-driven scenarios (You et al., 2021, Thi et al., 2018).
c) DCA with Extrapolation:
- Proximal DCA with extrapolation (pDCAe) uses FISTA-like Nesterov acceleration on proximal DCA, reducing iteration counts by factors of up to 3 or more while maintaining full convergence guarantees under the KL property (Wen et al., 2016).
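The extrapolation step can be sketched on an $\ell_1 - \ell_2$ regularized least-squares problem, $\min_x \tfrac{1}{2}\|Ax - b\|^2 + \lambda\|x\|_1 - \lambda\|x\|_2$. Everything specific here is an assumption made for illustration: the random test data, the FISTA-type extrapolation weights with a fixed restart period, and the names `pdca_e`, `restart`, and `soft_threshold`.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, b, lam, x):
    r = A @ x - b
    return 0.5 * r @ r + lam * (np.abs(x).sum() - np.linalg.norm(x))

def pdca_e(A, b, lam, n_iter=500, restart=200):
    """Proximal DCA with extrapolation (illustrative sketch)."""
    L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the smooth gradient
    x_prev = x = np.zeros(A.shape[1])
    theta_prev = theta = 1.0
    for k in range(n_iter):
        if k % restart == 0:                 # periodic reset keeps the weights below 1
            theta_prev = theta = 1.0
        beta = (theta_prev - 1.0) / theta
        u = x + beta * (x - x_prev)          # extrapolated point
        nrm = np.linalg.norm(x)
        xi = lam * x / nrm if nrm > 0 else np.zeros_like(x)   # subgradient of lam * ||.||_2
        x_new = soft_threshold(u - (A.T @ (A @ u - b) - xi) / L, lam / L)
        x_prev, x = x, x_new
        theta_prev, theta = theta, 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * theta**2))
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true
lam = 0.1
x_hat = pdca_e(A, b, lam)
print(objective(A, b, lam, x_hat), objective(A, b, lam, np.zeros(100)))
```

Setting every extrapolation weight to zero recovers the plain proximal DCA step, which makes it easy to compare the two variants on the same data.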
d) Stochastic and Variance-Reduced DCA:
- Stochastic DCAs (SDCAs) extend the framework to expectations of DC functions, often coupled with variance reduction (e.g., PAGE). In finite-sum settings they are reported to achieve optimal sample complexity for the gradient computation, and they extend directly to online/stochastic settings (Nguyen et al., 15 Sep 2025, An et al., 2019).
e) Contractive DCA (cDCA):
- Recognizes the proximal subproblem as a contraction mapping, proposing adaptive Picard iteration and termination rules, leading to practical reductions in total fixed-point iterations and CPU time (He et al., 16 May 2025).
4. Stationarity and Solution Types
DCA is fundamentally a critical-point method, and its standard asymptotic guarantee is that every limit (cluster) point $x^*$ satisfies the DC criticality condition: $\partial g(x^*) \cap \partial h(x^*) \neq \emptyset$. However, critical points may not be local minima. To address this, d-stationarity is operationalized, especially in nonsmooth settings:
- A point $x^*$ is d-stationary if $f'(x^*; d) \geq 0$ for all directions $d$, i.e., directional derivatives are nonnegative in all directions (Artacho et al., 2019, Feng et al., 5 Jan 2026).
- BDCA-DFO and perturbed DCA (pDCA) schemes ensure that limit points are d-stationary almost surely. The latter achieves this using a vanishing random perturbation of the linearization point at each iteration (Artacho et al., 2019, Feng et al., 5 Jan 2026).
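The gap between criticality and d-stationarity shows up already in one dimension. The sketch below assumes the toy function $f(x) = \tfrac{1}{2}x^2 - |x|$ with $g(x) = \tfrac{1}{2}x^2$ and $h(x) = |x|$; the point $x = 0$ is critical because $0 \in \partial h(0) = [-1, 1]$, yet $f'(0; d) = -|d| < 0$. The function `perturbed_dca_step` is a simplified stand-in for the idea of perturbing the linearization point, not the cited schemes themselves.

```python
import numpy as np

def f(x):
    return 0.5 * x**2 - abs(x)

def dir_deriv(x, d, t=1e-8):
    # One-sided finite difference approximating the directional derivative f'(x; d)
    return (f(x + t * d) - f(x)) / t

def dca_step(x, subgrad_at_zero=0.0):
    # y in subdifferential of |.| at x; at x = 0 any value in [-1, 1] is admissible
    y = np.sign(x) if x != 0 else subgrad_at_zero
    return float(y)          # argmin_x 0.5 x^2 - y x = y

def perturbed_dca_step(x, rng, eps):
    # Linearize h at a randomly perturbed point instead of x itself
    return dca_step(x + eps * rng.uniform(-1.0, 1.0))

stuck = dca_step(0.0)                        # stays at the critical point 0
rng = np.random.default_rng(0)
x = 0.0
for k in range(1, 6):
    x = perturbed_dca_step(x, rng, eps=1e-3 / k)   # vanishing perturbation size
print(stuck, x, dir_deriv(0.0, 1.0), dir_deriv(0.0, -1.0))
```

With the subgradient choice $y = 0$ plain DCA never leaves the origin, whereas any nonzero perturbation selects $y = \pm 1$ and the next iterate lands on a minimizer.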
5. Constraint Handling and Extensions
DCA is naturally extensible to constrained settings and alternative spaces:
- Linear constraints are handled via inclusion in the convex part $g$ (often as an indicator function), and the resulting subproblem remains convex. Enhanced BDCA with linear constraints achieves fast R-linear convergence in quadratic cases (Artacho et al., 2019).
- Riemannian DCA generalizes the entire method to geodesically convex analysis on Hadamard manifolds. The subproblems involve exponential maps and Riemannian subdifferentials, admitting analogous convergence theorems (Bergmann et al., 2021).
- In Hilbert spaces, inexact and adaptive DCA frameworks (I-ADCA) allow for inexact subgradients and subproblem solves while preserving convergence, with direct application to PDE-constrained optimal control problems with DC-regularized objectives (Khanh et al., 10 Jan 2026).
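Absorbing constraints into the convex part can be sketched for a box-constrained indefinite quadratic program. The data `Q`, `b`, the box bounds, and the shift `rho` are hypothetical; the decomposition assumed here is $g(x) = \tfrac{\rho}{2}\|x\|^2 + \delta_C(x)$ and $h(x) = \tfrac{\rho}{2}\|x\|^2 - (\tfrac{1}{2}x^\top Q x - b^\top x)$ with $\rho$ above the largest eigenvalue of $Q$, so the DCA subproblem reduces to a projection onto the box.

```python
import numpy as np

Q = np.array([[1.0, 2.0], [2.0, -3.0]])   # indefinite, so f is nonconvex
b = np.array([1.0, -0.5])
lo, hi = -1.0, 1.0                         # box C = [lo, hi]^2

def f(x):
    return 0.5 * x @ Q @ x - b @ x

# Shift making h(x) = rho/2 ||x||^2 - f(x) strongly convex
rho = max(np.linalg.eigvalsh(Q).max(), 0.0) + 1.0

def dca_step(x):
    y = rho * x - (Q @ x - b)          # y^k = grad h(x^k)
    # argmin over C of rho/2 ||x||^2 - <y, x> is the projection of y / rho onto C
    return np.clip(y / rho, lo, hi)

x = np.array([0.2, 0.1])
values = [f(x)]
for _ in range(50):
    x = dca_step(x)
    values.append(f(x))
values = np.array(values)
print(x, values[-1])
```

With this decomposition the DCA step coincides with a projected gradient step of size $1/\rho$; other decompositions of the same objective lead to different subproblems and different iterates.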
6. Continuous-Time and Geometric Perspectives
Recent work connects DCA to geometric and continuous-time dynamical systems:
- Classical DCA is the explicit Euler discretization (with step size one) of a nonlinear ODE in dual coordinates ($\dot{y}(t) = -\nabla f(x(t))$), where $y = \nabla g(x)$.
- As the relaxation parameter $\lambda \to 0$, a damped DCA converges to the Hessian-Riemannian gradient flow
$$\dot{x}(t) = -\left[\nabla^2 g(x(t))\right]^{-1} \nabla f(x(t)),$$
and yields global convergence, KL property-based convergence rates, and a strict energy identity (Niu, 8 Apr 2026).
- The speed and geometry of DCA depend markedly on the choice of DC decomposition; the convex part $g$ induces the Riemannian metric in which descent occurs, providing a decomposition-quality criterion: ideally, the metric aligns with the Hessian of $f$ near a local minimum.
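The Euler-discretization view can be checked on a scalar example. This is a sketch under assumptions: smooth $g(x) = x^4/4$ and $h(x) = x^2/2$, dual variable $y = g'(x) = x^3$, and a damped update $y^{k+1} = y^k - \lambda f'(x^k)$ whose step $\lambda$ plays the role of the relaxation parameter. For this choice the flow $\dot{x} = -f'(x)/g''(x)$ has the closed-form solution $x(t) = \sqrt{1 + (x_0^2 - 1)e^{-2t/3}}$ for $x_0 > 0$, which serves as the reference.

```python
import numpy as np

def grad_f(x):
    return x**3 - x            # f = g - h with g = x^4/4, h = x^2/2

def grad_g(x):
    return x**3                # map to dual coordinates y = g'(x)

def grad_g_conj(y):
    return np.cbrt(y)          # inverse map back to primal coordinates

def damped_dca(x0, lam, n_steps):
    """Euler steps of size lam on y' = -f'(x) in dual coordinates."""
    x = float(x0)
    xs = [x]
    for _ in range(n_steps):
        y = grad_g(x) - lam * grad_f(x)
        x = float(grad_g_conj(y))
        xs.append(x)
    return np.array(xs)

def flow_exact(x0, t):
    # Closed-form solution of x' = -f'(x) / g''(x) for x0 > 0
    return np.sqrt(1.0 + (x0**2 - 1.0) * np.exp(-2.0 * t / 3.0))

# Step size one reproduces classical DCA, x^{k+1} = cbrt(x^k)
dca_iterates = [2.0]
for _ in range(10):
    dca_iterates.append(float(np.cbrt(dca_iterates[-1])))
same_as_dca = np.allclose(damped_dca(2.0, 1.0, 10), dca_iterates)

# Smaller steps track the continuous-time flow more closely at time t = 3
err_coarse = abs(damped_dca(2.0, 0.1, 30)[-1] - flow_exact(2.0, 3.0))
err_fine = abs(damped_dca(2.0, 0.01, 300)[-1] - flow_exact(2.0, 3.0))
print(same_as_dca, err_coarse, err_fine)
```

Here the metric $g''(x) = 3x^2$ varies with $x$, which is what distinguishes the trajectory from a plain Euclidean gradient flow on $f$.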
7. Numerical Performance and Application Domains
DCA and its variants have demonstrated effectiveness on a diverse range of nonconvex problems:
- Quadratic and log-determinant programs in information theory, with global linear convergence assured by an extended DC-Polyak–Łojasiewicz inequality (Yao et al., 2023).
- Sparse and nonconvex-regularized regression, minimum sum-of-squares clustering, and combinatorial clustering, achieving significant speedup with line-search or momentum-based acceleration (Artacho et al., 2015, Artacho et al., 2019, Wen et al., 2016).
- Image denoising via nonconvex total variation, matrix copositivity, and high-dimensional data visualization (t-SNE embedding), with inertial and extrapolative procedures reducing wall-clock time and iterations (You et al., 2021, Thi et al., 2018).
- Deep learning, providing a framework to understand shortcut architectures: standard optimizers like SGD and PPA arise as special instances of DCA with particular DC decompositions, and even complicated architectures (ResNet, NegNet) have algorithms interpretable via DCA surrogates (Sun et al., 2024).
The effectiveness is further amplified by algorithmic flexibility: adaptive majorization, inertial/momentum strategies, line search, stochastic and distributed implementations, and the fusion of variance reduction or Bregman regularization.
In summary, Difference-of-Convex Algorithms constitute a fundamental and extensible toolkit for nonconvex optimization. They combine rigorous global and local convergence guarantees, practical acceleration via boosting or inertia, applicability to nonsmooth, stochastic, structured, and constrained settings, and geometric and dynamical interpretations. Their analysis draws on curvature-based performance estimation, KL theory, and geometric flows, yielding a mature and flexible theory well suited to modern nonconvex optimization (Niu, 2022, Abbaszadehpeivasti et al., 2021, Rotaru et al., 6 Mar 2025, Sun et al., 2024, Niu, 8 Apr 2026).