PS-DCA: Proximal-Subgradient DC Algorithm

Updated 23 October 2025
  • PS-DCA is a hybrid optimization method that combines a proximal-subgradient step with a one-step DC refinement, designed for single-ratio fractional minimization of convex, homogeneous functions.
  • The algorithm integrates the proximal-subgradient step with the DC refinement to navigate nonconvex landscapes robustly while enforcing unit-sphere constraints.
  • Empirical results show that PS-DCA attains lower objective values in fewer iterations and effectively escapes local minima in spectral graph applications.

The Proximal-Subgradient-Difference of Convex Functions Algorithm (PS-DCA) is an iterative scheme for nonsmooth nonconvex optimization, specifically adapted to problems where the objective is a fractional or composite function expressible as the ratio (or difference) of convex, positively homogeneous functions. PS-DCA systematically integrates proximal subgradient steps with a refinement via a single step of the Difference-of-Convex (DC) Algorithm, yielding a hybrid iteration with improved global convergence properties and reduced sensitivity to the initial point compared with standard proximal-gradient approaches (Qi et al., 22 Oct 2025).

1. Algorithmic Structure and Problem Setting

PS-DCA is specifically formulated for single-ratio fractional minimization problems of the form

$$\min_{x \in \mathcal{S} \cap \Omega} E(x) = \frac{T(x)}{B(x)},$$

where $T: X \to \mathbb{R}$ is convex and positively homogeneous, $B: X \to \mathbb{R}$ is convex and absolutely homogeneous, $\mathcal{S}$ is the unit sphere, and $\Omega = \{x \mid B(x) \neq 0\}$ (Qi et al., 22 Oct 2025). The algorithm exploits the equivalence between minimizing $T(x)/B(x)$ over $x \in X \setminus \{0\}$ and the constrained problem $\min T(x)$ subject to $B(x) = 1$. This "lifting" allows for the design of first-order algorithms that operate efficiently on high-dimensional signal processing problems, such as the computation of generalized graph Fourier modes.
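
The equivalence rests on homogeneity: $B \geq 0$ (by convexity and absolute homogeneity), and for any $x$ with $B(x) > 0$,
$$T\!\left(\frac{x}{B(x)}\right) = \frac{T(x)}{B(x)}, \qquad B\!\left(\frac{x}{B(x)}\right) = 1,$$
so every value of the ratio is attained on the constraint set $\{x \mid B(x) = 1\}$, and conversely.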

The canonical PS-DCA iteration comprises two main steps:

  1. Proximal-Subgradient Step: Given an iterate $x_k$, compute a candidate

$$l_k = \operatorname{prox}_{\lambda_k T}\bigl(x_k + \lambda_k E(x_k)\, v_k\bigr),$$

where $v_k \in \partial B(x_k)$ is a subgradient and $E(x_k) = T(x_k)/B(x_k)$. This step approximately solves a regularized subproblem involving the current linearization of $E(x)$.

  2. One-Step DCA Refinement: Improve the candidate $l_k$ by performing one iteration of a DCA applied to the DC decomposition $T(x) - E(l_k)\, B(x)$ (plus regularization). Explicitly,

$$t_k = \operatorname{prox}_{T/\rho_k}\!\left( \frac{E(l_k)}{\rho_k}\, w_k \right),$$

where $w_k \in \partial B(0)$ is adaptively chosen and $\rho_k > 0$ is a regularization parameter.

The final update $x_{k+1}$ is projected onto the sphere:
$$x_{k+1} = \frac{y_k}{\|y_k\|}, \qquad \text{where } y_k = t_k \text{ if the DCA step is accepted and } y_k = l_k \text{ otherwise}.$$
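
To make the two-step structure concrete, the following is a minimal sketch of the iteration on an illustrative instance, $T(x) = \|x\|_1$ and $B(x) = \|x\|_2$ (so $\operatorname{prox}_{\lambda T}$ is soft-thresholding). The acceptance test for the refinement and the choice $w_k \in \partial B(l_k) \subseteq \partial B(0)$ are plausible readings, not details quoted from (Qi et al., 22 Oct 2025).

```python
import numpy as np

def soft_threshold(z, tau):
    """Componentwise soft-thresholding: prox of tau * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def E(x):
    """Illustrative fractional objective E(x) = ||x||_1 / ||x||_2."""
    return np.linalg.norm(x, 1) / np.linalg.norm(x)

def ps_dca_step(x, lam=0.5, rho=1.0):
    """One PS-DCA iteration for T = ||.||_1, B = ||.||_2 (sketch only)."""
    # 1. Proximal-subgradient step with v in dB(x) = {x / ||x||_2} for x != 0.
    v = x / np.linalg.norm(x)
    l = soft_threshold(x + lam * E(x) * v, lam)           # prox_{lam T}
    # 2. One-step DCA refinement; w in dB(l), a subset of dB(0) (assumed choice).
    w = l / np.linalg.norm(l)
    t = soft_threshold((E(l) / rho) * w, 1.0 / rho)       # prox_{T / rho}
    # Keep the refinement only if it is well defined and improves E (assumed rule).
    y = t if np.linalg.norm(t) > 0 and E(t) < E(l) else l
    return y / np.linalg.norm(y)                          # project back onto the sphere

# Usage: minimize ||x||_1 / ||x||_2 over the unit sphere (minimizers are +/- e_i).
rng = np.random.default_rng(0)
x = rng.standard_normal(50)
x /= np.linalg.norm(x)
for _ in range(200):
    x = ps_dca_step(x)
print(E(x))  # typically close to 1, the global minimum for this instance
```

The guard on $t_k$ mirrors the "if the DCA step is accepted" branch above: in the degenerate case $t_k = 0$, the proximal-subgradient candidate $l_k$ is retained.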

2. Mathematical Foundations and Optimality

The underlying optimality and convergence theory is rooted in DC programming and classical subdifferential calculus. For the fractional program, first-order stationarity at $x^*$ is characterized by the existence of $v^* \in \partial B(x^*)$ such that

$$x^* = \operatorname{prox}_{\lambda T}\bigl(x^* + \lambda E(x^*)\, v^*\bigr).$$

This fixed-point characterization is a direct first-order optimality condition (Qi et al., 22 Oct 2025).

A point $x^*$ is a critical point if $E(x^*)\,\partial B(x^*) \cap \partial T(x^*) \neq \varnothing$. In particular, if $E(x^*)\,\partial B(0) \subseteq \partial T(0)$, then $x^*$ is a global minimizer. The refinement via the DCA step targets this criticality condition more directly than pure proximal-gradient approaches.
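
As a small worked example of the global condition (not drawn from the paper's experiments), take $T(x) = \|x\|_1$, $B(x) = \|x\|_2$, and $x^* = e_1$. Then $E(x^*) = 1$, $\partial B(0)$ is the Euclidean unit ball, and $\partial T(0)$ is the $\ell_\infty$ unit ball, so
$$E(x^*)\,\partial B(0) = \{u : \|u\|_2 \le 1\} \subseteq \{u : \|u\|_\infty \le 1\} = \partial T(0),$$
and $e_1$ is indeed a global minimizer of $\|x\|_1/\|x\|_2$, with minimum value $1$ attained at the signed coordinate vectors.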

The DC refinement decomposes the regularized subproblem via

$$g_k(x) = T(x) + \frac{1}{2\lambda_k}\|x\|^2, \qquad h_k(x) = E(x_k)\, B(x) + \frac{1}{2\lambda_k}\|x\|^2,$$

and uses a one-step DCA iteration.
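
For context, a single DCA iteration on this decomposition, started from a point $x_c$, is the standard update $x^+ \in \arg\min_x\, g_k(x) - \langle u, x\rangle$ with $u \in \partial h_k(x_c)$. Spelling this out (a routine calculation, not quoted from the paper):
$$u = E(x_k)\, v + \tfrac{1}{\lambda_k} x_c, \quad v \in \partial B(x_c)
\;\Longrightarrow\;
x^+ = \arg\min_x\, T(x) + \tfrac{1}{2\lambda_k}\|x - \lambda_k u\|^2
= \operatorname{prox}_{\lambda_k T}\!\bigl(x_c + \lambda_k E(x_k)\, v\bigr),$$
which recovers exactly the proximal-subgradient form of the updates in Section 1.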

3. Convergence Properties

Under the assumptions that the step-size sequence $\{\lambda_k\}$ and the regularization parameters $\{\rho_k\}$ are bounded (both above and away from zero), and that $T$ and $B$ are convex and (positively/absolutely) homogeneous, the algorithm satisfies the following properties (Qi et al., 22 Oct 2025):

  • Monotonic Descent: $E(x_k)$ decreases strictly unless the iterate sequence has stabilized. Specifically,

$$E(x_k) \geq E(x_{k+1}) + \frac{1}{\lambda_k B(l_k)}\|x_k - l_k\|^2,$$

with an additional strict decrease term if the DCA refinement is active.

  • Compactness and Stalling: The sequence $\{x_k\}$ remains on the unit sphere (thus bounded) and

$$\|x_k - x_{k+1}\| \to 0.$$

  • Subsequential Convergence: Every accumulation point $x^*$ of $\{x_k\}$ is a critical point, i.e., it satisfies the first-order condition above. Unless the set of cluster points is a nontrivial continuum, convergence is to a single point.

These monotonicity and stationarity conditions are robust to initialization and are maintained even when the DCA step is omitted (yielding the proximal-subgradient algorithm, PSA).
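
A quick way to sanity-check the monotone-descent property in practice is to track $E(x_k)$ along the iterates of the illustrative sketch from Section 1 (reusing `soft_threshold`, `E`, `ps_dca_step`, and `rng` defined there; this checks only non-increase, not the full descent certificate):

```python
# Continuation of the Section 1 sketch: verify that E(x_k) is non-increasing.
x = rng.standard_normal(50)
x /= np.linalg.norm(x)
values = [E(x)]
for _ in range(100):
    x = ps_dca_step(x)
    values.append(E(x))
assert all(a >= b - 1e-12 for a, b in zip(values, values[1:]))
```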

4. Relation to Proximal and DC Algorithms

The PS-DCA algorithm generalizes elements of proximal subgradient splitting (Cruz, 2014), leveraging splitting techniques where both T and B can be nonsmooth and only accessible via subgradient/proximal oracles. The DC refinement step is akin to a single iteration of the classical DCA (Banert et al., 2016), but tightly integrated within each main algorithmic loop. This fusion enables PS-DCA to better navigate nonconvex landscapes and avoid the pronounced local minima trapping observed for standard proximal-gradient schemes.

From an optimality perspective, PS-DCA's convergence theory and update structure inherit ideas from more general inexact DC frameworks (Zhang et al., 2023) and recent convergence rate analyses for DC-type splitting (Rotaru et al., 6 Mar 2025, Rotaru et al., 2 Jun 2025, Abbaszadehpeivasti et al., 18 Oct 2025), though (Qi et al., 22 Oct 2025) does not develop explicit complexity rates.

5. Numerical Experiments and Empirical Observations

Extensive experiments on the computation of generalized graph Fourier modes (GFMs), a canonical application area for fractional programs with DC structure, demonstrate the practical efficiency of PS-DCA (Qi et al., 22 Oct 2025). Compared to proximal-gradient (PGSA) and manifold proximal gradient algorithms (ManPG-Ada), PS-DCA consistently attains lower objective values, converges in fewer iterations, and is markedly less sensitive to initialization.

Key points in the design of the numerical studies include:

  • Graph types: community graphs, random geometric graphs (RGG), directed RGGs.
  • The fractional objective measures directional variation relative to a normalization term, with normalization constraints enforced via projection or subspace orthogonality (a toy instance is sketched after this list).
  • Empirical results show that the DCA step can escape low-quality local minima that trap proximal-gradient-only methods, yielding globally superior solutions for a fixed computational budget.
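
For orientation, the snippet below builds a toy directed two-community graph and evaluates an $\ell_1$ directional-variation ratio of the kind described above. The graph, the edge-difference operator, and the specific variation measure are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

# Toy directed graph: two triangles joined by one cross edge (assumed example).
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
n = 6
D = np.zeros((len(edges), n))                 # edge-difference operator
for e, (i, j) in enumerate(edges):
    D[e, i], D[e, j] = 1.0, -1.0

def variation_ratio(x):
    """l1 directional variation over l2 normalization: ||Dx||_1 / ||x||_2."""
    return np.linalg.norm(D @ x, 1) / np.linalg.norm(x)

x_smooth = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])    # constant on each community
x_random = np.random.default_rng(1).standard_normal(n)
print(variation_ratio(x_smooth), variation_ratio(x_random))  # smooth mode typically has the smaller ratio
```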

A summary table (abridged for clarity):

Method            Avg. Obj. Value   Avg. Iter.   CPU Time (s)
PS-DCA (Q=I)      Lower             Fewer        Lower
PGSA (Q=I)        Higher            More         Higher
ManPG-Ada (Q=I)   Higher            More         Higher

6. Applications and Implications

The principal application discussed is in spectral graph theory, specifically, the computation of generalized graph Fourier modes (including Laplacian eigenmaps, directed total variation, and sparse variation models). The PS-DCA framework applies broadly to signal processing, clustering, and graph-based dictionary learning, wherever fractional DC optimization arises over (sub)manifold constraints or normalization sets.

By incorporating an explicit DC step within a proximal-subgradient backbone, PS-DCA allows robust handling of:

  • Nonsmooth variation measures (including l₁, total variation-type objectives).
  • Hard normalization constraints, including manifold constraints (a common projection-based treatment is sketched after this list).
  • Initializations far from a global minimizer.
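
One common way to handle the unit-norm normalization together with orthogonality to previously computed modes is to project onto the orthogonal complement and renormalize. The sketch below shows this standard device; it is an implementation assumption, not a quotation of the paper's scheme.

```python
import numpy as np

def project_sphere_orth(x, U=None):
    """Project x onto the unit sphere intersected with range(U)^perp.

    U: matrix whose orthonormal columns are previously computed modes (or None).
    """
    if U is not None and U.size:
        x = x - U @ (U.T @ x)        # remove components along earlier modes
    return x / np.linalg.norm(x)     # renormalize onto the unit sphere
```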

7. Significance and Future Directions

PS-DCA unifies classical proximal splitting with DC decomposition, producing an algorithm less prone to initialization sensitivity and local-minimum stagnation than pure gradient-based competitors. The nuanced use of a “one-step DCA” within each iteration distinguishes it from typical first-order solvers and draws on a rich literature in DC programming and proximal methods (An et al., 2015, Wen et al., 2016, Abbaszadehpeivasti et al., 18 Oct 2025).

Open directions include explicit complexity rate analysis (e.g., under Polyak–Łojasiewicz or Kurdyka–Łojasiewicz conditions), extension to more general constraint sets, and adaptive integration of inexact or stochastic subgradient/proximal oracles in large-scale nonconvex machine learning settings.

