
Gradient Descent with Random Initialization: Fast Global Convergence for Nonconvex Phase Retrieval (1803.07726v2)

Published 21 Mar 2018 in stat.ML, cs.IT, cs.LG, cs.NA, math.IT, and math.OC

Abstract: This paper considers the problem of solving systems of quadratic equations, namely, recovering an object of interest $\mathbf{x}^{\natural}\in\mathbb{R}^{n}$ from $m$ quadratic equations/samples $y_{i}=(\mathbf{a}_{i}^{\top}\mathbf{x}^{\natural})^{2}$, $1\leq i\leq m$. This problem, also dubbed as phase retrieval, spans multiple domains including physical sciences and machine learning. We investigate the efficiency of gradient descent (or Wirtinger flow) designed for the nonconvex least squares problem. We prove that under Gaussian designs, gradient descent, when randomly initialized, yields an $\epsilon$-accurate solution in $O\big(\log n+\log(1/\epsilon)\big)$ iterations given nearly minimal samples, thus achieving near-optimal computational and sample complexities at once. This provides the first global convergence guarantee concerning vanilla gradient descent for phase retrieval, without the need of (i) carefully-designed initialization, (ii) sample splitting, or (iii) sophisticated saddle-point escaping schemes. All of these are achieved by exploiting the statistical models in analyzing optimization algorithms, via a leave-one-out approach that enables the decoupling of certain statistical dependency between the gradient descent iterates and the data.

Citations (228)

Summary

  • The paper demonstrates that vanilla gradient descent with random initialization converges in O(log n + log(1/ε)) iterations for phase retrieval under Gaussian designs.
  • It introduces a two-stage analysis in which the first O(log n) iterations bring the iterate into a local neighborhood of the true solution, after which the error decreases linearly.
  • The results provide theoretical validation for employing simple, memory-efficient gradient descent over more complex methods in nonconvex optimization.

Understanding Global Convergence in Nonconvex Phase Retrieval via Gradient Descent

In this paper, the authors explore the problem of nonconvex phase retrieval through systems of quadratic equations. This task involves reconstructing an unknown vector $\bm{x}^{\natural} \in \mathbb{R}^n$ from $m$ samples of the form $y_{i} = (\bm{a}_{i}^{\top}\bm{x}^{\natural})^2$, where the vectors $\bm{a}_i$ are known. This formulation, with applications across physical sciences and machine learning, leads to a nonconvex optimization challenge often solved using gradient descent.
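To make the formulation concrete, the following is a minimal NumPy sketch of the nonconvex least-squares objective $f(\bm{x}) = \frac{1}{4m}\sum_{i=1}^{m}\big((\bm{a}_i^{\top}\bm{x})^2 - y_i\big)^2$ and its gradient, i.e., the real-valued analogue of the Wirtinger-flow loss; the function names and array layout are illustrative choices of ours, not notation from the paper.

```python
import numpy as np

def loss(x, A, y):
    """f(x) = (1 / 4m) * sum_i ((a_i^T x)^2 - y_i)^2, with the design vectors a_i as rows of A."""
    r = (A @ x) ** 2 - y              # residuals (a_i^T x)^2 - y_i
    return 0.25 * np.mean(r ** 2)

def gradient(x, A, y):
    """grad f(x) = (1 / m) * sum_i ((a_i^T x)^2 - y_i) * (a_i^T x) * a_i."""
    Ax = A @ x
    return A.T @ ((Ax ** 2 - y) * Ax) / len(y)
```

Vanilla gradient descent then simply iterates $\bm{x}_{t+1} = \bm{x}_t - \eta_t \nabla f(\bm{x}_t)$.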

Key Findings

The authors establish results focusing on the efficacy of gradient descent for solving nonconvex phase retrieval problems under Gaussian design assumptions. They demonstrate that when gradient descent is initialized randomly, it converges to an $\epsilon$-accurate solution in $O(\log n + \log(1/\epsilon))$ iterations under the nearly minimal sample-size condition $m \gtrsim n\,\mathrm{poly}\log(m)$. These findings provide the first global convergence guarantee for vanilla gradient descent on phase retrieval, without specialized initialization, sample splitting, or sophisticated techniques to escape saddle points.
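As a hedged numerical illustration of this guarantee (a sketch, not the paper's own experiment), the snippet below draws a Gaussian design, runs vanilla gradient descent from a random initialization, and counts the iterations needed to reach $\epsilon$ accuracy up to the unavoidable global sign ambiguity. The problem sizes, step size, and target accuracy are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 1000                            # illustrative sizes, m on the order of n * polylog
x_true = rng.standard_normal(n)
x_true /= np.linalg.norm(x_true)            # normalize so ||x_true|| = 1
A = rng.standard_normal((m, n))             # Gaussian design vectors a_i as rows
y = (A @ x_true) ** 2                       # quadratic samples y_i = (a_i^T x_true)^2

def grad(x):
    Ax = A @ x
    return A.T @ ((Ax ** 2 - y) * Ax) / m   # gradient of the nonconvex least-squares loss

def dist(x):
    # estimation error up to the global sign ambiguity x <-> -x
    return min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))

x = rng.standard_normal(n) / np.sqrt(n)     # random initialization x_0 ~ N(0, I/n)
eta, eps = 0.1, 1e-6                        # illustrative constant step size and accuracy target
t = 0
while dist(x) > eps and t < 10_000:
    x -= eta * grad(x)                      # plain gradient step, no saddle-escaping tricks
    t += 1
print(f"epsilon = {eps:.0e} reached after {t} iterations")
```

Rerunning with larger $n$ (and proportionally larger $m$) should show the iteration count growing only mildly with the dimension, consistent with the $O(\log n + \log(1/\epsilon))$ bound.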

Detailed Insights

The analysis identifies two stages in the gradient descent dynamics (a numerical sketch follows the list):

  • Stage 1: The algorithm progresses for $O(\log n)$ iterations, which are crucial for entering a local region around the true solution.
  • Stage 2: In this phase, the algorithm demonstrates linear convergence, achieving rapid error reduction.
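A simple way to observe the two stages above is to track the decomposition used in the paper's analysis: the signal component $\langle \bm{x}_t, \bm{x}^{\natural}\rangle$ and the component of $\bm{x}_t$ orthogonal to $\bm{x}^{\natural}$. The sketch below (illustrative sizes and step size, mirroring the previous snippet) prints both along with the overall error; it is an informal demonstration rather than a reproduction of the paper's figures.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 1000                              # illustrative sizes
x_true = rng.standard_normal(n)
x_true /= np.linalg.norm(x_true)              # ||x_true|| = 1
A = rng.standard_normal((m, n))
y = (A @ x_true) ** 2

x = rng.standard_normal(n) / np.sqrt(n)       # random initialization
eta = 0.1                                     # illustrative step size

for t in range(201):
    alpha = x @ x_true                              # signal component <x_t, x_true>
    beta = np.linalg.norm(x - alpha * x_true)       # component orthogonal to x_true
    err = min(np.linalg.norm(x - x_true), np.linalg.norm(x + x_true))
    if t % 25 == 0:
        print(f"t={t:3d}  |alpha|={abs(alpha):.3f}  beta={beta:.3f}  dist={err:.2e}")
    Ax = A @ x
    x -= eta * A.T @ ((Ax ** 2 - y) * Ax) / m       # vanilla gradient descent step
```

In runs of this kind, $|\alpha_t|$ typically starts near $1/\sqrt{n}$ and climbs toward $\|\bm{x}^{\natural}\|$ within a few dozen iterations (Stage 1), after which the error shrinks by a roughly constant factor per iteration (Stage 2); the exact counts depend on the random draw.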

Significantly, convergence is achieved without the need for prior techniques such as spectral initialization. The work shows that random initialization, often used in practical applications for its simplicity and robustness, can be theoretically justified for phase retrieval.

Additionally, the authors analyze the trajectory followed by the algorithm, showing that the iterates never get trapped near saddle points on their way to the solution. This is particularly noteworthy because it removes the need for the perturbed or noise-injected optimization methods typically employed to escape saddle regions in nonconvex landscapes.

Implications and Theoretical Contributions

The results underscore the value of bringing statistical models into the analysis of optimization algorithms in order to strengthen theoretical guarantees. The paper's findings highlight a significant point: in settings with suitable geometric and statistical structure, simple, memory-efficient algorithms such as vanilla gradient descent can compete with the more complex methods traditionally considered essential for nonconvex recovery tasks.

In the broader context of machine learning and signal processing, these findings pave the way for efficient, scalable solutions that reinforce the potential of nonconvex models where convex relaxation techniques traditionally dominate.

Future Potential

Looking ahead, the integration of statistical models and optimization strategies presents opportunities for further exploration beyond phase retrieval. This framework may extend to other machine learning paradigms, such as mixed linear regression and neural networks with quadratic activation functions, as proposed in the authors' discussion. The pursuit of understanding and optimizing these models using gradient-based methods could prove vital for broader applications, including but not limited to computer vision and data science.

The paper advocates a refined approach to nonconvex optimization, pairing algorithmic simplicity with strong theoretical guarantees, a combination that is likely to matter as the field continues to tackle increasingly complex data-driven challenges.