- The paper introduces a novel probabilistic proof technique showing that non-convex matrix sensing can achieve a sample complexity of Ω(rdκ), breaking the prior quadratic rank dependence.
- A key novel technique involves constructing auxiliary "virtual sequences" and using a probabilistic decoupling argument to obtain tighter concentration bounds.
- This work shows that non-convex methods can be as sample-efficient as convex methods for low-rank matrix recovery, paving the way for more efficient algorithms.
Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity
Summary
The paper "Non-convex matrix sensing: Breaking the quadratic rank barrier in the sample complexity" by Dominik Stoeger and Yizhe Zhu addresses the problem of low-rank matrix recovery using non-convex approaches. Prior work relied either on convex methods, such as nuclear norm minimization, or on non-convex methods, such as factorized gradient descent. The key bottleneck for non-convex methods has been that existing analyses require the sample complexity to scale quadratically with the rank r of the ground truth matrix, whereas nuclear norm minimization succeeds with a number of samples that scales linearly in the degrees of freedom of the matrix.
Contribution
This work introduces a novel probabilistic proof technique showing that factorized gradient descent with spectral initialization converges to the ground truth as soon as the number of samples scales as Ω(rdκ), where d is the dimension of the matrix and κ is the condition number of the ground truth matrix. This significantly improves on prior analyses by closing the gap to a linear rank dependence in the sample complexity. Specifically, the key contributions of the paper include:
- Improvement of Rank-Dependence: The authors demonstrate that the sample complexity required for successful recovery using non-convex methods can be as low as Ω(rdκ), breaking the previous quadratic dependence barrier.
- Probabilistic Decoupling Argument: A novel probabilistic decoupling argument is introduced to show that the trajectory of the gradient descent iterates depends weakly on any given generalized entry of the measurement matrices.
- Virtual Sequences: The authors construct auxiliary "virtual sequences" that help in establishing stronger concentration bounds beyond existing uniform concentration bounds that rely on the Restricted Isometry Property (RIP).
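To make the object of study concrete: factorized gradient descent runs plain gradient descent on the non-convex objective f(U) = (1/(2m)) Σᵢ (⟨Aᵢ, UUᵀ⟩ − yᵢ)². The sketch below is an illustration with hypothetical small sizes, not the authors' code; the 1/(2m) scaling and the symmetrized Gaussian ensemble are conventions assumed here for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, m = 8, 2, 50  # hypothetical small sizes, for illustration only

# Ground truth X = U* U*^T, symmetric PSD of rank r
U_star = rng.standard_normal((d, r))
X = U_star @ U_star.T

# Symmetrized Gaussian measurement matrices and observations y_i = <A_i, X>
A = rng.standard_normal((m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2
y = np.einsum('mij,ij->m', A, X)

def loss(U):
    """f(U) = (1/(2m)) * sum_i (<A_i, U U^T> - y_i)^2."""
    res = np.einsum('mij,ij->m', A, U @ U.T) - y
    return 0.5 * np.mean(res ** 2)

def grad(U):
    """Gradient of f: (2/m) * sum_i (<A_i, U U^T> - y_i) * A_i U, using that A_i is symmetric."""
    res = np.einsum('mij,ij->m', A, U @ U.T) - y
    return 2.0 * np.einsum('m,mij->ij', res, A) @ U / m

# One factorized gradient descent step: U_{t+1} = U_t - mu * grad(U_t)
U = rng.standard_normal((d, r))
mu = 1e-4
U_next = U - mu * grad(U)
```

Note that the iterate U stays d×r throughout, which is what makes the method memory- and computation-efficient compared with operating on the full d×d matrix.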
Detailed Results
Under the assumptions that the measurement operator is Gaussian and the ground truth matrix X ∈ R^{d×d} is symmetric and positive semidefinite, the paper proves:
- The spectral initialization U0 approximates a factorization of X up to a sufficiently small error.
- The gradient descent updates then contract this initial error at a linear rate, even when the sample complexity scales as Ω(rdκ).
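The spectral initialization referenced above is, in its standard form, the top-r eigendecomposition of the surrogate matrix M = (1/m) Σᵢ yᵢ Aᵢ, which concentrates around X. The following is a hedged sketch; the sizes and the symmetrization convention are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, m = 15, 2, 8000  # hypothetical sizes; m large so the concentration is visible

U_star = rng.standard_normal((d, r))
X = U_star @ U_star.T                  # symmetric PSD ground truth

A = rng.standard_normal((m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2     # symmetrized Gaussian measurements
y = np.einsum('mij,ij->m', A, X)       # y_i = <A_i, X>

# Surrogate matrix: for this symmetrized ensemble, E[M] = X
M = np.einsum('m,mij->ij', y, A) / m

# U0 from the top-r eigenpairs of M, scaled by sqrt of the eigenvalues
# (np.linalg.eigh returns eigenvalues in ascending order)
vals, vecs = np.linalg.eigh(M)
U0 = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0.0))

# Relative error of the initialization, in Frobenius norm
init_err = np.linalg.norm(U0 @ U0.T - X) / np.linalg.norm(X)
```

With enough samples, U0·U0ᵀ lands close enough to X that the subsequent gradient descent phase only needs to contract the remaining error.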
Formally, the main result (Theorem 1.2) states:
Given m ≥ C·rdκ² and a step size μ ≤ c1/(κ∥X∥) (the theorem also imposes a mild lower bound on μ involving σmin(X) and log(16r)), with high probability, for all t ≥ 0,

dist²(Ut, U∗) ≤ c2 (1 − c3 μ σmin(X))^t,

where U∗ is a factor of the ground truth satisfying X = U∗U∗⊤, dist denotes the distance between factors up to an orthogonal transformation, σmin(X) is the smallest non-zero eigenvalue of X, and ∥X∥ its spectral norm.
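The full pipeline of the theorem (spectral initialization, then factorized gradient descent, with error measured by the rotation-invariant distance dist(U, U∗) = min over orthogonal Q of ∥U − U∗Q∥_F) can be sketched numerically. The sizes, the step-size constant, and the iteration count below are illustrative assumptions, not the theorem's exact constants.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r, m = 20, 2, 6000  # hypothetical sizes, chosen well above r*d

U_star = rng.standard_normal((d, r))
X = U_star @ U_star.T
A = rng.standard_normal((m, d, d))
A = (A + A.transpose(0, 2, 1)) / 2     # symmetrized Gaussian measurements
y = np.einsum('mij,ij->m', A, X)

def dist(U, V):
    """Rotation-invariant factor distance: min_Q ||U - V Q||_F over orthogonal Q."""
    P, _, Wt = np.linalg.svd(V.T @ U)  # orthogonal Procrustes solution Q = P @ Wt
    return np.linalg.norm(U - V @ (P @ Wt))

def grad(U):
    """Gradient of f(U) = (1/(2m)) * sum_i (<A_i, U U^T> - y_i)^2."""
    res = np.einsum('mij,ij->m', A, U @ U.T) - y
    return 2.0 * np.einsum('m,mij->ij', res, A) @ U / m

# Spectral initialization from the top-r eigenpairs of (1/m) sum_i y_i A_i
vals, vecs = np.linalg.eigh(np.einsum('m,mij->ij', y, A) / m)
U = vecs[:, -r:] * np.sqrt(np.maximum(vals[-r:], 0.0))

# Gradient descent with a conservative step size of order 1/||X||
# (the theorem's admissible range shrinks like 1/(kappa * ||X||))
mu = 0.1 / np.linalg.norm(X, 2)
err0 = dist(U, U_star)
for _ in range(500):
    U = U - mu * grad(U)
err_final = dist(U, U_star)
```

Plotting dist(Ut, U∗) against t on a log scale would show the straight line predicted by the geometric factor (1 − c3 μ σmin(X))^t.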
Novel Proof Techniques
A notable advance is the construction of virtual sequences that decouple the trajectory's dependence on individual measurement entries, enabling tighter concentration bounds. The authors contrast this technique with existing approaches and show that earlier uniform concentration bounds, which rely on the Restricted Isometry Property (RIP), inherently lead to a quadratic rank dependence.
Implications and Future Directions
The implications of this work are significant for both theory and practice. From a practical perspective, the reduction in the sample complexity required for accurate recovery of low-rank matrices can lead to more efficient algorithms in diverse applications, including machine learning, signal processing, and statistical data analysis.
Theoretically, this work opens several avenues:
- Extension to Non-Gaussian Measurements: Adapting the virtual sequence method for measurement operators that deviate from Gaussian distributions, such as those in matrix completion scenarios.
- Condition Number Independence: Investigating if the dependence on the condition number κ in the sample complexity can be eliminated, which would align the performance of non-convex methods more closely with convex optimization approaches.
- Broader Applicability: Applying the probabilistic decoupling and virtual sequence techniques to other non-convex optimization problems beyond matrix sensing, potentially improving sample complexity bounds in a variety of settings.
In conclusion, the work by Stoeger and Zhu provides compelling evidence that non-convex approaches in matrix reconstruction are competitive in terms of sample efficiency when paired with innovative proof techniques. This advancement bridges a crucial gap in the literature and sets the stage for further exploration and refinement in the domain of non-convex optimization.