Learning One-hidden-layer ReLU Networks via Gradient Descent (1806.07808v1)

Published 20 Jun 2018 in stat.ML and cs.LG

Abstract: We study the problem of learning one-hidden-layer neural networks with Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from the standard Gaussian distribution and the outputs are generated by a noisy teacher network. We analyze the performance of gradient descent for training such neural networks based on empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent can converge to the ground-truth parameters at a linear rate up to some statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings.
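
As a rough illustration of the setting the abstract describes, the sketch below generates Gaussian inputs, labels them with a noisy one-hidden-layer ReLU teacher, and runs gradient descent on the empirical squared risk. The dimensions, noise level, step size, and the perturbed initialization (a stand-in for the paper's tensor initialization) are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# Teacher-student sketch: Gaussian inputs, noisy one-hidden-layer ReLU
# teacher, student trained by gradient descent on the empirical squared risk.
# All hyperparameters below are assumptions for illustration only.

rng = np.random.default_rng(0)
d, k, n = 10, 5, 5000        # input dimension, hidden neurons, samples (assumed)
sigma = 0.1                  # label noise level (assumed)

W_star = rng.normal(size=(k, d))                       # ground-truth (teacher) weights
X = rng.normal(size=(n, d))                            # inputs ~ standard Gaussian
y = np.maximum(X @ W_star.T, 0.0).sum(axis=1) + sigma * rng.normal(size=n)

# Initialize the student near the teacher, standing in for the paper's
# tensor-based initialization (not reproduced here).
W = W_star + 0.1 * rng.normal(size=(k, d))

eta = 0.1                    # step size for the averaged gradient (assumed)
for t in range(301):
    H = X @ W.T                                        # pre-activations, shape (n, k)
    resid = np.maximum(H, 0.0).sum(axis=1) - y         # prediction residuals
    grad = ((H > 0.0) * resid[:, None]).T @ X / n      # gradient of empirical risk
    W -= eta * grad
    if t % 100 == 0:
        rel_err = np.linalg.norm(W - W_star) / np.linalg.norm(W_star)
        print(f"iter {t:3d}  relative parameter error {rel_err:.4f}")
```

Run as-is, the relative parameter error should decrease rapidly toward a noise-dependent floor, mirroring the linear convergence up to statistical error that the abstract states.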

Authors (4)
  1. Xiao Zhang (435 papers)
  2. Yaodong Yu (39 papers)
  3. Lingxiao Wang (74 papers)
  4. Quanquan Gu (198 papers)
Citations (134)
