Representation Learning and Recovery in the ReLU Model (1803.04304v1)

Published 12 Mar 2018 in stat.ML, cs.IT, cs.LG, and math.IT

Abstract: Rectified linear units, or ReLUs, have become the preferred activation function for artificial neural networks. In this paper we consider two basic learning problems assuming that the underlying data follow a generative model based on a ReLU-network -- a neural network with ReLU activations. As a primarily theoretical study, we limit ourselves to a single-layer network. The first problem we study corresponds to dictionary learning in the presence of nonlinearity (modeled by the ReLU functions). Given a set of observation vectors $\mathbf{y}_i \in \mathbb{R}^d, i = 1, 2, \dots, n$, we aim to recover a $d\times k$ matrix $A$ and the latent vectors $\{\mathbf{c}_i\} \subset \mathbb{R}^k$ under the model $\mathbf{y}_i = \mathrm{ReLU}(A\mathbf{c}_i + \mathbf{b})$, where $\mathbf{b}\in \mathbb{R}^d$ is a random bias. We show that it is possible to recover the column space of $A$ within an error of $O(d)$ (in Frobenius norm) under certain conditions on the probability distribution of $\mathbf{b}$. The second problem we consider is that of robust recovery of the signal in the presence of outliers, i.e., large but sparse noise. In this setting we are interested in recovering the latent vector $\mathbf{c}$ from its noisy nonlinear sketches of the form $\mathbf{v} = \mathrm{ReLU}(A\mathbf{c}) + \mathbf{e} + \mathbf{w}$, where $\mathbf{e} \in \mathbb{R}^d$ denotes the outliers with sparsity $s$ and $\mathbf{w} \in \mathbb{R}^d$ denotes the dense but small noise. This line of work has recently been studied (Soltanolkotabi, 2017) without the presence of outliers. For this problem, we show that a generalized LASSO algorithm is able to recover the signal $\mathbf{c} \in \mathbb{R}^k$ within an $\ell_2$ error of $O(\sqrt{\frac{(k+s)\log d}{d}})$ when $A$ is a random Gaussian matrix.
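To make the robust-recovery setting concrete, here is a minimal numerical sketch (not taken from the paper): it simulates the model $\mathbf{v} = \mathrm{ReLU}(A\mathbf{c}) + \mathbf{e} + \mathbf{w}$ with a random Gaussian $A$, then runs a generalized-LASSO-style estimator, assumed here to be the penalized form $\min_{\mathbf{c},\mathbf{e}} \frac{1}{2}\|\mathbf{v} - A\mathbf{c} - \mathbf{e}\|_2^2 + \lambda\|\mathbf{e}\|_1$ solved by proximal gradient descent. The dimensions, the value of $\lambda$, and the solver are illustrative choices, not the paper's exact algorithm or constants.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem dimensions (illustrative values, not from the paper).
d, k, s = 2000, 20, 40          # ambient dim, latent dim, number of outliers
sigma_w = 0.01                  # standard deviation of the dense noise w

# Generative model of the robust-recovery setting: v = ReLU(A c) + e + w.
A = rng.normal(size=(d, k)) / np.sqrt(d)          # random Gaussian sensing matrix
c_true = rng.normal(size=k)
e_true = np.zeros(d)
support = rng.choice(d, size=s, replace=False)
e_true[support] = rng.normal(scale=5.0, size=s)   # large but sparse outliers
w = rng.normal(scale=sigma_w, size=d)
v = np.maximum(A @ c_true, 0.0) + e_true + w      # ReLU applied element-wise

# Generalized-LASSO sketch (assumed penalized form, solved by proximal gradient):
#   min_{c, e}  0.5 * ||v - A c - e||_2^2 + lam * ||e||_1
def soft_threshold(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

lam = 0.1
L = np.linalg.norm(A, 2) ** 2 + 1.0   # Lipschitz constant of the smooth part
step = 1.0 / L
c_hat, e_hat = np.zeros(k), np.zeros(d)
for _ in range(2000):
    r = v - A @ c_hat - e_hat                             # residual
    c_hat = c_hat + step * (A.T @ r)                      # gradient step on c (no penalty)
    e_hat = soft_threshold(e_hat + step * r, step * lam)  # prox step on e

# Report alignment of the recovered direction with the true latent vector.
cos = abs(c_hat @ c_true) / (np.linalg.norm(c_hat) * np.linalg.norm(c_true))
print(f"cosine similarity between c_hat and c_true: {cos:.3f}")
print(f"recovered outlier support overlap: "
      f"{np.intersect1d(np.flatnonzero(e_hat), support).size}/{s}")
```

Because this formulation treats the ReLU measurements as if they were linear, the estimate is expected to align with $\mathbf{c}$ only up to a positive scaling, which is why the script reports cosine similarity rather than a raw $\ell_2$ error.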

Authors (2)
  1. Arya Mazumdar (89 papers)
  2. Ankit Singh Rawat (64 papers)
Citations (6)