Learning distinct features helps, provably (2106.06012v3)
Abstract: We study the diversity of the features learned by a two-layer neural network trained with the least squares loss. We measure diversity by the average $L_2$-distance between the hidden-layer features and theoretically investigate how learning non-redundant, distinct features affects the performance of the network. To do so, we derive novel Rademacher-complexity-based generalization bounds for such networks that depend on feature diversity. Our analysis proves that more distinct features across the hidden-layer units lead to better generalization. We also show how to extend our results to deeper networks and different losses.
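A minimal sketch (not the authors' code) of the diversity measure described in the abstract: the average $L_2$-distance between hidden-layer features. It assumes the "features" are the per-unit activation vectors computed over a sample of inputs, and that diversity is the mean pairwise $L_2$ distance over all pairs of hidden units; names such as `hidden_features` and `feature_diversity` are illustrative.

```python
# Sketch of the feature-diversity measure: average L2 distance between the
# hidden-unit activation vectors of a two-layer network (assumed ReLU here).
import numpy as np

def hidden_features(X, W, b):
    """Hidden-layer activations of a two-layer network.

    X: (n, d) input sample, W: (d, m) first-layer weights, b: (m,) biases.
    Returns an (m, n) array whose rows are the per-unit feature vectors.
    """
    return np.maximum(X @ W + b, 0.0).T

def feature_diversity(F):
    """Average L2 distance over all pairs of hidden-unit features F of shape (m, n)."""
    m = F.shape[0]
    # Pairwise squared distances via the Gram matrix, then average over unit pairs.
    sq_norms = np.sum(F ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * F @ F.T
    dists = np.sqrt(np.maximum(sq_dists, 0.0))
    iu = np.triu_indices(m, k=1)
    return dists[iu].mean()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 10))                # n = 256 inputs, d = 10
    W = rng.normal(size=(10, 32)) / np.sqrt(10)   # m = 32 hidden units
    b = np.zeros(32)
    print("feature diversity:", feature_diversity(hidden_features(X, W, b)))
```

Under the paper's claim, larger values of this quantity (less redundant hidden units) are associated with tighter generalization bounds.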