Papers
Topics
Authors
Recent
Search
2000 character limit reached

Maximum Multiscale Entropy and Neural Network Regularization

Published 25 Jun 2020 in cs.LG, cs.IT, math.IT, nlin.AO, and stat.ML | (2006.14614v1)

Abstract: A well-known result across information theory, machine learning, and statistical physics shows that the maximum entropy distribution under a mean constraint has an exponential form called the Gibbs-Boltzmann distribution. This is used for instance in density estimation or to achieve excess risk bounds derived from single-scale entropy regularizers (Xu-Raginsky '17). This paper investigates a generalization of these results to a multiscale setting. We present different ways of generalizing the maximum entropy result by incorporating the notion of scale. For different entropies and arbitrary scale transformations, it is shown that the distribution maximizing a multiscale entropy is characterized by a procedure which has an analogy to the renormalization group procedure in statistical physics. For the case of decimation transformation, it is further shown that this distribution is Gaussian whenever the optimal single-scale distribution is Gaussian. This is then applied to neural networks, and it is shown that in a teacher-student scenario, the multiscale Gibbs posterior can achieve a smaller excess risk than the single-scale Gibbs posterior.

Citations (1)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

Collections

Sign up for free to add this paper to one or more collections.