
On Compression Principle and Bayesian Optimization for Neural Networks (2006.12714v1)

Published 23 Jun 2020 in cs.LG and stat.ML

Abstract: Finding methods for making generalizable predictions is a fundamental problem of machine learning. By examining the similarities between predicting unknown data and lossless compression, we have found an approach that gives a solution. In this paper we propose a compression principle, which states that the optimal predictive model is the one that minimizes the total compressed message length of all data and the model definition while guaranteeing decodability. Following the compression principle, we use a Bayesian approach to build probabilistic models of data and network definitions. A method for approximating Bayesian integrals with a sequence of variational approximations is implemented as an optimizer for hyper-parameters: Bayesian Stochastic Gradient Descent (BSGD). Training with BSGD is completely defined by setting only three parameters: the number of epochs, the size of the dataset, and the size of the minibatch, which together determine the learning rate and the number of iterations. We show that dropout can be used for continuous dimensionality reduction, which makes it possible to find the optimal network dimensions required by the compression principle.
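
The compression principle stated in the abstract can be illustrated with a toy two-part code. The sketch below is not the paper's BSGD method; it is a minimal illustration, assuming a Bernoulli model for a binary sequence whose single parameter is sent on a uniform grid. The names `data_bits`, `total_bits`, and `precision` are hypothetical and chosen only for this example. The total message length is the number of bits spent describing the model plus the number of bits needed to encode the data given that model.

```python
import math

def data_bits(bits, p):
    """Ideal code length, in bits, of a binary sequence under a Bernoulli(p) model."""
    ones = sum(bits)
    zeros = len(bits) - ones
    return -(ones * math.log2(p) + zeros * math.log2(1 - p))

def total_bits(bits, precision):
    """Two-part code length: `precision` bits to state p, plus the data given p.

    Assumption for this sketch: p is restricted to the midpoints of a uniform
    grid with 2**precision cells, so p is never exactly 0 or 1.
    """
    levels = 2 ** precision
    best_data_cost = min(
        data_bits(bits, (k + 0.5) / levels) for k in range(levels)
    )
    return precision + best_data_cost

# Toy data: 90 ones and 10 zeros.
data = [1] * 90 + [0] * 10

# Total message length for each model precision from 0 to 8 bits.
costs = {b: total_bits(data, b) for b in range(9)}
best_precision = min(costs, key=costs.get)

for b, c in costs.items():
    print(f"precision={b} bits, total message length={c:.2f} bits")
print("shortest total message at precision:", best_precision)
```

A more precise parameter shortens the data part of the message but lengthens the model part, so the shortest total message is reached at an intermediate precision. This is the same trade-off the abstract describes between model definition and data.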

Citations (1)
