Papers
Topics
Authors
Recent
Search
2000 character limit reached

Softmax Is Not an Artificial Trick: An Information-Theoretic View of Softmax in Neural Networks

Published 7 Oct 2019 in cs.LG, cs.CV, and stat.ML | (1910.02629v3)

Abstract: Despite great popularity of applying softmax to map the non-normalised outputs of a neural network to a probability distribution over predicting classes, this normalised exponential transformation still seems to be artificial. A theoretic framework that incorporates softmax as an intrinsic component is still lacking. In this paper, we view neural networks embedding softmax from an information-theoretic perspective. Under this view, we can naturally and mathematically derive log-softmax as an inherent component in a neural network for evaluating the conditional mutual information between network output vectors and labels given an input datum. We show that training deterministic neural networks through maximising log-softmax is equivalent to enlarging the conditional mutual information, i.e., feeding label information into network outputs. We also generalise our informative-theoretic perspective to neural networks with stochasticity and derive information upper and lower bounds of log-softmax. In theory, such an information-theoretic view offers rationality support for embedding softmax in neural networks; in practice, we eventually demonstrate a computer vision application example of how to employ our information-theoretic view to filter out targeted objects on images.

Citations (2)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Authors (2)

Collections

Sign up for free to add this paper to one or more collections.