Information-Theoretic Perspectives on Optimizers (2502.20763v1)

Published 28 Feb 2025 in cs.LG

Abstract: The interplay of optimizers and architectures in neural networks is complicated, and it is hard to understand why some optimizers work better on specific architectures. In this paper, we find that the traditionally used sharpness metric does not fully explain this interplay, and we introduce an information-theoretic metric called the entropy gap to support the analysis. We find that both sharpness and the entropy gap affect performance, including optimization dynamics and generalization. We further use information-theoretic tools to understand a recently proposed optimizer called Lion and find ways to improve it.
