Assessing Neural Network Representations During Training Using Noise-Resilient Diffusion Spectral Entropy (2312.04823v1)

Published 4 Dec 2023 in cs.CV, cs.AI, cs.IT, cs.LG, and math.IT

Abstract: Entropy and mutual information in neural networks provide rich information on the learning process, but they have proven difficult to compute reliably in high dimensions. Indeed, in noisy and high-dimensional data, traditional estimates in ambient dimensions approach a fixed entropy and are prohibitively hard to compute. To address these issues, we leverage data geometry to access the underlying manifold and reliably compute these information-theoretic measures. Specifically, we define diffusion spectral entropy (DSE) in neural representations of a dataset as well as diffusion spectral mutual information (DSMI) between different variables representing data. First, we show that they form noise-resistant measures of intrinsic dimensionality and relationship strength in high-dimensional simulated data that outperform classic Shannon entropy, nonparametric estimation, and mutual information neural estimation (MINE). We then study the evolution of representations in classification networks with supervised learning, self-supervision, or overfitting. We observe that (1) DSE of neural representations increases during training; (2) DSMI with the class label increases during generalizable learning but stays stagnant during overfitting; (3) DSMI with the input signal shows differing trends: on MNIST it increases, while on CIFAR-10 and STL-10 it decreases. Finally, we show that DSE can be used to guide better network initialization and that DSMI can be used to predict downstream classification accuracy across 962 models on ImageNet. The official implementation is available at https://github.com/ChenLiu-1996/DiffusionSpectralEntropy.
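
The abstract describes the core computation behind DSE: build a diffusion operator over the neural representations and take the Shannon entropy of its normalized eigenvalue spectrum. As a rough illustration, here is a minimal Python sketch of that idea; the Gaussian kernel, the fixed bandwidth `sigma`, and the integer diffusion time `t` are assumptions made for this sketch, and the official implementation linked above may differ in its kernel and normalization choices.

```python
import numpy as np

def diffusion_spectral_entropy(X, sigma=10.0, t=1):
    """Sketch of diffusion spectral entropy (DSE) over representations X.

    X: (n, d) array of neural representations (one row per sample).
    sigma, t: assumed kernel bandwidth and diffusion time for this sketch.
    """
    # Gaussian affinities from pairwise squared Euclidean distances.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / (2.0 * sigma**2))

    # Symmetrically normalized diffusion operator; it shares its
    # eigenvalues with the row-stochastic diffusion matrix D^{-1} K.
    deg = K.sum(axis=1)
    P_sym = K / np.sqrt(np.outer(deg, deg))

    # Spectrum of the operator, raised to the diffusion time t.
    eigvals = np.linalg.eigvalsh(P_sym)
    spectrum = np.abs(eigvals) ** t

    # Shannon entropy of the normalized spectrum.
    probs = spectrum / spectrum.sum()
    probs = probs[probs > 1e-12]  # drop numerically zero eigenvalues
    return float(-np.sum(probs * np.log(probs)))

# Usage: high-dimensional Gaussian noise vs. data on a 1-D curve.
rng = np.random.default_rng(0)
noise = rng.normal(size=(300, 64))
theta = rng.uniform(0, 2 * np.pi, size=(300, 1))
curve = np.hstack([np.cos(theta), np.sin(theta)]) @ rng.normal(size=(2, 64))
print(diffusion_spectral_entropy(noise), diffusion_spectral_entropy(curve))
```

Raising the spectrum to a diffusion time t > 1 suppresses small eigenvalues, which is what makes the measure robust to high-frequency noise in the representations.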

References (29)
  1. Mutual information neural estimation. In International Conference on Machine Learning, pages 531–540. PMLR.
  2. Enhancing experimental signals in single-cell RNA-sequencing data using graph signal processing. Nature Biotechnology.
  3. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning, pages 1597–1607. PMLR.
  4. Diffusion maps. Applied and Computational Harmonic Analysis, 21(1):5–30.
  5. Estimation of the information by an adaptive partitioning of the observation space. IEEE Transactions on Information Theory, 45(4):1315–1321.
  6. Statistical methods in graphs: parameter estimation, model selection, and hypothesis test. Mathematical Foundations and Applications of Graph Entropy, 6:183–202.
  7. Testing the manifold hypothesis. Journal of the American Mathematical Society, 29(4):983–1049.
  8. Independent coordinates for strange attractors from mutual information. Physical Review A, 33(2):1134.
  9. Efficient estimation of mutual information for strongly dependent variables. In Artificial Intelligence and Statistics, pages 277–286. PMLR.
  10. Time-inhomogeneous diffusion geometry and topology. arXiv preprint arXiv:2203.14860.
  11. Estimating mixture entropy with pairwise distances. Entropy, 19(7):361.
  12. Estimating mutual information. Physical Review E, 69(6):066138.
  13. Input feature selection by mutual information based on Parzen window. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12):1667–1671.
  14. Assessing neural network representations during training using data diffusion spectra. In ICML 2023 Workshop on Topology, Algebra and Geometry in Machine Learning (TAG-ML).
  15. Complex information dynamics of epidemic spreading in low-dimensional networks.
  16. Visualizing structure and transitions in high-dimensional biological data. Nature Biotechnology, 37(12):1482–1492.
  17. Estimation of mutual information using kernel density estimators. Physical Review E, 52(3):2318.
  18. Paninski, L. (2003). Estimation of entropy and mutual information. Neural Computation, 15(6):1191–1253.
  19. On the information bottleneck theory of deep learning. In International Conference on Learning Representations.
  20. On the information bottleneck theory of deep learning. Journal of Statistical Mechanics: Theory and Experiment, 2019(12):124020.
  21. Identification of network topology variations based on spectral entropy. IEEE Transactions on Cybernetics, 52(10):10468–10478.
  22. Approximating mutual information by maximum likelihood density ratio estimation. In New Challenges for Feature Selection in Data Mining and Knowledge Discovery, pages 5–20. PMLR.
  23. Discriminating different classes of biological networks by analyzing the graphs spectra distribution. PLoS ONE, 7(12):e49949.
  24. Deep learning and the information bottleneck principle. In 2015 IEEE Information Theory Workshop (ITW), pages 1–5. IEEE.
  25. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11).
  26. Recovering gene interactions from single-cell data using data diffusion. Cell, 174(3):716–729.
  27. Ver Steeg, G. (2000). Non-parametric entropy estimation toolbox (NPEET).
  28. Graph information theoretic measures on functional connectivity networks based on graph-to-signal transform. In 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 1137–1141.
  29. von Neumann, J. (2018). Mathematical Foundations of Quantum Mechanics: New Edition, volume 53. Princeton University Press.