
Two-Phase Dynamics of Interactions Explains the Starting Point of a DNN Learning Over-Fitted Features (2405.10262v1)

Published 16 May 2024 in cs.LG, cs.AI, and cs.CV

Abstract: This paper investigates the dynamics of a deep neural network (DNN) learning interactions. Previous studies have discovered and mathematically proven that, given an input sample, a well-trained DNN usually encodes only a small number of interactions (non-linear relationships) between the input variables in that sample, and a series of theorems has shown that the DNN's inference can be considered equivalent to using these interactions as primitive inference patterns. In this paper, we discover that a DNN learns interactions in two phases: the first phase mainly penalizes interactions of medium and high orders, and the second phase mainly learns interactions of gradually increasing orders. The two-phase phenomenon can be regarded as the starting point at which a DNN begins to learn over-fitted features, and it is widely shared by DNNs with various architectures trained for different tasks. The discovery of the two-phase dynamics therefore provides a detailed mechanism for how a DNN gradually learns different inference patterns (interactions). In particular, we also verify the claim that high-order interactions have weaker generalization power than low-order interactions, so the discovered two-phase dynamics also explains how the generalization power of a DNN changes during training.
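The interaction measure underlying this line of work is commonly defined as the Harsanyi (AND) interaction I(S) = Σ_{T⊆S} (-1)^{|S|-|T|} v(T), where v(T) is the model's output on the input with all variables outside T masked to a baseline, and the order of the interaction is |S|. As a rough illustration, the sketch below (not the authors' code) computes I(S) for a small subset of variables by brute-force masking; the names `model`, `x`, `baseline`, and the assumption of a flat 1-D input with a single scalar logit are all hypothetical choices for this sketch, and the paper's exact masking protocol and output definition may differ.

```python
# Minimal sketch (not the authors' implementation) of the Harsanyi (AND)
# interaction commonly used in this line of work.
# Assumptions (hypothetical): `model` maps a batch of 1-D inputs to a single
# scalar logit per sample, `x` is one input tensor, `baseline` is the masking
# value of the same shape, and `subset` lists the indices of the variables S.
from itertools import chain, combinations

import torch


def powerset(s):
    """All subsets T of the index tuple `s`, from the empty set up to s itself."""
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))


def masked_output(model, x, baseline, kept):
    """v(T): model output when every variable outside `kept` is set to the baseline."""
    masked = baseline.clone()
    kept = list(kept)
    masked[kept] = x[kept]          # keep only the variables in T
    return model(masked.unsqueeze(0)).squeeze().item()


def harsanyi_interaction(model, x, baseline, subset):
    """I(S) = sum over T subseteq S of (-1)^(|S|-|T|) * v(T).

    |S| is the order of the interaction; the paper's two-phase dynamics track
    how interaction strength is distributed over orders during training.
    """
    s = len(subset)
    return sum(
        (-1) ** (s - len(t)) * masked_output(model, x, baseline, t)
        for t in powerset(subset)
    )
```

Note that the sum runs over all 2^|S| subsets, so the cost is exponential in the order; experiments in this literature therefore typically restrict attention to a small sampled set of variables (on the order of ten image patches or tokens) and compute interactions only among them.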
