Generalization ability and vulnerabilities to adversarial perturbations: Two sides of the same coin (2205.10952v4)
Abstract: Deep neural networks (DNNs), the agents of deep learning (DL), perform a massive number of parallel and sequential operations, which makes them difficult to comprehend and impedes proper diagnosis. Without better knowledge of DNNs' internal processes, deploying DNNs in high-stakes domains may lead to catastrophic failures. Therefore, to build more reliable DNNs/DL, it is imperative that we gain insights into their underlying decision-making process. Here, we use the self-organizing map (SOM) to analyze DL models' internal codes associated with DNNs' decision-making. Our analyses suggest that shallow layers close to the input layer map inputs onto homogeneous codes and that deep layers close to the output layer transform these homogeneous codes into diverse codes. We also found evidence indicating that homogeneous codes may underlie DNNs' vulnerabilities to adversarial perturbations.
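To make the described analysis concrete, below is a minimal sketch of fitting a Kohonen self-organizing map to layer activations. The activation matrix here is random data standing in for features extracted from one layer of a trained DNN, and the grid size, learning rate, and neighborhood schedule are illustrative assumptions, not the paper's settings.

```python
# Minimal NumPy sketch of a self-organizing map (SOM) fit to layer
# activations. "activations" is a random stand-in for vectors extracted
# from one hidden layer of a trained DNN; hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for N activation vectors of dimension D taken from one layer.
N, D = 2000, 64
activations = rng.normal(size=(N, D))

# SOM grid: H x W units, each holding a D-dimensional code (weight) vector.
H, W = 10, 10
codes = rng.normal(size=(H * W, D))
grid = np.array([[i, j] for i in range(H) for j in range(W)], dtype=float)

n_epochs = 20
lr0, sigma0 = 0.5, max(H, W) / 2.0

for epoch in range(n_epochs):
    # Decay the learning rate and neighborhood radius over training.
    frac = epoch / n_epochs
    lr = lr0 * (1.0 - frac)
    sigma = sigma0 * np.exp(-3.0 * frac)
    for x in activations[rng.permutation(N)]:
        # Best-matching unit: the code vector closest to the input.
        bmu = np.argmin(((codes - x) ** 2).sum(axis=1))
        # Gaussian neighborhood on the 2-D grid around the BMU.
        dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-dist2 / (2.0 * sigma ** 2))
        # Pull code vectors toward the input, weighted by the neighborhood.
        codes += lr * h[:, None] * (x - codes)

# Assign each activation vector to its best-matching unit, e.g. to check
# whether inputs collapse onto few shared units ("homogeneous codes") or
# spread across the grid ("diverse codes").
bmus = np.argmin(
    ((activations[:, None, :] - codes[None, :, :]) ** 2).sum(axis=2), axis=1
)
print("occupied units:", np.unique(bmus).size, "of", H * W)
```

In practice, the random `activations` matrix would be replaced by features recorded from a shallow or deep layer of a pretrained model, and the occupancy of SOM units could then be compared across layers.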