Having Second Thoughts? Let's hear it (2311.15356v2)
Abstract: Deep learning (DL) models loosely mimic the bottom-up signal pathways that run from low-order sensory areas to high-order cognitive areas. After training, DL models can outperform humans on some domain-specific tasks, but their decision-making processes are known to be easily disrupted. Since the human brain consists of multiple, highly interconnected functional areas and relies on intricate interplays between bottom-up and top-down (from high-order to low-order areas) processing, we hypothesize that incorporating top-down signal processing can make DL models more robust. To test this hypothesis, we propose a certification process that mimics selective attention and evaluate whether it makes DL models more robust. Our empirical evaluations suggest that this certification can improve DL models' accuracy and help build safety measures that alleviate their vulnerabilities to both artificial and natural adversarial examples.
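To make the idea concrete, below is a minimal sketch of what such a top-down certification loop could look like. This is an illustrative assumption, not the authors' implementation: `certify_prediction` is a hypothetical name, and a simple center-crop stands in for the attention step, where the paper's pipeline would instead use a text-prompted segmenter (e.g., querying lang-segment-anything with the predicted class name) to focus on the region that should support the label.

```python
# Hedged sketch: certify a classifier's prediction by re-examining the
# image after "attending" to the region that should support the label.
# The center-crop below is a placeholder for a real top-down attention
# mechanism (e.g., a text-prompted segmentation mask); all names here
# are illustrative, not the paper's actual code.
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF

weights = models.ResNet50_Weights.DEFAULT
classifier = models.resnet50(weights=weights).eval()
labels = weights.meta["categories"]

@torch.no_grad()
def certify_prediction(image: torch.Tensor) -> tuple[str, bool]:
    """Return the top-1 label and whether it survives re-inspection.

    image: a (3, H, W) float tensor, already normalized for the model.
    """
    # Bottom-up pass: ordinary feed-forward prediction.
    pred = classifier(image.unsqueeze(0)).argmax(dim=1).item()

    # Top-down pass (stand-in): focus on a sub-region of the input.
    # A real pipeline would mask/crop to the segment returned for the
    # predicted class name rather than blindly center-cropping.
    h, w = image.shape[-2:]
    focused = TF.center_crop(image, [int(h * 0.7), int(w * 0.7)])
    focused = TF.resize(focused, [h, w], antialias=True)

    # Certification: the prediction is accepted only if the focused
    # view yields the same top-1 class; otherwise it is flagged.
    re_pred = classifier(focused.unsqueeze(0)).argmax(dim=1).item()
    return labels[pred], pred == re_pred
```

Under this reading, a disagreement between the full-image and attended-view predictions serves as a safety signal: the model "has second thoughts," and the input can be flagged as potentially adversarial rather than silently misclassified.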