
Towards White Box Deep Learning (2403.09863v5)

Published 14 Mar 2024 in cs.LG, cs.AI, and cs.NE

Abstract: Deep neural networks learn fragile "shortcut" features, rendering them difficult to interpret (black box) and vulnerable to adversarial attacks. This paper proposes semantic features as a general architectural solution to this problem. The main idea is to make features locality-sensitive in the adequate semantic topology of the domain, thus introducing a strong regularization. The proof of concept network is lightweight, inherently interpretable and achieves almost human-level adversarial test metrics - with no adversarial training! These results and the general nature of the approach warrant further research on semantic features. The code is available at https://github.com/314-Foundation/white-box-nn
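One way to picture "locality-sensitive in the semantic topology of the domain" is a feature that responds to a base pattern and to its small, meaning-preserving variations, and to little else. The sketch below is only an illustration under stated assumptions and is not taken from the paper or its repository: it assumes 2D image patches, assumes that small pixel translations are the relevant semantic neighbourhood, and uses cosine similarity as the match score. The names `shifted_variants`, `cosine`, and `semantic_match` are hypothetical.

```python
import numpy as np


def shifted_variants(template, max_shift=1):
    """Return zero-padded copies of a 2D template translated by up to max_shift pixels."""
    h, w = template.shape
    variants = []
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            out = np.zeros_like(template)
            # Part of the template that stays inside the frame after the shift.
            src = template[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
            y0, x0 = max(0, dy), max(0, dx)
            out[y0:y0 + src.shape[0], x0:x0 + src.shape[1]] = src
            variants.append(out)
    return variants


def cosine(a, b):
    """Cosine similarity of two arrays, defined as 0.0 if either one is all zeros."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(np.sum(a * b) / (na * nb))


def semantic_match(patch, template, max_shift=1):
    """Best match of the patch against the template and its small translations.

    Assumption: the 'semantic neighbourhood' of the template is the set of
    translations by at most max_shift pixels. Other domains would need other
    transformation families.
    """
    return max(cosine(patch, v) for v in shifted_variants(template, max_shift))


# Hypothetical toy data: a vertical bar, the same bar moved one pixel right,
# and a horizontal bar that is semantically a different pattern.
template = np.zeros((5, 5))
template[:, 2] = 1.0

shifted_bar = np.zeros((5, 5))
shifted_bar[:, 3] = 1.0

horizontal_bar = np.zeros((5, 5))
horizontal_bar[2, :] = 1.0

print("plain cosine, shifted bar:   ", cosine(shifted_bar, template))
print("semantic match, shifted bar: ", semantic_match(shifted_bar, template))
print("semantic match, horizontal:  ", semantic_match(horizontal_bar, template))
```

In this toy setting the plain similarity treats the shifted bar as unrelated to the template, while the neighbourhood-aware score still recognises it and keeps the horizontal bar at a low score. Restricting what a feature may respond to in this way is one plausible reading of the "strong regularization" the abstract mentions.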



Authors (1)
