Towards White Box Deep Learning (2403.09863v5)

Published 14 Mar 2024 in cs.LG, cs.AI, and cs.NE

Abstract: Deep neural networks learn fragile "shortcut" features, rendering them difficult to interpret (black box) and vulnerable to adversarial attacks. This paper proposes semantic features as a general architectural solution to this problem. The main idea is to make features locality-sensitive in the adequate semantic topology of the domain, thus introducing a strong regularization. The proof-of-concept network is lightweight, inherently interpretable, and achieves almost human-level adversarial test metrics, with no adversarial training! These results and the general nature of the approach warrant further research on semantic features. The code is available at https://github.com/314-Foundation/white-box-nn

Authors (1)

Maciej Satkiewicz

GitHub

GitHub - 314-Foundation/white-box-nn: Building a White Box Neural Network with Semantic Features

Tweets

https://twitter.com/MSatkiewicz/status/1770415218264264704

https://twitter.com/CompsciDiscu/status/1770254525019906221