Closed-Form Interpretation of Neural Network Classifiers with Symbolic Gradients (2401.04978v2)
Abstract: I introduce a unified framework for finding a closed-form interpretation of any single neuron in an artificial neural network. Using this framework, I demonstrate how to interpret neural network classifiers and reveal closed-form expressions for the concepts encoded in their decision boundaries. In contrast to neural-network-based regression, in classification it is in general impossible to express the neural network itself as a symbolic equation, even when the network bases its classification on a quantity that can be written in closed form. The interpretation framework embeds a trained neural network into an equivalence class of functions that encode the same concept; the network is then interpreted by finding an intersection between this equivalence class and the human-readable equations defined by a symbolic search space. The approach is not limited to classifiers or full neural networks and can be applied to arbitrary neurons in hidden layers or latent spaces.
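Below is a minimal, self-contained sketch of the core idea, under assumptions of my own: the equivalence class is taken to be "functions whose normalized input-gradient field matches the classifier's" (since any monotone relabeling of the underlying concept yields the same decision boundary), a toy brute-force candidate list stands in for a real symbolic search space, and finite-difference gradients of an off-the-shelf scikit-learn MLP stand in for exact network gradients. This is an illustration of the gradient-matching idea, not the paper's implementation.

```python
# Sketch: recover the closed-form concept behind a classifier's decision boundary
# by matching normalized input gradients against a small symbolic candidate set.
# Assumptions (mine, not from the paper): brute-force candidates, finite differences.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic task: the hidden concept is r(x) = x1^2 + x2^2; labels split at r = 1.
X = rng.uniform(-2, 2, size=(4000, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
clf.fit(X, y)

def model_grad(x, eps=1e-4):
    """Finite-difference gradient of P(class = 1) with respect to the input x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (clf.predict_proba(xp[None])[0, 1]
                - clf.predict_proba(xm[None])[0, 1]) / (2 * eps)
    return g

# Tiny stand-in for a symbolic search space: candidate closed-form concepts g(x)
# paired with their analytic gradients.
candidates = {
    "x1 + x2":     lambda x: np.array([1.0, 1.0]),
    "x1 * x2":     lambda x: np.array([x[1], x[0]]),
    "x1^2 + x2^2": lambda x: np.array([2 * x[0], 2 * x[1]]),
    "x1^2 - x2^2": lambda x: np.array([2 * x[0], -2 * x[1]]),
}

# Score each candidate by how well its gradient *direction* matches the classifier's
# gradient direction (sign- and scale-insensitive, because any monotone relabeling
# encodes the same concept and may flip or rescale gradients).
probe = rng.uniform(-2, 2, size=(200, 2))
for name, grad in candidates.items():
    sims = []
    for x in probe:
        gm, gc = model_grad(x), grad(x)
        nm, nc = np.linalg.norm(gm), np.linalg.norm(gc)
        if nm < 1e-8 or nc < 1e-8:
            continue  # skip points where the network's gradient is numerically flat
        sims.append(abs(np.dot(gm, gc)) / (nm * nc))
    print(f"{name:12s} mean |cos| = {np.mean(sims):.3f}")
# On this toy task, x1^2 + x2^2 should score highest: its gradient field aligns
# with the boundary the network actually learned.
```

Matching gradient directions rather than function values is what makes such a search well-posed for classifiers: only the decision boundary, and hence only the direction of the gradient, is identifiable from the classification itself. This is one concrete way to read the abstract's point that the classifier cannot simply be fitted by a symbolic equation, even when the concept it encodes has a closed form.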