The paper "Symbolic Rule Extraction from Attention-Guided Sparse Representations in Vision Transformers" presents a novel neuro-symbolic framework aimed at enhancing the interpretability and verifiability of Vision Transformers (ViTs) in the domain of image classification. As Vision Transformers have become increasingly prevalent due to their superior performance compared to Convolutional Neural Networks (CNNs), the necessity for interpretability in such models has grown, especially for critical applications where understanding the decision-making process is imperative.
The authors propose a framework that incorporates a sparse concept layer, inspired by Sparse Autoencoders (SAEs), into the ViT architecture. This layer produces attention-weighted, binarized representations of visual concepts, from which interpretable logic programs are extracted with the FOLD-SE-M algorithm. The method achieves an average improvement of 5.14% in classification accuracy over a standard ViT baseline, setting a precedent in neuro-symbolic AI by not only improving performance but also enabling symbolic reasoning through executable logic programs.
Key Contributions
- Sparse Representations for Interpretability: The framework modifies the standard ViT by inserting a sparse concept layer that encourages each neuron to activate for a high-level visual concept, using a combination of an L1 sparsity loss, entropy minimization, and a supervised contrastive loss (see the first sketch after this list). This sparse representation enables the subsequent extraction of symbolic rules, making the neural model's reasoning process more transparent and verifiable.
- FOLD-SE-M Algorithm: The authors use the FOLD-SE-M algorithm to generate rule-sets that are concise and semantically meaningful. Taking the binarized concept activations as input, the algorithm produces logic programs that act not merely as post-hoc explanations but as the model's logic-based decision layer (see the second sketch after this list). This approach bridges the gap between transformer-based models and symbolic logic programming, offering a pragmatic path towards interpretable and verifiable AI.
- Improved Classification Accuracy: Experimental results show an average improvement of 5.14% in classification accuracy over vanilla ViTs across the evaluated datasets. Notably, the neuro-symbolic model exceeds the vanilla ViT's accuracy on all but one dataset, underscoring the effectiveness of incorporating symbolic reasoning directly into the model's architecture.
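
To make the first contribution concrete, the following PyTorch sketch shows how a sparse concept bottleneck could combine a classification loss with the three auxiliary terms named above. All names, dimensions, the attention re-weighting step, and the simplified contrastive term are illustrative assumptions; this is a minimal sketch of the idea, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseConceptLayer(nn.Module):
    """Illustrative sparse concept bottleneck placed on top of a ViT backbone."""

    def __init__(self, embed_dim=768, n_concepts=256, n_classes=10):
        super().__init__()
        self.encoder = nn.Linear(embed_dim, n_concepts)    # SAE-style encoder
        self.classifier = nn.Linear(n_concepts, n_classes)

    def forward(self, cls_token, attn_weights=None):
        # Non-negative concept activations, so thresholding later gives a
        # meaningful on/off binarization.
        z = F.relu(self.encoder(cls_token))
        if attn_weights is not None:
            # Hypothetical attention re-weighting of concept activations.
            z = z * attn_weights
        logits = self.classifier(z)
        return z, logits


def concept_losses(z, logits, labels, l1_coef=1e-3, ent_coef=1e-3, con_coef=0.1):
    """Combine the classification loss with the three auxiliary objectives."""
    ce = F.cross_entropy(logits, labels)

    # L1 sparsity: push most concept activations toward zero.
    l1 = z.abs().mean()

    # Entropy minimization: concentrate each sample's activation mass
    # on a few concepts.
    p = z / (z.sum(dim=1, keepdim=True) + 1e-8)
    entropy = -(p * (p + 1e-8).log()).sum(dim=1).mean()

    # Simplified supervised contrastive term: pull same-class concept vectors
    # together, push different-class ones apart (cosine similarity).
    zn = F.normalize(z, dim=1)
    sim = zn @ zn.t()
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    eye = torch.eye(len(labels), device=z.device)
    pos = (sim * same * (1 - eye)).sum() / ((same * (1 - eye)).sum() + 1e-8)
    neg = (sim * (1 - same)).sum() / ((1 - same).sum() + 1e-8)
    contrastive = neg - pos

    return ce + l1_coef * l1 + ent_coef * entropy + con_coef * contrastive
```

In a setup like this the relative loss weights would be tuned per dataset; the key design point is that the sparse, non-negative activations can later be thresholded into on/off concept indicators.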
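
The second contribution hinges on handing those activations to a rule learner. The sketch below, again under assumed names and a hypothetical threshold, binarizes concept activations and exports them as a tabular dataset of the kind FOLD-SE-M consumes; the rule in the trailing comment only illustrates the general shape of such output and is not taken from the paper.

```python
import numpy as np
import pandas as pd


def binarize_concepts(z, threshold=0.0):
    """Mark a concept as 'present' if its non-negative activation exceeds the
    threshold. The threshold here is an assumption; it could also be chosen
    per neuron or via a validation set."""
    return (z > threshold).astype(int)


def export_for_rule_learning(z, labels, path="concepts.csv"):
    """Write a binary concept table (illustrative column names) that a rule
    learner such as FOLD-SE-M can consume as tabular input."""
    cols = [f"concept_{i}" for i in range(z.shape[1])]
    df = pd.DataFrame(binarize_concepts(z), columns=cols)
    df["label"] = labels
    df.to_csv(path, index=False)
    return df


# The learned logic program then serves as the decision layer. A rule of the
# kind such learners produce might look like (illustrative only):
#   class(X, 'cat') :- concept_12(X, '1'), concept_87(X, '1'), not ab1(X).
#   ab1(X) :- concept_45(X, '1').
```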
Implications and Future Directions
The integration of sparse autoencoders into Vision Transformers for symbolic rule extraction paves the way for enhanced interpretability in complex AI models. The framework's ability to improve classification accuracy while yielding concise rule-sets represents a significant step toward deep learning models that are both performant and explainable.
In practical terms, models derived from this framework could be applied to sensitive domains such as autonomous driving and medical diagnosis, where decision transparency is crucial. The proposed architecture empowers users to verify outcomes, mitigating risks associated with incorrect predictions.
From a theoretical perspective, this work lays foundational insights for further exploration of feature disentanglement within the self-attention layers of ViTs. The authors acknowledge limitations in the semantic labeling of neurons and suggest that future research could focus on refining neuron monosemanticity and leveraging multimodal LLMs for more robust concept labeling.
In summary, the paper significantly contributes to the field of explainable AI by addressing the interpretability challenges of Vision Transformers. The novel framework not only maintains high predictive performance but also enhances understanding through logically grounded rule extraction. Future advancements may involve exploring additional architectures to further improve neuron distinguishability and extending the symbolic reasoning capabilities to other domains within AI research.