MACE Architecture for CNN Explanations
- The MACE architecture is a model-agnostic framework that decomposes CNN predictions into spatially localized concept contributions with quantitative relevance scores, enabling interpretable decision-making.
- It integrates a four-stage modular system—map generator, embedding generator, relevance estimator, and output generator—to systematically analyze feature maps.
- Empirical validations on models such as VGG16 and ResNet50 show that MACE produces explanations that are faithful to the model and preferred by human evaluators over traditional gradient-based methods.
MACE (Model Agnostic Concept Extractor) is a post-hoc, model-agnostic explanatory architecture for convolutional neural networks (CNNs) in image classification, designed to decompose a model’s prediction into contributions from multiple, localized “concepts.” The framework introduces a lateral, non-invasive module capable of dissecting feature map activations into interpretable, invariant concept embeddings and estimating each concept’s quantitative relevance to the model’s output. By systematically combining map generation, embedding extraction, relevance quantification, and output reconstruction, MACE yields fine-grained, faithful, and human-preferred explanations directly tied to the black-box model’s decision process (Kumar et al., 2020).
1. Architectural Overview
MACE is implemented as a four-stage, modular system attached between the final convolutional and first dense (fully connected) layers of a pre-trained CNN. The process begins by extracting the last convolutional feature map $F$. The four modules are:
- Map Generator: Applies 1D convolutional filters across $F$ to produce concept maps $M_{k,c}$ highlighting spatial activations potentially corresponding to semantic parts.
- Embedding Generator: Processes each spatial concept map through a multilayer dense network to yield a low-dimensional, spatially and orientationally invariant embedding.
- Relevance Estimator: Assigns scores to each embedding, quantifying the extent of its positive or negative contribution to the predicted class.
- Output Generator: Aggregates relevance-weighted embeddings to reconstruct a class probability, ensuring explanations are tightly coupled to model outputs.
This sequence is illustrated as:
- Extract the final convolutional feature map $F$ from the CNN;
- Generate a concept map $M_{k,c}$ for each concept $k$ of class $c$;
- Embed $M_{k,c}$ as $e_{k,c}$ via a dense network;
- Compute the relevance $r_{k,c}$ of each embedding;
- Predict class $c$ with $\hat{p}_c = \sigma\!\left(\sum_k r_{k,c}\right)$.
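A minimal PyTorch sketch of this pipeline is given below. The module names, layer widths, and the use of a 1×1 convolution for the filter bank are illustrative assumptions rather than the paper's exact configuration; the sketch operates on the feature map $F$ for a single target class $c$.

```python
# Illustrative sketch of the four MACE modules for one target class c.
# Module names, layer widths, and the 1x1-convolution filter bank are
# assumptions; the paper's exact configuration may differ.
import torch
import torch.nn as nn

class MACEHead(nn.Module):
    def __init__(self, in_channels, num_concepts, embed_dim=16):
        super().__init__()
        # Map generator: a filter bank over the final convolutional
        # feature map F of shape (B, C, H, W), one map per concept.
        self.map_generator = nn.Conv2d(in_channels, num_concepts, kernel_size=1)
        # Embedding generator: dense network mapping each flattened
        # concept map to a low-dimensional, invariant embedding.
        self.embedding_generator = nn.Sequential(
            nn.Flatten(start_dim=2),              # (B, K, H*W)
            nn.LazyLinear(64), nn.ReLU(),
            nn.Linear(64, embed_dim),             # (B, K, D)
        )
        # Relevance estimator: one scalar score r_{k,c} per embedding.
        self.relevance_estimator = nn.Linear(embed_dim, 1)

    def forward(self, feature_map):
        concept_maps = torch.relu(self.map_generator(feature_map))      # M_{k,c}
        embeddings = self.embedding_generator(concept_maps)             # e_{k,c}
        relevances = self.relevance_estimator(embeddings).squeeze(-1)   # r_{k,c}
        # Output generator: aggregate relevances into a class probability.
        class_prob = torch.sigmoid(relevances.sum(dim=1))               # \hat{p}_c
        return concept_maps, embeddings, relevances, class_prob
```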
2. Concept Extraction and Invariant Embedding
The map generator learns a bank of 1D convolutional filters that, when applied to $F$, extract localized activations corresponding to potential semantic concepts (e.g., texture, or parts such as “ear” or “fur”). The ReLU activation selects strictly positive, high-significance features.
To eliminate confounds from location and orientation, each concept map $M_{k,c}$ is encoded as an embedding $e_{k,c}$ through a dense network, trained using a triplet loss:

$$\mathcal{L}_{\text{triplet}} = \max\left(0,\; \lVert e_a - e_p \rVert_2^2 - \lVert e_a - e_n \rVert_2^2 + \alpha\right),$$

where $e_a$, $e_p$, and $e_n$ are anchor, positive, and negative embeddings, respectively, and $\alpha$ is the triplet margin. This objective ensures embeddings of the same concept cluster tightly while enforcing separation from other concepts.
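A direct implementation of this objective, assuming squared Euclidean distances, might look as follows (the margin value is an arbitrary placeholder):

```python
import torch

def triplet_loss(e_a, e_p, e_n, margin=0.2):
    """Triplet objective over concept embeddings; margin value is assumed.

    e_a, e_p, e_n: (B, D) anchor, positive, and negative embeddings.
    """
    d_pos = (e_a - e_p).pow(2).sum(dim=1)  # distance to a same-concept embedding
    d_neg = (e_a - e_n).pow(2).sum(dim=1)  # distance to a different-concept embedding
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()
```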
3. Relevance Estimation and Output Approximation
Relevance scores $r_{k,c}$ for each concept embedding $e_{k,c}$ are produced by the relevance estimator, which maps each embedding to a scalar score for the target class. The sum of these scores, after a sigmoid transformation, approximates the class probability:

$$\hat{p}_c = \sigma\!\left(\sum_k r_{k,c}\right).$$

A cross-entropy loss aligns $\hat{p}_c$ with the original CNN’s prediction, explicitly tying the interpretable concept contributions to the model’s quantitative output. This mechanism supports negative relevance, allowing MACE to identify features that detract from confidence in a particular class and thus explain not only why a certain class is predicted but also why alternatives are suppressed.
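A sketch of this alignment term, assuming a standard binary cross-entropy between $\hat{p}_c$ and the frozen CNN's predicted probability (the paper's exact loss formulation and weighting may differ):

```python
import torch
import torch.nn.functional as F

def output_alignment_loss(relevances, cnn_prob):
    """Align sigma(sum_k r_{k,c}) with the CNN's probability for class c.

    relevances: (B, K) per-concept relevance scores r_{k,c}.
    cnn_prob:   (B,) probability the frozen CNN assigns to class c.
    Binary cross-entropy form is an assumption about the exact loss.
    """
    p_hat = torch.sigmoid(relevances.sum(dim=1))   # \hat{p}_c
    return F.binary_cross_entropy(p_hat, cnn_prob)
```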
4. Empirical Validation
MACE is validated on VGG16 and ResNet50 architectures and on datasets such as Animals with Attributes 2 (AWA2) and Places365. Salient findings include:
- Faithfulness: Ablation studies show that masking MACE-identified concept regions in the input causes larger drops in class probability than masking regions identified by saliency-based baselines (a sketch of this protocol follows the list).
- Explanation for Multiple Outputs: MACE generates relevance maps and explanations for both correctly and incorrectly predicted classes, yielding insight into the fine-grained class decision boundary (e.g., differences between “fox,” “German Shepherd,” and “wolf” outputs).
- Human Preference: In user studies, 48% of votes selected MACE-generated explanations over those from methods such as GradCAM, VisCNN, or Excitation Backpropagation, reflecting improved human interpretability.
- Ranking Decay: Concept relevance scores decay consistently with the rank of the class among the model’s output logits, indicating that the explanations respect the model’s decision hierarchy.
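A masking-based faithfulness check along these lines could be sketched as follows; the bilinear upsampling, the threshold, and zero-masking are assumptions about the evaluation protocol rather than the paper's exact procedure:

```python
import torch

@torch.no_grad()
def probability_drop(cnn, image, concept_map, class_idx, threshold=0.5):
    """Hypothetical faithfulness check: mask input pixels where the
    (upsampled) concept map is most active and measure the drop in the
    CNN's class probability.

    image:       (1, 3, H, W) input tensor.
    concept_map: (h, w) spatial concept activation from MACE.
    """
    # Upsample the concept map to input resolution and build a binary mask.
    mask = torch.nn.functional.interpolate(
        concept_map[None, None], size=image.shape[-2:],
        mode="bilinear", align_corners=False)
    mask = (mask > threshold * mask.max()).float()
    p_orig = torch.softmax(cnn(image), dim=1)[0, class_idx]
    p_masked = torch.softmax(cnn(image * (1 - mask)), dim=1)[0, class_idx]
    return (p_orig - p_masked).item()
```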
5. Comparison with Other Explanation Methods
MACE contrasts with standard gradient-based and attention-based explanation techniques:
| Method | Concept Decomposition | Explicit Relevance Scoring | Spatial Fidelity | Model-Agnostic |
|---|---|---|---|---|
| MACE | Yes | Yes | Yes | Yes |
| GradCAM/GradCAM++ | No | No | Yes | No |
| Excitation Backpropagation | No | Partial | Yes | No |
| Sliding Window/ProtoPNet | Partial | No | Yes | No |
MACE’s distinctive features are: extraction of multiple spatially resolved concepts, computation of a per-concept quantitative relevance, and easy integration via a single lateral connection with minimal architectural disruption. The use of triplet loss-trained embeddings ensures clustering of semantically similar concepts, robust to spatial or orientation changes across different samples—an advantage over methods that rely exclusively on gradient or heuristic attention analyses.
6. Implementation Considerations and Limitations
- Integration: MACE requires access only to internal feature maps, not to gradients or model weights, and can therefore be attached post hoc to black-box models (see the sketch after this list).
- Resource Cost: Training the concept modules (map generator, embedding generator, relevance estimator) involves additional forward/backward passes over feature maps and adds extra parameters for each concept/class pair.
- Applicability: While validated on general image classifiers, extension to architectures lacking explicit convolutional layers or to modalities without interpretable local concepts may require adaptation.
- Limitation: The concept mapping is learned and does not correspond to pre-defined semantic concepts; interpretability depends on alignment between learned patterns and human-interpretable features.
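As a concrete illustration of this post-hoc attachment, the final convolutional feature map can be captured with a forward hook and fed to the MACE modules; the torchvision VGG16 and the layer choice below are illustrative assumptions:

```python
# Capture the last convolutional feature map of a torchvision VGG16 via a
# forward hook; no gradients or weight access are needed.
import torch
from torchvision.models import vgg16

cnn = vgg16(weights=None).eval()   # in practice, load a trained model
captured = {}

def save_feature_map(module, inputs, output):
    # Store the final convolutional feature map for the MACE head.
    captured["F"] = output.detach()

handle = cnn.features.register_forward_hook(save_feature_map)

with torch.no_grad():
    logits = cnn(torch.randn(1, 3, 224, 224))
# captured["F"] now holds the (1, 512, 7, 7) feature map MACE operates on.
handle.remove()
```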
7. Impact and Significance
MACE establishes a modular, faithful, and interpretable framework for explaining convolutional image classifiers. By decomposing predictions into localized concept contributions, it provides both qualitative and quantitative explanations for model outputs, aligns with human intuition, and maintains fidelity to the underlying black-box predictor—a capability not achieved by previous single-saliency or attention-based methods. This makes MACE an appropriate tool for auditing, trust-building, and understanding complex CNN-driven systems in domains requiring accountable AI, such as medical imaging, security, or scientific image analysis.