HiResCAM: High-Res CNN Attribution
- HiResCAM is a gradient-based post hoc explanation method for CNNs that generates high-resolution class activation maps with precise spatial fidelity.
- It leverages pointwise gradient–activation interactions at each spatial location, avoiding the gradient pooling used by Grad-CAM and thereby preserving fine-grained detail.
- Empirical results in medical imaging, malware analysis, and remote sensing demonstrate HiResCAM’s effectiveness in enhancing model transparency and trust.
HiResCAM is a gradient-based post hoc explanation method for convolutional neural networks (CNNs) that generates high-resolution, spatially faithful attribution maps, specifically targeting the explanation of class-specific model decisions. By leveraging pointwise gradient–activation interactions at each spatial location of the last convolutional layer, HiResCAM maintains fine-grained spatial resolution, providing class activation maps (CAMs) that precisely reflect the regions in the input that the model truly relied on for its predictions. This feature distinguishes HiResCAM from earlier CAM variants that rely on spatial pooling and often generate coarser or less faithful visual explanations.
1. Mathematical Definition and Core Mechanism
Let $A \in \mathbb{R}^{K \times H \times W}$ denote the activations from a CNN's selected convolutional layer, where $A^k_{ij}$ is the value of the $k$-th feature map at spatial location $(i, j)$. For a given class $c$ with pre-softmax score (logit) $y^c$, HiResCAM computes the gradient of $y^c$ with respect to $A$ at each location, forming a full gradient tensor $\partial y^c / \partial A \in \mathbb{R}^{K \times H \times W}$.
The HiResCAM map is defined as:
$$L^c_{\mathrm{HiResCAM}}(i, j) = \mathrm{ReLU}\left(\sum_{k} \frac{\partial y^c}{\partial A^k_{ij}} \, A^k_{ij}\right)$$
where $\mathrm{ReLU}$ optionally rectifies the map to focus on spatial locations contributing positively to the class score. The resulting map is min–max normalized and upsampled to the input resolution if required for visualization or mask extraction (Draelos et al., 2020, Draelos et al., 2021, Rafati et al., 25 Aug 2025).
This formulation preserves all local gradient information and multiplies it elementwise by activations before a final channel sum, ensuring the resulting saliency map directly reflects model-internal evidence for the class at each spatial location.
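As a minimal sketch of the definition above, the following NumPy snippet computes the map from an activation tensor and a gradient tensor of the same shape. The function name `hirescam_map`, the `(K, H, W)` array layout, and the toy values are assumptions made for illustration; in practice the gradient tensor would come from an autodiff framework's backward pass.

```python
import numpy as np

def hirescam_map(activations, gradients, apply_relu=True):
    """Sum over channels of gradient * activation at every spatial location.

    Both inputs are assumed to have shape (K, H, W).
    """
    if activations.shape != gradients.shape:
        raise ValueError("activations and gradients must share a shape")
    cam = (gradients * activations).sum(axis=0)  # shape (H, W)
    return np.maximum(cam, 0.0) if apply_relu else cam

# Toy example: K=2 channels on a 2x2 grid (values are made up).
A = np.array([[[1.0, 2.0], [0.0, 1.0]],
              [[0.5, 0.0], [2.0, 1.0]]])
G = np.array([[[1.0, -1.0], [3.0, 0.0]],
              [[2.0, 4.0], [-1.0, 1.0]]])

cam = hirescam_map(A, G)                        # [[2., 0.], [0., 1.]]
cam_raw = hirescam_map(A, G, apply_relu=False)  # [[2., -2.], [-2., 1.]]
```

The rectified map keeps only the two locations whose summed gradient–activation product is positive; the unrectified variant retains the sign of each location's contribution.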
2. Theoretical Guarantee and Relationship to Other CAM Variants
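HiResCAM is provably faithful for CNNs ending in a single fully connected (FC) layer. In such architectures, every active (nonzero) pixel in the HiResCAM map corresponds one-to-one with a feature-location pair that directly contributed to the class score via the linear weights. For models with the structure
$$y^c = \sum_{k} \sum_{i,j} W^c_{k,ij} \, A^k_{ij} + b^c$$
where $A^k_{ij}$ is the activation of feature map $k$ at location $(i, j)$ and $W^c_{k,ij}$, $b^c$ are the FC weights and bias for class $c$, the gradient is $\partial y^c / \partial A^k_{ij} = W^c_{k,ij}$, so HiResCAM exactly reconstructs the model's used evidence:
$$\sum_{k} \frac{\partial y^c}{\partial A^k_{ij}} \, A^k_{ij} = \sum_{k} W^c_{k,ij} \, A^k_{ij}$$
For architectures with intermediate pooling or multi-layer heads, the faithfulness guarantee becomes approximate, as the local gradient is not a simple weight tied exclusively to class $c$.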
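The single-FC-layer case can be checked numerically. The sketch below assumes a hypothetical head that flattens the activations and applies one linear layer, so the gradient is simply the class's weight slice; the shapes, random values, and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
K, H, W, num_classes = 3, 4, 4, 5

A = rng.normal(size=(K, H, W))                     # last conv activations
weights = rng.normal(size=(num_classes, K, H, W))  # FC weights, reshaped to (C, K, H, W)
bias = rng.normal(size=num_classes)

# Flatten -> single FC layer: one logit per class.
logits = (weights * A).sum(axis=(1, 2, 3)) + bias

c = 2
grad = weights[c]                 # d logit_c / dA for this linear head
cam_raw = (grad * A).sum(axis=0)  # HiResCAM map before any ReLU

# The unrectified map accounts for the whole logit except the bias.
is_faithful = np.allclose(cam_raw.sum(), logits[c] - bias[c])
```

Under this assumption the unrectified map sums to the logit minus the bias, which is the sense in which each location's value is the evidence the model actually used.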
HiResCAM generalizes standard CAM (which requires a global average pooling (GAP) layer and only produces channelwise scalar weights) by working without the pooling constraint and retaining all spatial detail. In contrast, Grad-CAM globally averages the gradient for each channel, producing a single scalar weight per feature map, and then forms
$$L^c_{\mathrm{Grad\text{-}CAM}}(i, j) = \mathrm{ReLU}\left(\sum_{k} \alpha^c_k \, A^k_{ij}\right)$$
with $\alpha^c_k = \frac{1}{HW} \sum_{i,j} \frac{\partial y^c}{\partial A^k_{ij}}$, leading to lower spatial fidelity (Draelos et al., 2020, Rafati et al., 25 Aug 2025, Burger et al., 2023).
Empirically, HiResCAM delivers sharper, less "bloated" maps compared to Grad-CAM, avoiding systematic over-highlighting of regions the model did not use.
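A toy comparison illustrates how pooling the gradient can highlight a location the score is locally insensitive to. The helper names and the one-channel, two-location example are assumptions chosen to make the effect visible, not a reproduction of any cited experiment.

```python
import numpy as np

def gradcam_map(activations, gradients):
    # One scalar weight per channel: the spatial mean of its gradient.
    alpha = gradients.mean(axis=(1, 2))
    cam = (alpha[:, None, None] * activations).sum(axis=0)
    return np.maximum(cam, 0.0)

def hirescam_map(activations, gradients):
    # Gradient and activation are multiplied location by location.
    return np.maximum((gradients * activations).sum(axis=0), 0.0)

# One channel on a 1x2 grid; the right-hand location has zero gradient.
A = np.array([[[1.0, 1.0]]])
G = np.array([[[2.0, 0.0]]])

grad_cam = gradcam_map(A, G)    # [[1., 1.]]
hires_cam = hirescam_map(A, G)  # [[2., 0.]]
```

Grad-CAM assigns both locations the same value because the averaged weight is applied everywhere, while HiResCAM leaves the zero-gradient location at zero.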
3. Integration in Vision and Medical Imaging Pipelines
HiResCAM is typically implemented as a post hoc step without any architectural changes. In standard image classification (e.g., ResNet, VGG, EfficientNet), HiResCAM is computed at the last convolutional block output, then upsampled for interpretation (Rafati et al., 25 Aug 2025, Burger et al., 2023). In detection tasks (e.g., Faster R-CNN), the approach generalizes by explaining per-output (e.g., per bounding box), using the detection logit and target layer (Zhou et al., 24 Oct 2025).
In volumetric medical imaging, HiResCAM is applied to 3D feature tensors over slices (e.g., in AxialNet for chest CT) (Draelos et al., 2021). Here, HiResCAM not only localizes sub-slice structures but supports faithful organ-wise attribution when paired with anatomical constraints.
The standard implementation steps are:
- Forward pass: obtain class (or detection) logits and selected activation tensor.
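- Backward pass: compute the gradient $\partial y^c / \partial A$ of the target logit with respect to the selected activation tensor.
- Elementwise multiplication: $\frac{\partial y^c}{\partial A} \odot A$.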
- Channel summation and ReLU.
- Normalization and optional upsampling.
No additional hyperparameters or architectural modifications are introduced (Brosolo et al., 3 Mar 2025, Draelos et al., 2021, Brosolo et al., 4 Mar 2025).
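The post-processing part of these steps can be sketched as follows. The sketch assumes the forward and backward passes have already been run in an autodiff framework and that their results are available as NumPy arrays of shape `(K, H, W)`; the function name `hirescam_pipeline`, the `eps` stabilizer, and the nearest-neighbour upsampling are illustrative choices (interpolating resizes are another option).

```python
import numpy as np

def hirescam_pipeline(activations, gradients, out_hw=None, eps=1e-8):
    # Elementwise multiplication, channel summation, and ReLU.
    cam = np.maximum((gradients * activations).sum(axis=0), 0.0)
    # Min-max normalization to roughly [0, 1]; eps avoids division by zero.
    cam = (cam - cam.min()) / (cam.max() - cam.min() + eps)
    # Optional nearest-neighbour upsampling to the requested (height, width).
    if out_hw is not None:
        h, w = cam.shape
        rows = np.arange(out_hw[0]) * h // out_hw[0]
        cols = np.arange(out_hw[1]) * w // out_hw[1]
        cam = cam[np.ix_(rows, cols)]
    return cam

# Toy tensors (made-up values): K=2 channels on a 2x2 grid.
A = np.array([[[1.0, 2.0], [0.0, 1.0]],
              [[0.5, 0.0], [2.0, 1.0]]])
G = np.array([[[1.0, -1.0], [3.0, 0.0]],
              [[2.0, 4.0], [-1.0, 1.0]]])

heatmap = hirescam_pipeline(A, G, out_hw=(4, 4))  # 4x4 map in [0, 1]
```

Each 2x2 block of the upsampled heatmap repeats one value of the original map, which makes the resolution limit of the chosen layer directly visible.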
4. Empirical Evaluation and Domain-Specific Performance
Medical Imaging
HiResCAM has demonstrated strong localization performance for classification tasks where only global labels are available. In brain hemorrhage diagnosis (EfficientNetV2-S, Hemorica dataset), HiResCAM achieved a bounding-box Dice of 0.5723, an IoU of 0.4009, and a loose hit rate of 0.9753, outperforming Grad-CAM and approaching the pixel precision of the computationally expensive AblationCAM (Rafati et al., 25 Aug 2025). In chest CT, HiResCAM combined with an anatomical mask loss improved the fraction of model attention within allowed organs by 33% (relative gain), reducing spurious attribution outside clinically plausible regions (Draelos et al., 2021).
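For readers unfamiliar with the overlap metrics quoted above, the sketch below computes Dice and IoU between two binary masks using their standard definitions. How the cited studies threshold the heatmap and extract boxes is not described here, so the helper names and the example boxes are hypothetical.

```python
import numpy as np

def dice_and_iou(pred_mask, true_mask):
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    total = pred.sum() + true.sum()
    # Two empty masks are treated as a perfect match (a convention, not a rule).
    dice = 2.0 * inter / total if total else 1.0
    iou = inter / union if union else 1.0
    return float(dice), float(iou)

def box_mask(shape, top, left, bottom, right):
    # Binary mask that is True inside the half-open box [top:bottom, left:right].
    mask = np.zeros(shape, dtype=bool)
    mask[top:bottom, left:right] = True
    return mask

pred = box_mask((8, 8), 1, 1, 5, 5)  # hypothetical box derived from a heatmap
true = box_mask((8, 8), 3, 3, 7, 7)  # hypothetical ground-truth box
dice, iou = dice_and_iou(pred, true)  # 4 shared pixels out of 16 + 16
```

Here the boxes share 4 pixels, giving Dice 8/32 = 0.25 and IoU 4/28, which shows why Dice is always at least as large as IoU for the same pair of masks.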
Malware Visualization
In malware-as-image settings, HiResCAM consistently produced sharper, more precisely bounded attributions corresponding to byte regions or PE headers essential for family classification, unlike the blurred, over-inclusive maps of Grad-CAM. Cumulative HiResCAM heatmaps supported the discovery of family-specific code signatures, informed data augmentation, and led to practical F1 boosts (from 2% up to 8%) in downstream Vision Transformer (ViT) models when used as saliency-inspired masks (Brosolo et al., 4 Mar 2025, Brosolo et al., 3 Mar 2025).
Remote Sensing and Ecology
In solar panel detection (ResNet-50 classifier), HiResCAM yielded higher-fidelity localization, closely matching Grad-CAM++ in boundary accuracy (Polygon Dice 0.385 for HiResCAM) (Burger et al., 2023). In ecological monitoring, it concentrated animal-detection attributions on actual object boundaries (58.56% of attribution inside bounding boxes and a 69.7% hit rate), supporting both model trust and failure diagnosis (Zhou et al., 24 Oct 2025).
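One plausible way to compute box-based attribution scores of this kind is sketched below. The definitions used (share of positive attribution mass inside the box, and whether the heatmap peak falls inside it) are assumptions; the cited study's exact metric definitions may differ, and the function names and toy heatmap are hypothetical.

```python
import numpy as np

def attribution_in_box(cam, box):
    """Fraction of positive attribution mass inside box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    cam = np.maximum(np.asarray(cam, dtype=float), 0.0)
    total = cam.sum()
    if total == 0:
        return 0.0
    return float(cam[top:bottom, left:right].sum() / total)

def peak_hits_box(cam, box):
    """True if the heatmap maximum lies inside the (half-open) box."""
    top, left, bottom, right = box
    r, c = np.unravel_index(np.argmax(cam), np.shape(cam))
    return bool(top <= r < bottom and left <= c < right)

# Toy 4x4 heatmap (made-up values) and a box covering rows 1-2, columns 1-2.
cam = np.zeros((4, 4))
cam[1, 1] = 3.0
cam[2, 2] = 1.0
cam[0, 3] = 1.0
box = (1, 1, 3, 3)

fraction = attribution_in_box(cam, box)  # 4 of 5 units of mass are inside
hit = peak_hits_box(cam, box)            # the peak at (1, 1) is inside
```

Averaging the first quantity over detections gives an attribution-in-box percentage, and averaging the second gives a hit rate under this assumed definition.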
5. Theoretical Connections: Shapley Value and Game-Theoretic Perspective
Recent work interprets HiResCAM through the lens of cooperative game theory. In the Content Reserved Game-theoretic (CRG) framework, each spatial location in the feature tensor is treated as a player whose Shapley value quantifies its contribution to the model's prediction. HiResCAM emerges as a first-order linear Taylor approximation of these local Shapley values, weighting each activation by its gradient:
$$\phi_{ij} \approx \sum_{k} \frac{\partial U}{\partial A^k_{ij}} \, A^k_{ij}$$
where $\phi_{ij}$ is the Shapley value of spatial location $(i, j)$, for utility $U$ (typically the pre-softmax logit $y^c$ or post-softmax probability $p^c$) (Cai, 9 Jan 2025).
This perspective shows that Grad-CAM can be seen as a "Type-II CRG explainer" (pooled gradients), while HiResCAM is "Type-I" (no pooling), and that second-order corrections (involving the Hessian) yield theoretically more precise ShapleyCAM explanations.
The ReST (Residual Softmax Target-class) utility further refines this, mitigating the pitfalls of pure logits (which can be unstable) and softmax probabilities (which can vanish for confident predictions). The result is better localization in challenging conditions.
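The first-order relationship described above can be checked on a toy game. The sketch below assumes a utility that is exactly linear in the activations and a zero baseline for absent players, in which case the Taylor approximation is exact; it is an illustration of the principle, not the CRG framework's implementation, and all names and values are hypothetical.

```python
from itertools import permutations
from math import factorial
import numpy as np

def exact_shapley(utility, num_players):
    # Average each player's marginal contribution over all join orders.
    phi = np.zeros(num_players)
    for order in permutations(range(num_players)):
        present = np.zeros(num_players, dtype=bool)
        prev = utility(present)
        for p in order:
            present[p] = True
            cur = utility(present)
            phi[p] += cur - prev
            prev = cur
    return phi / factorial(num_players)

K, H, W = 2, 2, 2
rng = np.random.default_rng(0)
A = rng.normal(size=(K, H, W))   # activations
Wc = rng.normal(size=(K, H, W))  # gradient of a linear utility w.r.t. A

def utility(present):
    mask = present.reshape(H, W)         # one player per spatial location
    return float((Wc * A * mask).sum())  # absent locations are zeroed out

phi = exact_shapley(utility, H * W).reshape(H, W)
first_order = (Wc * A).sum(axis=0)  # gradient x activation, summed over channels
matches = np.allclose(phi, first_order)
```

For nonlinear utilities the two quantities generally differ, which is where higher-order corrections would come into play.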
6. Practical Limitations and Usage Considerations
- Faithfulness guarantee: HiResCAM is only exactly faithful for architectures with a single FC layer. For deep heads or intermediate pooling layers, the map's faithfulness is approximate (Draelos et al., 2020).
- Overly focal maps: HiResCAM may generate "spiky" activations, sometimes covering less of an object than Grad-CAM, which can be detrimental in segmentation contexts where smooth coverage is preferred (Draelos et al., 2020, Rafati et al., 25 Aug 2025).
- Resolution limit: Regardless of the method, map resolution cannot exceed that of the activation tensor fed into HiResCAM. High stride or downsampling in early layers still imposes coarse spatial limits (Brosolo et al., 3 Mar 2025).
- Computational cost: HiResCAM requires a backward pass per output class or detection, but is substantially more efficient than perturbation-based methods such as AblationCAM (Rafati et al., 25 Aug 2025).
- Interpretation: HiResCAM identifies features that increase the model's score, not those whose removal would decrease confidence. Complementary tests (e.g., occlusion) can provide additional insight (Brosolo et al., 3 Mar 2025, Zhou et al., 24 Oct 2025).
7. Applications and Impact Across Domains
HiResCAM is employed for:
- Clinical model auditing and regulatory compliance: Faithful attribution maps enable verification that models rely on pathologically plausible regions, reducing the risk of confounding-based predictions in healthcare (Draelos et al., 2021, Rafati et al., 25 Aug 2025).
- Forensic malware analysis: Fine-grained heatmaps reveal synthetic code features and packing artifacts and enable debugging of misclassifications, supporting robust model design (Brosolo et al., 3 Mar 2025, Brosolo et al., 4 Mar 2025).
- Remote sensing and ecological monitoring: High-resolution object localization boosts credibility of automated detection and aids in model improvement strategies (Burger et al., 2023, Zhou et al., 24 Oct 2025).
Table: Summary of Quantitative Results from Selected Domains
| Domain | Dataset/Arch | HiResCAM Key Score | Grad-CAM | Reference |
|---|---|---|---|---|
| Brain hemorrhage | Hemorica/EffNetV2-S | BBox Dice 0.5723 | 0.2154 | (Rafati et al., 25 Aug 2025) |
| Chest CT | RAD-ChestCT/AxialNet | Organ-IoU +33% (rel. gain) | Lower | (Draelos et al., 2021) |
| Malware classification | MalImg/ResNet50, VGG16 | ViT F1 +8.9 pp (Big2015) | +0.5 pp | (Brosolo et al., 4 Mar 2025) |
| Seal detection | GBay/Faster R-CNN | 58.56% box attribution | -- | (Zhou et al., 24 Oct 2025) |
| Solar panel loc. | RemoteS/ResNet-50 | Poly Dice 0.385 | 0.377 | (Burger et al., 2023) |
HiResCAM has established itself as a preferred explainability method in safety-sensitive and diagnostic domains, where high spatial fidelity, faithfulness, and the avoidance of spurious model evidence are paramount.