
HiResCAM: High-Res CNN Attribution

Updated 20 January 2026
  • HiResCAM is a gradient-based post hoc explanation method for CNNs that generates high-resolution class activation maps with precise spatial fidelity.
  • It leverages pointwise gradient–activation interactions at each spatial location, avoiding the gradient pooling used by Grad-CAM and thereby preserving fine-grained detail.
  • Empirical results in medical imaging, malware analysis, and remote sensing demonstrate HiResCAM’s effectiveness in enhancing model transparency and trust.

HiResCAM is a gradient-based post hoc explanation method for convolutional neural networks (CNNs) that generates high-resolution, spatially faithful attribution maps, specifically targeting the explanation of class-specific model decisions. By leveraging pointwise gradient–activation interactions at each spatial location of the last convolutional layer, HiResCAM maintains fine-grained spatial resolution, providing class activation maps (CAMs) that precisely reflect the regions in the input that the model truly relied on for its predictions. This feature distinguishes HiResCAM from earlier CAM variants that rely on spatial pooling and often generate coarser or less faithful visual explanations.

1. Mathematical Definition and Core Mechanism

Let $A = \{A^f\}_{f=1}^F$ denote the activations from a CNN's selected convolutional layer, where $A^f$ is the $f$-th feature map and $A^f_{ij}$ is its value at spatial location $(i,j)$. For a given class $c$ with pre-softmax score $s_c$ (logit), HiResCAM computes the gradient of $s_c$ with respect to $A^f_{ij}$ at each location, forming a full gradient tensor $G_c^f(i,j) = \frac{\partial s_c}{\partial A^f_{ij}}$.

The HiResCAM map is defined as:

$$\text{HiResCAM}_c(i,j) = \mathrm{ReLU}\left(\sum_{f=1}^F G_c^f(i,j) \cdot A^f_{ij} \right)$$

where $\mathrm{ReLU}$ optionally rectifies the map to focus on spatial locations contributing positively to the class score. The resulting map is min–max normalized and upsampled to the input resolution if required for visualization or mask extraction (Draelos et al., 2020, Draelos et al., 2021, Rafati et al., 25 Aug 2025).

This formulation preserves all local gradient information and multiplies it elementwise by activations before a final channel sum, ensuring the resulting saliency map directly reflects model-internal evidence for the class $c$ at each spatial location.
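
As a minimal sketch of the definition above, the following NumPy function takes an activation tensor and a gradient tensor that are assumed to have already been extracted from a network, both shaped (F, H, W). The function name `hirescam`, its parameters, and the toy values are illustrative assumptions, not part of any published implementation.

```python
import numpy as np

def hirescam(activations, gradients, apply_relu=True, normalize=True):
    """HiResCAM map from activations A and gradients G, both shaped (F, H, W)."""
    # Pointwise gradient-activation product, then sum over the channel axis f.
    cam = (gradients * activations).sum(axis=0)
    if apply_relu:
        cam = np.maximum(cam, 0.0)  # keep only positive evidence for the class
    if normalize:
        span = cam.max() - cam.min()
        # Min-max normalize to [0, 1]; a constant map is returned as all zeros.
        cam = (cam - cam.min()) / span if span > 0 else np.zeros_like(cam)
    return cam

# Toy example: F=2 feature maps on a 2x2 grid (values are made up).
A = np.array([[[1.0, 2.0], [0.0, 1.0]],
              [[0.5, 0.0], [2.0, 1.0]]])
G = np.array([[[1.0, -1.0], [1.0, 0.0]],
              [[2.0, 1.0], [0.5, -1.0]]])

cam_unnormalized = hirescam(A, G, normalize=False)  # [[2, 0], [1, 0]]
cam_normalized = hirescam(A, G)                     # [[1, 0], [0.5, 0]]
print(cam_unnormalized)
print(cam_normalized)
```

Locations (0,1) and (1,1) have a negative channel sum and are zeroed by the ReLU. Passing `apply_relu=False` would keep them as negative evidence.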

2. Theoretical Guarantee and Relationship to Other CAM Variants

HiResCAM is provably faithful for CNNs ending in a single fully connected (FC) layer. In such architectures, every active (nonzero) pixel in the HiResCAM map corresponds one-to-one with a feature-location pair that directly contributed to the class score via the linear weights. For models with the structure

$$s_c = \sum_{f,i,j} w_c^{f,i,j} A^f_{ij} + b_c$$

the gradient is $\frac{\partial s_c}{\partial A^f_{ij}} = w_c^{f,i,j}$, so HiResCAM (before the optional ReLU) exactly reconstructs the evidence the model used:

$$\text{HiResCAM}_c(i,j) = \sum_f w_c^{f,i,j} A^f_{ij}$$

For architectures with intermediate pooling or multi-layer heads, the faithfulness guarantee becomes approximate, as the local gradient is not a simple weight tied exclusively to class $c$.
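
The single-FC case can be checked numerically. The sketch below uses randomly generated activations and weights (all values and names are assumptions for illustration) and confirms that the unrectified HiResCAM map, summed over all locations and offset by the bias, recovers the class score exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
F, H, W, num_classes = 3, 4, 4, 2

A = rng.random((F, H, W))                          # activations of the last conv layer
weights = rng.normal(size=(num_classes, F, H, W))  # FC weights w_c^{f,i,j}
bias = rng.normal(size=num_classes)                # FC biases b_c

c = 1
score = (weights[c] * A).sum() + bias[c]           # s_c for the chosen class

# For a single FC layer, ds_c/dA^f_ij is exactly the weight w_c^{f,i,j}.
grad = weights[c]

# Unrectified HiResCAM: per-location contribution to the class score.
cam_raw = (grad * A).sum(axis=0)

# The per-location contributions plus the bias add up to the score.
reconstructed = cam_raw.sum() + bias[c]
print(score, reconstructed)
```

With a deeper head the same sum would generally no longer equal the score, which is one way to see why the guarantee becomes approximate.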

HiResCAM generalizes standard CAM (which requires a global average pooling (GAP) layer and only produces channelwise scalar weights) by working without the pooling constraint and retaining all spatial detail. In contrast, Grad-CAM globally averages the gradient for each channel, producing a single scalar weight per feature map, and then forms

$$\text{GradCAM}_c(i,j) = \mathrm{ReLU}\left(\sum_{f=1}^F \bar{G}_c^f \cdot A^f_{ij}\right)$$

with $\bar{G}_c^f = \frac{1}{D_1 D_2}\sum_{i,j} \frac{\partial s_c}{\partial A^f_{ij}}$, where $D_1 \times D_2$ is the spatial size of the feature map. This averaging discards the spatial variation of the gradient, leading to lower spatial fidelity (Draelos et al., 2020, Rafati et al., 25 Aug 2025, Burger et al., 2023).

Empirically, HiResCAM delivers sharper, less "bloated" maps compared to Grad-CAM, avoiding systematic over-highlighting of regions the model did not use.
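
A small constructed example illustrates the over-highlighting effect. Here a single feature map is uniformly active, but the gradient is nonzero at only one location, as would happen if a linear head used only that location. The function names and values are illustrative assumptions.

```python
import numpy as np

def hirescam_map(A, G):
    # Per-location gradient times activation, summed over channels.
    return np.maximum((G * A).sum(axis=0), 0.0)

def gradcam_map(A, G):
    # One scalar weight per channel: the spatial mean of the gradient.
    alpha = G.mean(axis=(1, 2), keepdims=True)
    return np.maximum((alpha * A).sum(axis=0), 0.0)

# One channel on a 2x2 grid, active everywhere with value 3.
A = np.full((1, 2, 2), 3.0)
# The class score depends only on location (0, 0).
G = np.zeros((1, 2, 2))
G[0, 0, 0] = 1.0

hires = hirescam_map(A, G)  # [[3, 0], [0, 0]]
grad = gradcam_map(A, G)    # 0.75 at every location
print(hires)
print(grad)
```

Grad-CAM assigns the same positive value to all four locations because the pooled weight (0.25) is applied wherever the feature map is active, while HiResCAM highlights only the location the score actually depends on.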

3. Integration in Vision and Medical Imaging Pipelines

HiResCAM is typically implemented as a post hoc step without any architectural changes. In standard image classification (e.g., ResNet, VGG, EfficientNet), HiResCAM is computed at the last convolutional block output, then upsampled for interpretation (Rafati et al., 25 Aug 2025, Burger et al., 2023). In detection tasks (e.g., Faster R-CNN), the approach generalizes by explaining per-output (e.g., per bounding box), using the detection logit and target layer (Zhou et al., 24 Oct 2025).

In volumetric medical imaging, HiResCAM is applied to 3D feature tensors over slices (e.g., in AxialNet for chest CT) (Draelos et al., 2021). Here, HiResCAM not only localizes sub-slice structures but also supports faithful organ-wise attribution when paired with anatomical constraints.

The standard implementation steps are:

  1. Forward pass: obtain class (or detection) logits and selected activation tensor.
  2. Backward pass: compute the gradient $\frac{\partial s_c}{\partial A^f_{ij}}$.
  3. Elementwise multiplication: $A^f_{ij} \cdot \frac{\partial s_c}{\partial A^f_{ij}}$.
  4. Channel summation and ReLU.
  5. Normalization and optional upsampling.

No additional hyperparameters or architectural modifications are introduced (Brosolo et al., 3 Mar 2025, Draelos et al., 2021, Brosolo et al., 4 Mar 2025).
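
The five steps can be sketched end to end on a toy model. To stay self-contained, the example below replaces a real CNN and autograd with a hypothetical two-layer head (flatten, FC, ReLU, FC) whose backward pass is written out by hand. In practice the activations and gradients would come from a deep learning framework. All names, sizes, and the upsampling factor are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
F, H, W, hidden, num_classes = 4, 3, 3, 8, 3

# Hypothetical toy head standing in for the layers after the target conv layer.
W1 = rng.normal(size=(hidden, F * H * W))
b1 = rng.normal(size=hidden)
W2 = rng.normal(size=(num_classes, hidden))
b2 = rng.normal(size=num_classes)
A = rng.random((F, H, W))  # stand-in for the selected activation tensor

# 1. Forward pass: logits and the selected activations.
pre = W1 @ A.ravel() + b1
h = np.maximum(pre, 0.0)
logits = W2 @ h + b2
c = int(np.argmax(logits))  # explain the predicted class

# 2. Backward pass: ds_c/dA via the chain rule through FC -> ReLU -> FC.
grad_h = W2[c] * (pre > 0)
grad_A = (W1.T @ grad_h).reshape(F, H, W)

# 3-4. Elementwise multiplication, channel summation, and ReLU.
cam = np.maximum((grad_A * A).sum(axis=0), 0.0)

# 5. Min-max normalization and nearest-neighbour upsampling.
span = cam.max() - cam.min()
cam = (cam - cam.min()) / span if span > 0 else np.zeros_like(cam)
scale = 4
cam_up = np.kron(cam, np.ones((scale, scale)))
print(cam_up.shape)
```

Nearest-neighbour upsampling via `np.kron` is used here only for simplicity. Smoother interpolation is a presentation choice and does not change the underlying map.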

4. Empirical Evaluation and Domain-Specific Performance

Medical Imaging

HiResCAM has demonstrated strong localization performance for classification tasks where only global labels are available. In brain hemorrhage diagnosis (EfficientNetV2S, Hemorica dataset), HiResCAM achieved bounding-box Dice 0.5723, IoU 0.4009, and a loose hit rate of 0.9753, outperforming Grad-CAM and approaching the pixel precision of computationally expensive AblationCAM (Rafati et al., 25 Aug 2025). In chest CT, HiResCAM, with an anatomical mask loss, improved the fraction of model attention within allowed organs by +33%, reducing spurious attribution outside clinically plausible regions (Draelos et al., 2021).
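
For orientation, bounding-box Dice and IoU can be computed from a heatmap roughly as follows. This is a sketch under assumed conventions (a box drawn around cells at or above half the map maximum, end-exclusive box coordinates). The exact protocol of the cited study is not described here and may differ.

```python
import numpy as np

def box_from_heatmap(cam, threshold=0.5):
    """Tightest box (r0, c0, r1, c1), end-exclusive, around cells >= threshold * max."""
    rows, cols = np.nonzero(cam >= threshold * cam.max())
    return rows.min(), cols.min(), rows.max() + 1, cols.max() + 1

def box_mask(box, shape):
    """Boolean mask that is True inside the box."""
    r0, c0, r1, c1 = box
    mask = np.zeros(shape, dtype=bool)
    mask[r0:r1, c0:c1] = True
    return mask

def dice_iou(pred, target):
    """Dice and IoU between two boolean masks."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return 2 * inter / (pred.sum() + target.sum()), inter / union

# Toy heatmap with a 3x3 hot region and a shifted 3x3 ground-truth box.
cam = np.zeros((6, 6))
cam[1:4, 1:4] = 1.0
pred_box = box_from_heatmap(cam)   # (1, 1, 4, 4)
true_box = (2, 2, 5, 5)

dice, iou = dice_iou(box_mask(pred_box, cam.shape), box_mask(true_box, cam.shape))
print(dice, iou)  # 4 overlapping cells: Dice = 8/18, IoU = 4/14
```

For a single pair of regions the two metrics are tied by Dice = 2·IoU / (1 + IoU), so they rank a single prediction identically and differ only in scale.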

Malware Visualization

In malware-as-image settings, HiResCAM consistently produced sharper, more precisely bounded attributions corresponding to byte regions or PE headers essential for family classification, unlike the blurred, over-inclusive maps of Grad-CAM. Cumulative HiResCAM heatmaps supported the discovery of family-specific code signatures, informed data augmentation, and led to practical F1 boosts (from 2% up to 8%) in downstream Vision Transformer (ViT) models when used as saliency-inspired masks (Brosolo et al., 4 Mar 2025, Brosolo et al., 3 Mar 2025).

Remote Sensing and Ecology

In solar panel detection (ResNet-50 classifier), HiResCAM yielded higher-fidelity localization, closely matching Grad-CAM++ in boundary accuracy (polygon Dice 0.385 for HiResCAM) (Burger et al., 2023). In ecological monitoring, it concentrated the attribution for animal detection on actual object boundaries (58.56% of attribution inside bounding boxes and a 69.7% hit rate), supporting both model trust and failure diagnosis (Zhou et al., 24 Oct 2025).
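
The two detection-oriented quantities mentioned above can be sketched as follows. The definitions used here are common ones and are assumptions rather than the cited paper's exact protocol: the attribution fraction is the share of total heatmap mass inside the box, and a hit means the heatmap maximum falls inside the box.

```python
import numpy as np

def attribution_in_box(cam, box):
    """Fraction of total (non-negative) attribution inside box (r0, c0, r1, c1), end-exclusive."""
    r0, c0, r1, c1 = box
    total = cam.sum()
    return float(cam[r0:r1, c0:c1].sum() / total) if total > 0 else 0.0

def is_hit(cam, box):
    """True if the location of the heatmap maximum lies inside the box."""
    r, c = np.unravel_index(np.argmax(cam), cam.shape)
    r0, c0, r1, c1 = box
    return bool(r0 <= r < r1 and c0 <= c < c1)

# Toy heatmap: most attribution inside the box, some stray mass outside.
cam = np.zeros((4, 4))
cam[1, 1] = 0.6
cam[1, 2] = 0.2
cam[3, 3] = 0.2
box = (1, 1, 3, 3)

fraction = attribution_in_box(cam, box)  # 0.8
hit = is_hit(cam, box)                   # True
print(fraction, hit)
```

Averaging `fraction` and `hit` over a set of detections would give dataset-level figures comparable in spirit to the percentages quoted above.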

5. Theoretical Connections: Shapley Value and Game-Theoretic Perspective

Recent work interprets HiResCAM through the lens of cooperative game theory. In the Content Reserved Game-theoretic (CRG) framework, each spatial location in the feature tensor is treated as a player whose Shapley value quantifies its contribution to the model's prediction. HiResCAM emerges as a first-order linear Taylor approximation of these local Shapley values, weighting each activation by its gradient:

$$\text{HiResCAM}_c(i,j) = \sum_{f=1}^{N_\ell} \left. \frac{\partial U}{\partial A^f_{ij}}\right|_{X_D} \cdot A^f_{ij}$$

for utility $U$ (typically the pre-softmax logit $y^c$ or post-softmax probability $p^c$) (Cai, 9 Jan 2025).

This perspective shows that Grad-CAM can be seen as a "Type-II CRG explainer" (pooled gradients), while HiResCAM is "Type-I" (no pooling), and that second-order corrections (involving the Hessian) yield theoretically more precise ShapleyCAM explanations.

The ReST (Residual Softmax Target-class) utility further refines this, mitigating the pitfalls of pure logits (which can be unstable) and softmax probabilities (which can vanish for confident predictions). The result is better localization in challenging conditions.
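
The vanishing-gradient issue with softmax probabilities follows from the standard softmax derivative, $\partial p^c / \partial y^k = p^c(\mathbb{1}[k=c] - p^k)$, and can be seen numerically. The sketch below only illustrates this pitfall. It does not implement the ReST utility, whose exact form is not given here, and the function names and logit values are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def prob_gradient_wrt_logits(logits, c):
    """Gradient of the softmax probability p_c with respect to every logit."""
    p = softmax(logits)
    onehot = np.zeros_like(p)
    onehot[c] = 1.0
    return p[c] * (onehot - p)

# As the margin of class 0 grows, the prediction becomes more confident
# and the gradient of p_0 shrinks toward zero.
for margin in (1.0, 5.0, 20.0):
    logits = np.array([margin, 0.0, 0.0])
    g = prob_gradient_wrt_logits(logits, c=0)
    print(margin, softmax(logits)[0], np.abs(g).max())
```

Because gradients with respect to activations are obtained by chaining through these logit gradients, a map built on $p^c$ fades for highly confident predictions, whereas the gradient of the logit $y^c$ with respect to itself is always 1.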

6. Practical Limitations and Usage Considerations

  • Faithfulness guarantee: HiResCAM is only exactly faithful for architectures with a single FC layer. For deep heads or intermediate pooling layers, the map's faithfulness is approximate (Draelos et al., 2020).
  • Overly focal maps: HiResCAM may generate "spiky" activations, sometimes covering less of an object than Grad-CAM, which can be detrimental in segmentation contexts where smooth coverage is preferred (Draelos et al., 2020, Rafati et al., 25 Aug 2025).
  • Resolution limit: Regardless of the method, map resolution cannot exceed that of the activation tensor fed into HiResCAM. High stride or downsampling in early layers still imposes coarse spatial limits (Brosolo et al., 3 Mar 2025).
  • Computational cost: HiResCAM requires a backward pass per output class or detection, but is substantially more efficient than perturbation-based methods such as AblationCAM (Rafati et al., 25 Aug 2025).
  • Interpretation: HiResCAM identifies features that increase the model's score, not those whose removal would decrease confidence. Complementary tests (e.g., occlusion) can provide additional insight (Brosolo et al., 3 Mar 2025, Zhou et al., 24 Oct 2025).
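
To make the resolution limit concrete, the sketch below assumes a 224×224 input and a backbone with total stride 32 (illustrative values, typical of ResNet-style networks). Upsampling enlarges the map but cannot add detail: the full-resolution heatmap still contains at most one distinct value per activation cell.

```python
import numpy as np

def cam_grid_size(input_size, total_stride):
    """Approximate side length of the activation grid (ignores padding details)."""
    return input_size // total_stride

input_size, total_stride = 224, 32  # illustrative values
grid = cam_grid_size(input_size, total_stride)

# Stand-in for a HiResCAM map computed on the grid x grid activation tensor.
cam = np.random.default_rng(0).random((grid, grid))

# Nearest-neighbour upsampling back to the input resolution.
cam_up = np.kron(cam, np.ones((total_stride, total_stride)))

print(grid, cam_up.shape, np.unique(cam_up).size)
```

Each cell of the 7×7 map covers a 32×32 pixel block of the input, so structures smaller than a block cannot be separated regardless of the attribution method.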

7. Applications and Impact Across Domains

HiResCAM is employed for:

  • Clinical model auditing and regulatory compliance: Faithful attribution maps enable verification that models rely on pathologically plausible regions, reducing the risk of confounding-based predictions in healthcare (Draelos et al., 2021, Rafati et al., 25 Aug 2025).
  • Forensic malware analysis: Fine-grained heatmaps reveal synthetic code features and packing artifacts and enable debugging of misclassifications, supporting robust model design (Brosolo et al., 3 Mar 2025, Brosolo et al., 4 Mar 2025).
  • Remote sensing and ecological monitoring: High-resolution object localization boosts credibility of automated detection and aids in model improvement strategies (Burger et al., 2023, Zhou et al., 24 Oct 2025).

Table: Summary of Quantitative Results from Selected Domains

| Domain | Dataset/Arch | HiResCAM Key Score | Grad-CAM | Reference |
|---|---|---|---|---|
| Brain hemorrhage | Hemorica/EffNetV2-S | BBox Dice 0.5723 | 0.2154 | (Rafati et al., 25 Aug 2025) |
| Chest CT | RAD-ChestCT/AxialNet | Organ-IoU +33% (rel. gain) | Lower | (Draelos et al., 2021) |
| Malware classification | MalImg/ResNet50, VGG16 | ViT F1 +8.9 pp (Big2015) | +0.5 pp | (Brosolo et al., 4 Mar 2025) |
| Seal detection | GBay/Faster R-CNN | 58.56% box attribution | -- | (Zhou et al., 24 Oct 2025) |
| Solar panel loc. | RemoteS/ResNet-50 | Poly Dice 0.385 | 0.377 | (Burger et al., 2023) |

HiResCAM has established itself as a preferred explainability method in both sensitive and diagnostic domains, where high spatial fidelity, faithfulness, and the avoidance of spurious model evidence are paramount.
