
MelanomaNet: Explainable Skin Lesion System

Updated 17 December 2025
  • MelanomaNet is a deep learning system for multi-class skin lesion classification that combines high diagnostic accuracy with comprehensive interpretability features.
  • It leverages an EfficientNet V2 backbone and integrates GradCAM++, automated ABCDE extraction, FastCAV, and Monte Carlo Dropout to provide clear, multi-level explanations.
  • Evaluated on the ISIC 2019 dataset, the system demonstrates competitive performance and reliability metrics, underscoring its potential for clinical dermatology applications.

MelanomaNet is an explainable deep learning system designed for multi-class skin lesion classification, specifically addressing the interpretability limitations typical of high-performing convolutional neural networks (CNNs) in dermatological contexts. Integrating an EfficientNet V2 backbone with four complementary interpretability mechanisms—GradCAM++ spatial attention, automated ABCDE clinical criterion extraction, Fast Concept Activation Vectors (FastCAV), and Monte Carlo Dropout–based uncertainty quantification—MelanomaNet achieves high diagnostic accuracy while generating clinically meaningful, multi-level explanations. Its evaluation on the ISIC 2019 dataset demonstrates that strong predictive performance can be simultaneously attained with comprehensive interpretability, potentially facilitating adoption in clinical dermatology workflows (Ilyosbekov, 10 Dec 2025).

1. Architectural Framework and Training Protocol

MelanomaNet employs the EfficientNet V2-M architecture (approximately 54 million parameters) as its feature extraction backbone, modified to accept $384 \times 384$ pixel input images to preserve fine-grained dermoscopic features such as pigment networks and subtle border irregularities. The feature extraction pipeline consists of a stem convolution followed by a sequence of MBConv and Fused-MBConv blocks selected via training-aware neural architecture search, yielding a $1{,}280 \times 12 \times 12$ feature map.

The classification head utilizes global average pooling, a dropout layer (dropout rate $0.3$), and a fully connected layer producing logits for eight lesion categories (melanoma [MEL], nevus [NV], basal cell carcinoma [BCC], actinic keratosis [AK], benign keratosis [BKL], dermatofibroma [DF], vascular lesion [VASC], squamous cell carcinoma [SCC]). Class imbalance is managed via a weighted cross-entropy loss with weights inversely proportional to the prevalence of each class in the training set. Optimization is performed with AdamW ($\text{lr}=10^{-4}$, weight decay $10^{-4}$), a cosine-annealing schedule, and a training regimen of 100 epochs. Data augmentation includes horizontal and vertical flips, rotations ($\pm 20^\circ$), affine transformations (translation $\pm 10\%$, scaling $0.9$–$1.1$), and color jitter (brightness/contrast/saturation $\pm 0.2$, hue $\pm 0.1$). Input images are resized, center-cropped, and normalized using per-channel ImageNet statistics (Ilyosbekov, 10 Dec 2025).
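The inverse-prevalence class weighting can be sketched as follows; the normalization convention and the per-class counts shown are illustrative assumptions, since the paper states only inverse proportionality:

```python
import numpy as np

def inverse_prevalence_weights(class_counts):
    """Class weights inversely proportional to prevalence, using the
    common n_samples / (n_classes * n_c) heuristic.  The paper states
    only inverse proportionality; this normalization is an assumption."""
    counts = np.asarray(class_counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

# Illustrative per-class training counts for the eight lesion categories.
counts = [4522, 12875, 3323, 867, 2624, 239, 253, 628]
weights = inverse_prevalence_weights(counts)  # rare classes get larger weights
```

The resulting vector would then be supplied to a weighted cross-entropy loss (e.g., the `weight` argument of PyTorch's `CrossEntropyLoss`), so that errors on rare classes such as DF and VASC contribute more to the gradient.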

2. Multi-Modal Interpretability: Mechanisms and Integration

2.1 GradCAM++ Attention Visualization

Spatial attention is computed using GradCAM++ to generate class-discriminative heatmaps for the final convolutional layer. For class $c$, let $A^k \in \mathbb{R}^{U \times V}$ denote the $k$-th feature map and $y^c$ the pre-softmax score. GradCAM++ computes pixel-wise weights $\alpha_{ij}^{kc}$:

$$\alpha_{ij}^{kc} = \frac{ \frac{\partial^2 y^c}{\partial (A_{ij}^k)^2} }{ 2\,\frac{\partial^2 y^c}{\partial (A_{ij}^k)^2} + \sum_{p,q} A_{pq}^k\,\frac{\partial^3 y^c}{\partial (A_{pq}^k)^3} }$$

The channel-wise weight is

$$w_k^c = \sum_{i=1}^{U} \sum_{j=1}^{V} \alpha_{ij}^{kc}\,\mathrm{ReLU}\!\left(\frac{\partial y^c}{\partial A_{ij}^k}\right)$$

The heatmap HcH^c is then

$$H^c = \mathrm{ReLU}\!\left(\sum_k w_k^c\,A^k\right)$$

$H^c$ is upsampled to match the input dimensions ($384 \times 384$) and overlaid on the original image. Empirically, these heatmaps consistently localize to lesion centers and borders, indicating alignment between model attention and clinically salient regions (Ilyosbekov, 10 Dec 2025).
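Given the final-layer feature maps and the gradients of the target logit, the computation above can be sketched in a few lines. This uses the standard closed form in which the second- and third-order derivatives reduce to powers of the first-order gradient (exact when the class score is exponentiated, an approximation otherwise):

```python
import numpy as np

def gradcam_pp(A, G, eps=1e-8):
    """GradCAM++ heatmap from feature maps A and gradients G = dy^c/dA,
    both of shape (K, U, V).  Higher-order derivatives are expressed as
    G**2 and G**3 per the usual closed-form approximation."""
    num = G ** 2
    den = 2.0 * G ** 2 + (A * G ** 3).sum(axis=(1, 2), keepdims=True)
    alpha = num / np.where(den != 0.0, den, eps)                # alpha_ij^kc
    w = (alpha * np.maximum(G, 0.0)).sum(axis=(1, 2))           # w_k^c
    return np.maximum((w[:, None, None] * A).sum(axis=0), 0.0)  # H^c = ReLU(sum_k w_k A^k)

# Toy example at the backbone's 12x12 spatial resolution.
rng = np.random.default_rng(0)
A = rng.random((4, 12, 12))
G = rng.standard_normal((4, 12, 12))
H = gradcam_pp(A, G)  # (12, 12) map; upsampled to 384x384 in practice
```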

2.2 Automated ABCDE Clinical Criterion Extraction

ABCDE criteria—Asymmetry, Border irregularity, Color variation, Diameter, and Evolution (the last not implemented, as it requires longitudinal imaging)—are automatically extracted via a domain-adapted image-processing pipeline:

  • Lesion Segmentation: Otsu's thresholding on grayscale images, followed by morphological operations, yields a binary mask $M(i, j)$.
  • Asymmetry (A): Reflections about centroid axes yield horizontal and vertical non-overlap fractions:

$$A_{\text{horiz}} = 1 - \frac{|M \cap \mathrm{Flip}_{h}(M)|}{|M|}, \qquad A_{\text{vert}} = 1 - \frac{|M \cap \mathrm{Flip}_{v}(M)|}{|M|}$$

The reported $A$ is $\max(A_{\text{horiz}}, A_{\text{vert}})$, normalized to $[0, 1]$.

  • Border Irregularity (B): contour compactness, $B = \mathrm{perimeter}^2 / \mathrm{area}$.
  • Color Variation (C): $k$-means clustering ($k=6$) within the lesion counts the color clusters covering more than $5\%$ of the lesion area, yielding $n_\text{colors}$; the variation score is the standard deviation of the distances between cluster centroids.
  • Diameter (D): $D = 2\,\max(r, d/2)$, with $r$ the radius of the minimum enclosing circle and $d$ the bounding-box diagonal.
  • Composite risk stratification: the criteria $A > 0.3$, $B > 0.4$, $n_\text{colors} > 3$, and $D > 114$ px define low, medium, and high risk according to how many are met.

This extraction pipeline enables the mapping of model predictions to established risk criteria (Ilyosbekov, 10 Dec 2025).
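A minimal sketch of the asymmetry score and count-based risk banding, with two stated assumptions: flips are taken about the image axes (the paper reflects about the lesion's centroid axes), and the count-to-band mapping is illustrative, since the paper does not spell it out:

```python
import numpy as np

def asymmetry(mask):
    """Max of horizontal/vertical non-overlap fractions of the binary
    lesion mask under reflection.  Flips here are about the image axes;
    the paper reflects about the lesion's centroid axes."""
    m = mask.astype(bool)
    area = m.sum()
    a_h = 1.0 - (m & np.flipud(m)).sum() / area
    a_v = 1.0 - (m & np.fliplr(m)).sum() / area
    return max(a_h, a_v)

def risk_level(A, B, n_colors, D_px):
    """Count the fired criteria (A>0.3, B>0.4, n_colors>3, D>114 px)
    and map the count to a band; the exact mapping is an assumption."""
    flags = int(A > 0.3) + int(B > 0.4) + int(n_colors > 3) + int(D_px > 114)
    return ["low", "low", "medium", "high", "high"][flags]
```

For example, a centered circular mask scores near-zero asymmetry, while a lesion firing all four criteria lands in the high-risk band.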

2.3 Fast Concept Activation Vectors (FastCAV)

Concept-level explanations use FastCAV to quantify how specific clinical concepts support or oppose a classification outcome:

  • Positive/negative sets for each concept (e.g., multicolor, irregular border, large diameter) are determined by ABCD thresholds.
  • A linear classifier (trained with SGD) is fit in the $1{,}280$-dimensional feature space $h(x)$ to separate concept-positive from concept-negative feature vectors; its normal yields the concept activation vector $v_c$.
  • For class $c$ and input $x$, the directional derivative $D_{v_c} y^c(x) = \nabla_{h(x)} y^c \cdot v_c$ quantifies the concept's influence.
  • The TCAV score for concept cc is

$$\mathrm{TCAV}_c = \frac{1}{N} \sum_{x \in X} \mathbf{1}\!\left[D_{v_c} y^c(x) > 0\right]$$

indicating the fraction of examples for which increasing the concept weight enhances the class logit. Large, positive TCAV values signal strong support for the class; negative values indicate opposition.
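Given feature-space gradients, fitting a CAV and computing the TCAV score can be sketched as follows; a least-squares separator stands in for the paper's SGD classifier, and the toy dimensionality is illustrative:

```python
import numpy as np

def fit_cav(h_pos, h_neg):
    """Unit-norm concept activation vector: normal of a linear separator
    between concept-positive and concept-negative feature vectors.  A
    least-squares fit stands in for the paper's SGD classifier."""
    X = np.vstack([h_pos, h_neg])
    y = np.r_[np.ones(len(h_pos)), -np.ones(len(h_neg))]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w / np.linalg.norm(w)

def tcav_score(grads, v_c):
    """Fraction of examples whose directional derivative of the class
    logit along v_c is positive; `grads` holds one gradient of y^c
    w.r.t. the feature vector h(x) per row."""
    return float((grads @ v_c > 0).mean())
```

In MelanomaNet the rows of `grads` would be $\nabla_{h(x)} y^c$ over a batch of test images, with $h(x)$ the 1,280-D pooled feature.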

2.4 Monte Carlo Dropout Uncertainty Quantification

Uncertainty is decomposed through MC Dropout by performing $T = 10$ stochastic forward passes with dropout active (rate $0.3$):

  • Predictive entropy:

$$\bar{p} = \frac{1}{T} \sum_{t=1}^{T} p_t, \qquad H[\bar{p}] = -\sum_c \bar{p}_c \log \bar{p}_c$$

  • Epistemic uncertainty:

$$\mathrm{Var}_{\mathrm{epi}}[p] = \frac{1}{T} \sum_{t=1}^{T} (p_t - \bar{p})^2$$

  • Aleatoric uncertainty:

$$\frac{1}{T} \sum_{t=1}^{T} H[p_t]$$

Predictions with $H[\bar{p}] > 0.5$ are automatically flagged as "UNRELIABLE" for clinical review.
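The decomposition above can be sketched directly from the $T$ stochastic softmax outputs:

```python
import numpy as np

def mc_dropout_uncertainty(P, eps=1e-12):
    """Decompose uncertainty from T stochastic softmax outputs.
    P has shape (T, C): one probability vector per dropout pass."""
    p_bar = P.mean(axis=0)
    predictive = -(p_bar * np.log(p_bar + eps)).sum()       # H[p_bar]
    aleatoric = -(P * np.log(P + eps)).sum(axis=1).mean()   # mean_t H[p_t]
    epistemic = ((P - p_bar) ** 2).mean(axis=0)             # per-class variance
    return p_bar, predictive, aleatoric, epistemic

def flag(predictive_entropy, threshold=0.5):
    """Apply the paper's H[p_bar] > 0.5 reliability rule."""
    return "RELIABLE" if predictive_entropy <= threshold else "UNRELIABLE"
```

When all passes agree, the epistemic term vanishes and predictive entropy equals the aleatoric term; disagreement between passes shows up as positive per-class variance.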

3. Dataset, Evaluation Metrics, and Quantitative Outcomes

MelanomaNet is assessed on the ISIC 2019 dataset, comprising 25,331 dermoscopic images across nine diagnostic classes, with a held-out test subset ($N = 3{,}800$).

| Metric | Value |
| --- | --- |
| Overall accuracy | 0.8561 |
| Weighted precision | 0.8600 |
| Weighted recall | 0.8561 |
| Weighted F1 | 0.8564 |
| Macro-average F1 | 0.8036 |
Selected per-class F1 scores: MEL — $0.7743$, NV — $0.9131$, BCC — $0.8934$, AK — $0.6917$, DF — $0.7209$. Performance is strongest on high-prevalence classes (NV, BCC), with expected degradation on rare classes (DF, AK). No direct baseline comparison is included; however, top-line accuracy is competitive with prior EfficientNet-based approaches to the ISIC challenge (84–87%) (Ilyosbekov, 10 Dec 2025).

4. Clinical Alignment, Explanatory Outputs, and Case Studies

MelanomaNet’s explanatory outputs are quantitatively and qualitatively aligned with dermatological assessment.

  • GradCAM++ lesion and border attention: On the test set, mean lesion attention is $0.60$ (proportion of heatmap mass inside lesion mask), border attention is $0.53$. These figures indicate spatial focus congruent with ABCDE regions.
  • Composite ABCDE risk scores enable risk stratification in a manner interpretable to clinicians.
  • Explanation case studies: For a benign nevus, high confidence ($94.49\%$) and low uncertainty ($0.088$, "RELIABLE") are reported, with ABCDE suggesting medium risk and FastCAV indicating that large diameter and multicolor support the NV outcome while asymmetry and irregular border oppose it. The GradCAM++ map tightly overlaps the lesion center. For a melanoma case, maximal model confidence ($100\%$) is tempered by elevated predictive entropy ($0.76$, "UNCERTAIN", primarily aleatoric), demonstrating MelanomaNet's capacity to flag ambiguous predictions despite a maximum softmax response (Ilyosbekov, 10 Dec 2025).
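The lesion- and border-attention figures above are overlap fractions of heatmap mass with a binary region mask; a minimal sketch, noting that the construction of the border band is not specified in the source:

```python
import numpy as np

def attention_fraction(heatmap, mask):
    """Proportion of heatmap mass falling inside a binary region mask
    (the lesion mask, or a band around its contour for the border score)."""
    total = heatmap.sum()
    return float(heatmap[mask.astype(bool)].sum() / total) if total > 0 else 0.0
```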

A plausible implication is that MelanomaNet supports informed human–AI collaboration by clarifying when predictions should be trusted or deferred for clinical adjudication.

5. Significance for Transparent AI in Dermatology

By incorporating interpretability directly within the model architecture—spanning spatial (GradCAM++), feature-level (ABCDE), concept-based (FastCAV), and reliability (MC Dropout) explanations—MelanomaNet operationalizes the principle that trustworthy AI for clinical use must deliver both accuracy and actionable evidence. Empirical alignment metrics and visually intuitive saliency maps reinforce the network’s “attention” consistency with established dermatological practice. Uncertainty quantification differentiates between lack of model knowledge and data-intrinsic ambiguity, addressing critical concerns in medical AI deployment.

In summary, MelanomaNet’s methodological synthesis and clinical grounding mark a significant advance in the pursuit of transparent, trustworthy deep learning systems for skin cancer diagnosis (Ilyosbekov, 10 Dec 2025).
