
Burn Scar Segmentation: Methods and Challenges

Updated 20 November 2025
  • Burn scar segmentation is the automated delineation of burn-affected regions in biomedical and remote-sensing imagery, crucial for clinical diagnosis and environmental monitoring.
  • Advanced deep learning architectures, including U-Net variants and two-stage detect-and-segment frameworks, optimize boundary detection and mitigate challenges like label noise.
  • Multitask models with auxiliary land-cover supervision and robust loss functions effectively address class imbalance and domain shift, improving segmentation precision.

Burn scar segmentation refers to the algorithmic delineation of regions affected by burns or fire-induced damage, either in biomedical imagery or remote-sensing data. The task plays a pivotal role in automated clinical assessment of skin injuries and in large-scale monitoring of wildland fire impact on terrestrial ecosystems. Methodologies encompass traditional image processing, spectral feature analysis, and contemporary deep learning-based semantic segmentation paradigms applied to both multi-/hyperspectral satellite imagery and clinical color photographs. The domain is characterized by challenges such as label noise, class imbalance, domain shift, and the need for fine-grained boundary localization.

1. Datasets and Annotation Practices

Burn scar segmentation systems are fundamentally limited by the breadth, quality, and annotation accuracy of their datasets. In remote sensing, modalities typically comprise visible–infrared imagery (e.g., Landsat 8 RGB-NIR) and multispectral Sentinel-2 L2A feeds. For example, Mohla et al. introduced a dataset of 299 RGB-NIR Landsat 8 tiles (1024×1024 at 30 m/pixel) paired with binary burn-scar masks, with labels provided by INPE’s Projeto Queimadas. Labels exhibited both “missing positives” (unmarked burn scars) and “false positives” (non-burned regions erroneously annotated as scars), a hallmark of noisy, weak supervision (Mohla et al., 2020).

In environmental monitoring, Arnaudo et al. compiled 433 Sentinel-2 tiles (2017–2023), augmented with Copernicus EMS-graded burn masks, 11-class ESA WorldCover land-cover maps, and cloud validity masks. Class imbalance is acute: burn pixels constitute only 1–5% of each tile, while land cover is far more evenly distributed (Arnaudo et al., 2023).

For biomedical applications, extensive curation and expert annotation are common. The BAM dataset consists of 1,684 color images labeled for four burn severities (SPF, SPT, DPT, FT) by burn surgeons, and an LDI-aligned set of 184 images for ground-truthing segmentation approaches (Abdolahnejad et al., 2023). Background removal (e.g., CNN-based skin segmentation) and geometric alignment with auxiliary scans are key preprocessing steps.

2. Segmentation Architectures

The foundation of burn scar segmentation is the convolutional encoder–decoder, exemplified by U-Net and its variants. AmazonNET implemented a classical 4-channel U-Net that ingests stacked RGB+NIR imagery to segment fragmented Amazonian wildfire scars. The encoder comprises four down-sampling blocks, each doubling the channel count (64–1024), followed by skip-connected up-sampling in the decoder and a 1×1 convolution with sigmoid output for per-pixel probability estimation (Mohla et al., 2020).
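
The following minimal PyTorch sketch illustrates such a 4-channel U-Net; the 64–1024 channel doubling follows the description above, while the block structure, names, and defaults are illustrative assumptions rather than AmazonNET's released code.

```python
# Minimal 4-channel U-Net sketch (illustrative re-implementation, not
# AmazonNET's released code). Widths follow the 64-1024 doubling
# described above; the last width acts as the bottleneck.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    # Standard U-Net building block: two 3x3 convolutions with ReLU.
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self, in_ch=4, widths=(64, 128, 256, 512, 1024)):
        super().__init__()
        self.downs = nn.ModuleList()
        c = in_ch
        for w in widths:
            self.downs.append(conv_block(c, w))
            c = w
        self.pool = nn.MaxPool2d(2)
        self.ups = nn.ModuleList()
        self.decs = nn.ModuleList()
        for w in reversed(widths[:-1]):
            self.ups.append(nn.ConvTranspose2d(c, w, 2, stride=2))
            self.decs.append(conv_block(2 * w, w))  # concat with skip doubles channels
            c = w
        self.head = nn.Conv2d(c, 1, 1)  # 1x1 convolution -> per-pixel logit

    def forward(self, x):
        skips = []
        for i, down in enumerate(self.downs):
            x = down(x)
            if i < len(self.downs) - 1:  # no pooling after the bottleneck
                skips.append(x)
                x = self.pool(x)
        for up, dec, skip in zip(self.ups, self.decs, reversed(skips)):
            x = dec(torch.cat([up(x), skip], dim=1))
        return torch.sigmoid(self.head(x))  # per-pixel burn probability

# Usage: UNet()(torch.randn(1, 4, 256, 256)) -> probabilities of shape (1, 1, 256, 256)
```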

Multitask models further extend this paradigm. Arnaudo et al. deployed both UPerNet (ResNet-50/ViT-Small backbone) and SegFormer (MiT-B3 encoder), attaching two 1×1 convolutional output heads: one for burned area (binary), the other for land cover (11-class). This structure enables shared feature extraction beneficial to both tasks (Arnaudo et al., 2023).
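
A hedged sketch of the dual-head arrangement, with a generic `backbone` standing in for the UPerNet/SegFormer trunk and `feat_ch` denoting its output channel count (both placeholders, not the authors' API):

```python
# Dual-head multitask segmenter sketch; `backbone` is any module that
# returns dense features of shape (B, feat_ch, H, W).
import torch.nn as nn

class DualHeadSegmenter(nn.Module):
    def __init__(self, backbone, feat_ch, n_landcover=11):
        super().__init__()
        self.backbone = backbone                            # shared features
        self.burn_head = nn.Conv2d(feat_ch, 1, 1)           # binary burned-area logits
        self.lc_head = nn.Conv2d(feat_ch, n_landcover, 1)   # 11-class land-cover logits

    def forward(self, x):
        feats = self.backbone(x)
        return self.burn_head(feats), self.lc_head(feats)
```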

Biomedical approaches frequently employ two-stage detection–segmentation frameworks. “Detect-and-Segment” (DS) splits the process: Stage 1 uses a MobileNetV2+FPN detector to localize the lesion (scar or wound); Stage 2 applies a U-Net to a cropped, normalized patch centered on the detection (Scebba et al., 2021). This decoupling of localization from segmentation reduces input variability, addresses class imbalance, and enhances out-of-distribution generalization.
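
A conceptual sketch of the two-stage flow, assuming `detector` returns integer pixel coordinates of a single box and `segmenter` is a U-Net trained on fixed-size crops (both names are illustrative, not the authors' API):

```python
# Two-stage detect-then-segment sketch: localize, crop and normalize,
# segment, then paste the mask back into a full-size canvas.
import torch
import torch.nn.functional as F

def detect_and_segment(image, detector, segmenter, crop_size=224):
    x1, y1, x2, y2 = detector(image)          # stage 1: localize the lesion
    patch = image[:, :, y1:y2, x1:x2]         # crop to the detection
    patch = F.interpolate(patch, size=(crop_size, crop_size),
                          mode="bilinear", align_corners=False)
    mask = segmenter(patch)                   # stage 2: segment the normalized crop
    # Resize the crop-level mask back and paste it into a full-size canvas.
    full = torch.zeros(image.shape[0], 1, *image.shape[2:], device=image.device)
    full[:, :, y1:y2, x1:x2] = F.interpolate(
        mask, size=(y2 - y1, x2 - x1), mode="bilinear", align_corners=False)
    return full
```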

In saliency-guided approaches, BAM utilizes a deep CNN classifier (EfficientNet-B7), followed by boundary localization via a specialized attention mapping derived from Grad-CAM outputs and first-layer activations to yield fine-grained, class-sensitive masks (Abdolahnejad et al., 2023).
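
For orientation, a generic Grad-CAM computation is sketched below; BAM's actual attention mapping additionally incorporates first-layer activations, which this simplified version omits. The model is assumed to output one score per severity class.

```python
# Generic Grad-CAM sketch: gradients of the class score w.r.t. a chosen
# convolutional layer weight its activations, yielding a coarse
# class-sensitive saliency map (BAM refines this further).
import torch
import torch.nn.functional as F

def grad_cam(model, layer, image, class_idx):
    feats, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    score = model(image)[0, class_idx]        # assumes (B, n_classes) output
    model.zero_grad()
    score.backward()
    h1.remove(); h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)   # channel importance
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:],
                        mode="bilinear", align_corners=False)
    return cam / (cam.max() + 1e-8)           # normalized saliency in [0, 1]
```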

3. Loss Functions and Optimization

Binary cross-entropy (BCE) remains the standard loss for per-pixel (binary) labeling in both environmental (Mohla et al., 2020, Arnaudo et al., 2023) and biomedical (Scebba et al., 2021) segmentation contexts:

L_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]

Class imbalance, severe in remote sensing, motivates weighted BCE or its combination with Dice loss. In DS, the segmentation loss is a convex combination of weighted BCE and soft Dice:

\mathcal{L}_\mathrm{seg} = \alpha \,\mathcal{L}_\mathrm{BCE} + (1 - \alpha) \,\mathcal{L}_\mathrm{Dice}

where

\mathcal{L}_\mathrm{Dice} = 1 - \frac{2 \sum_{ij} \hat{y}_{ij} y_{ij} + \epsilon}{\sum_{ij} \hat{y}_{ij} + \sum_{ij} y_{ij} + \epsilon}
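
A sketch of this combined objective in PyTorch; `alpha`, the positive-class weight, and the smoothing constant `eps` are tunable assumptions, not values reported in the cited papers:

```python
# Combined weighted-BCE + soft-Dice loss with illustrative hyperparameters.
import torch
import torch.nn.functional as F

def seg_loss(logits, target, alpha=0.5, pos_weight=10.0, eps=1.0):
    bce = F.binary_cross_entropy_with_logits(
        logits, target, pos_weight=torch.tensor(pos_weight, device=logits.device))
    probs = torch.sigmoid(logits)
    dice = 1 - (2 * (probs * target).sum() + eps) / (probs.sum() + target.sum() + eps)
    return alpha * bce + (1 - alpha) * dice
```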

Multitask approaches combine segmentation and auxiliary task losses:

L_\mathrm{total}(\theta) = \lambda_1 L_D(\hat{y}_D, y_D) + \lambda_2 L_{LC}(\hat{y}_{LC}, y_{LC})

where L_{LC} is the multi-class categorical cross-entropy for land cover (Arnaudo et al., 2023).
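
In code, the multitask objective reduces to a weighted sum of the two head losses; the default weights below are illustrative:

```python
# Weighted sum of the two head losses; lam1/lam2 correspond to
# lambda_1/lambda_2 above (defaults are illustrative).
import torch.nn.functional as F

def multitask_loss(burn_logits, burn_gt, lc_logits, lc_gt, lam1=1.0, lam2=1.0):
    l_burn = F.binary_cross_entropy_with_logits(burn_logits, burn_gt)
    l_lc = F.cross_entropy(lc_logits, lc_gt)   # lc_gt: (B, H, W) class indices
    return lam1 * l_burn + lam2 * l_lc
```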

For wound detection, focal loss and smooth L_1 losses are used to optimize box classification and regression, respectively (Scebba et al., 2021). BAM-derived segmentation does not involve a learnable segmentation loss, but rather leverages post-hoc thresholding and correlation maximization with Grad-CAM for mask selection (Abdolahnejad et al., 2023).

4. Evaluation Metrics and Quantitative Results

Across studies, Intersection over Union (IoU), F1-score, accuracy, sensitivity, specificity, and Matthews Correlation Coefficient (MCC) are standard:

  • IoU/Jaccard: \frac{TP}{TP + FP + FN}
  • F1-score: \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}
  • MCC: \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
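
These metrics follow directly from the confusion counts; a small NumPy sketch on binarized masks:

```python
# IoU, F1, and MCC from raw confusion counts on binarized masks.
import numpy as np

def metrics(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    # Cast to float to avoid integer overflow in the MCC product.
    tp = float(np.sum(pred & gt))
    fp = float(np.sum(pred & ~gt))
    fn = float(np.sum(~pred & gt))
    tn = float(np.sum(~pred & ~gt))
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)            # equals the precision/recall form
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return iou, f1, mcc
```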

AmazonNET reported validation accuracy of 63.33%, noting that success cases included recovery of weak positive fragments and suppression of spurious false positives; limitations were apparent in distinguishing scars from hydrological features in noisy labeling conditions (Mohla et al., 2020).

Multitask Sentinel-2 models (without pretraining) realized substantial gains: average F1 improved by +3.85 points and IoU by +5.71 points over single-task baselines, with the best multitask model attaining 91.86 F1 and 84.94 IoU on held-out events (Arnaudo et al., 2023).

In DS, segmentation MCC improved from 0.29 (full image) to 0.85 (with detection), with out-of-distribution datasets sustaining high MCC (0.74–0.90) and IoU (0.60–0.83). The two-stage pipeline enabled effective learning from as little as 10% of the conventional training set size (Scebba et al., 2021).

BAM achieved an average F1 of 0.78 for burn severity classification. For segmentation, alignment with clinician-labeled masks on the LDI dataset yielded 93.33% accuracy, 65.62% Jaccard, and sensitivity/specificity of 75.20%/96.47%. BAM also outperformed basic Grad-CAM thresholding for both Jaccard (65.62% vs 37.20%) and sensitivity (75.20% vs 44.18%) (Abdolahnejad et al., 2023).

5. Dealing with Noise, Fragmentation, and Domain Shift

Label noise and fragmentation represent significant obstacles. Mohla et al. demonstrated that a U-Net trained on noisy, weakly labeled data could, through the use of multimodal (VIS+NIR) inputs and skip connections, recover labels even where ground truth omitted scars or misattributed non-burned regions. The continuity propagated via skip connections enabled recovery of the fine, fragmented features typical of Amazonian burn patterns (Mohla et al., 2020).

Arnaudo et al. found that auxiliary land-cover supervision, even with imperfect label maps, substantially stabilized the burn-scar segmentation problem. The multitask setup consistently reduced variance and improved segmentation fidelity, particularly without pretrained weights. By learning land cover semantics, the model disambiguated confounders (e.g., bare soil vs. burned area, building roofs vs. charred ground), sharpening predicted boundaries (Arnaudo et al., 2023).

In clinical image analysis, DS’s two-stage pipeline reduced heterogeneity by cropping to the lesion, filtering confounding backgrounds (dressings, hair), and enforcing balanced scar/background ratios, which was critical for both in- and out-of-distribution generalization (Scebba et al., 2021). For subtle or low-contrast scar boundaries, methods such as BAM—leveraging channel selection for high-resolution attention mapping—allowed for boundary delineation superior to standard classifier-class-activation-maps, particularly after histogram-based (GMM) threshold selection and morphological post-processing (Abdolahnejad et al., 2023).
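
A sketch of the GMM-threshold-plus-morphology step using scikit-learn and scikit-image; the two-component mixture and the structuring-element radius are assumptions for illustration:

```python
# GMM-based threshold selection plus morphological clean-up on a
# normalized saliency map.
import numpy as np
from sklearn.mixture import GaussianMixture
from skimage.morphology import binary_closing, binary_opening, disk

def saliency_to_mask(saliency):
    # Fit a two-component mixture to the saliency histogram and keep
    # the pixels assigned to the higher-mean ("lesion") component.
    vals = saliency.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(vals)
    lesion = np.argmax(gmm.means_)
    mask = gmm.predict(vals).reshape(saliency.shape) == lesion
    # Opening removes isolated specks; closing fills small holes.
    mask = binary_opening(mask, disk(3))
    mask = binary_closing(mask, disk(3))
    return mask
```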

6. Extensions, Limitations, and Future Directions

Significant limitations remain. Weakly labeled or noisy datasets, common in both environmental and clinical settings, necessitate development of robust loss functions (e.g., generalized cross-entropy, robust Dice), as suggested for remote-sensing applications (Mohla et al., 2020). Saliency-guided methods such as BAM depend on coarse class activation maps having sufficient overlap with the lesion; otherwise, segmentation cannot recover omitted regions (Abdolahnejad et al., 2023).

Scaling multitask frameworks relies on the availability and quality of auxiliary (e.g., land-cover) labels. Domain adaptation and semi-/self-supervised pretraining on large, unlabeled datasets (e.g., Sentinel-2 time series) are proposed as effective, scalable strategies (Arnaudo et al., 2023).

Several architectural and algorithmic improvements have been advanced for burn-scar applications:

  • Attention mechanisms on skip connections (e.g., Attention U-Net) for focused propagation of spatial detail (Scebba et al., 2021); see the gate sketch after this list.
  • Dual-path encoders for multi-modal or multi-feature fusion (e.g., RGB+texture via LBP/HOG) (Scebba et al., 2021).
  • Texture consistency and color-gradient losses for enforcing smoothness and boundary alignment (Scebba et al., 2021).
  • Multi-phase, multi-modal, or temporal data integration for improved discrimination between scar and non-scar tissue (Abdolahnejad et al., 2023).
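
As referenced in the first bullet, an attention gate in the Attention U-Net style can be sketched as follows (an illustrative module, not code from the cited papers; skip and gating features are assumed to share spatial resolution):

```python
# Attention gate: the decoder's gating signal re-weights the skip
# features before they are concatenated back into the decoder.
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, skip_ch, gate_ch, inter_ch):
        super().__init__()
        self.w_skip = nn.Conv2d(skip_ch, inter_ch, 1)   # project skip features
        self.w_gate = nn.Conv2d(gate_ch, inter_ch, 1)   # project gating signal
        self.psi = nn.Conv2d(inter_ch, 1, 1)            # scalar attention per pixel

    def forward(self, skip, gate):
        attn = torch.sigmoid(self.psi(torch.relu(
            self.w_skip(skip) + self.w_gate(gate))))
        return skip * attn                              # suppress irrelevant detail
```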

Suggested research directions include: large-scale self-supervised pretraining, domain adaptation to global or rare-ecosystem fire regimes, expansion of annotated burn-scar corpora through active or semi-supervised learning, and incorporation of temporal context for healing trajectory segmentation (Arnaudo et al., 2023, Abdolahnejad et al., 2023).

7. Comparative Table of Key Methodologies

| Paper/Model | Data/Input | Architecture | Core Loss | Highlights |
|---|---|---|---|---|
| AmazonNET (Mohla et al., 2020) | Landsat 8 (RGB+NIR) | Classic 4-channel U-Net | Binary cross-entropy | Robust to weak/noisy labels |
| Multitask Sentinel-2 (Arnaudo et al., 2023) | Sentinel-2 (12 bands) | UPerNet, SegFormer (multitask) | BCE + categorical CE | Auxiliary land cover stabilizes training |
| Detect-and-Segment (DS) (Scebba et al., 2021) | RGB clinical images | Two-stage: detector + U-Net | BCE + Dice | Out-of-distribution robustness |
| BAM (Abdolahnejad et al., 2023) | RGB burn images, LDI | EfficientNet-B7 + Grad-CAM + BAM | Categorical CE with post-processing | Saliency-guided, high boundary accuracy |

The methodologies for burn scar segmentation encompass both remote-sensing and clinical imaging, with state-of-the-art solutions characterized by neural encoder–decoder nets, auxiliary task regularization, two-stage detect-and-segment frameworks, and saliency-based fine-grained attention mapping. The ongoing evolution of datasets, loss functions, and deep architectures continues to drive advances in label fidelity, boundary precision, robustness to noise, and efficient domain adaptation.
