Aerial Pile Burn Detection with UAV Imagery
- The paper introduces deep learning pipelines using a compact Xception CNN and a U-Net model to achieve frame-level fire detection and pixel-wise segmentation.
- The dataset is split into classification and segmentation subsets with co-registered RGB and thermal imagery; the reported baselines reach 76% test accuracy for frame classification and 92% precision for segmentation.
- Integrating these methods supports real-time fire alerts, geolocation of burning regions, and post-event forensic analysis to enhance wildfire management strategies.
Aerial imagery pile burn detection refers to the automated identification and delineation of active pile burns from aerial platforms using computer vision, leveraging both RGB and infrared sensor modalities. Such approaches facilitate both real-time fire monitoring and post-event analysis by detecting flames and segmenting fire regions in imagery collected from unmanned aerial vehicles (UAVs). Central to recent advances is the FLAME (Fire Luminosity Airborne-based Machine learning Evaluation) dataset, which provides co-registered RGB and thermal imagery of prescribed pile burns, supporting the development and benchmarking of deep learning pipelines for fire detection and segmentation (Shamsoshoara et al., 2020).
1. Data Acquisition and Dataset Structure
Aerial pile burn data collection in the FLAME dataset employed multirotor UAVs equipped with both high-resolution RGB and thermal imaging sensors. Two drone platforms were used: the DJI Matrice 200 with a Zenmuse X4S RGB camera (20 MP CMOS, 84° FOV, up to 4K) and the DJI Phantom 3 Professional with a 12.4 MP CMOS RGB camera (94° FOV). Thermal imaging was implemented using the FLIR Vue Pro R sensor (640×512 thermal pixels, 45° FOV), providing output in “Fusion,” “WhiteHot,” and “GreenHot” palettes.
Flight campaigns were conducted during prescribed pile burns in an Arizona pine forest (Observatory Mesa, AZ, January 16, 2020) under controlled meteorological conditions (6 °C, partly cloudy, no wind) and involved varying altitudes and top-down to oblique viewing geometries. RGB video resolutions included 1280×720 at 29 FPS (Zenmuse) and 3840×2160 at 30 FPS (Phantom), while thermal sequences were captured at 640×512 at 30 FPS (FLIR).
The dataset is organized into two core subsets:
| Subset | Frame Count | Resolution and Modality | Annotation Type |
|---|---|---|---|
| Classification (train/val) | 39,375 | 254×254 RGB (JPEG) | Frame-wise Fire/No-Fire |
| Classification (test) | 8,617 | 254×254 RGB | Frame-wise Fire/No-Fire |
| Segmentation | 2,003 | 3840×2160 RGB, thermal | Per-pixel fire masks |
Annotation procedures included frame-wise binary labels (assigned according to visible flame motion) and segmentation masks drawn by human experts using MATLAB Image Labeler. The segmentation set contains 2,003 full-resolution RGB frames with corresponding polygonal binary masks (totaling ≈23.4 MB). The classification set is split into 25,018 Fire and 14,357 No-Fire frames for train/val, and 5,137 Fire and 3,480 No-Fire frames for testing.
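The class balance implied by these counts can be checked with a few lines of Python. The sketch below uses only the frame counts quoted above; the inverse-frequency weighting is one common heuristic and is an assumption here, not something the source prescribes.

```python
# Frame counts quoted in the text above.
TRAIN_VAL = {"Fire": 25_018, "No-Fire": 14_357}
TEST = {"Fire": 5_137, "No-Fire": 3_480}


def class_fractions(counts):
    """Return the share of frames that belongs to each class."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def inverse_frequency_weights(counts):
    """Weight each class by total / (num_classes * class_count).

    Assumed heuristic (not from the source): the rarer class
    receives a weight above 1, the more frequent class below 1.
    """
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


train_fractions = class_fractions(TRAIN_VAL)
test_fractions = class_fractions(TEST)
train_weights = inverse_frequency_weights(TRAIN_VAL)

if __name__ == "__main__":
    print("train/val fractions:", train_fractions)
    print("test fractions:", test_fractions)
    print("train/val class weights:", train_weights)
```

Under these counts, Fire frames make up roughly 64% of the train/val split and roughly 60% of the test split, which is the imbalance the later sections refer to.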
2. Frame-Level Binary Classification Pipeline
The FLAME benchmark for binary classification formulates aerial pile burn detection as a two-class image recognition task—assigning each RGB frame a Fire or No-Fire label.
The adopted model design is a compact Xception-derived convolutional neural network tailored for low-latency inference on edge hardware. Input images (254×254×3 RGB) are normalized to [0, 1]. The network architecture comprises:
- Entry flow: Two Conv2D layers (8 filters, 2×2 stride), BatchNorm, ReLU activations.
- Middle and exit flow: Three depthwise-separable convolutional blocks with skip/residual connections, each followed by BatchNorm and ReLU.
- Output: Single neuron with Sigmoid activation, $\hat{y} = \sigma(z) = 1 / (1 + e^{-z})$, representing class probability.
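To illustrate why depthwise-separable blocks keep such a network compact, the following sketch compares parameter counts for a standard and a depthwise-separable convolution. The kernel size and channel widths (3×3, 8 in, 16 out) are hypothetical illustration values, not the configuration of the published model, and the count assumes a single bias vector on the output channels.

```python
def conv2d_params(kernel, c_in, c_out, bias=True):
    """Parameters of a standard k x k convolution."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)


def separable_conv2d_params(kernel, c_in, c_out, bias=True):
    """Parameters of a depthwise k x k convolution followed by a 1 x 1 pointwise one.

    Assumes a depth multiplier of 1 and one bias per output channel.
    """
    depthwise = kernel * kernel * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out             # 1 x 1 mixing across channels
    return depthwise + pointwise + (c_out if bias else 0)


# Hypothetical layer: 3x3 kernel, 8 input channels, 16 output channels.
standard = conv2d_params(3, 8, 16)
separable = separable_conv2d_params(3, 8, 16)

if __name__ == "__main__":
    print(f"standard: {standard}, separable: {separable}, "
          f"ratio: {standard / separable:.1f}x")
```

For this hypothetical layer the separable variant needs about a fifth of the parameters, and the saving grows with the channel width.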
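Training uses the Adam optimizer (learning rate 0.001) with binary cross-entropy loss:
$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\right]$$
where $y_i \in \{0, 1\}$ is the label of frame $i$, $\hat{y}_i$ is the sigmoid output for that frame, and $N$ is the number of frames in the batch. The model is trained with a batch size of 32 for up to 40 epochs, with data augmentation (horizontal flips, random rotations).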
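As a reference for the loss named above, here is a minimal pure-Python sketch of binary cross-entropy averaged over a batch. The clipping constant `eps` is an assumption added for numerical safety; deep learning frameworks apply a similar guard internally.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean binary cross-entropy over paired labels (0/1) and probabilities."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


if __name__ == "__main__":
    # An uninformative prediction of 0.5 costs ln(2) per frame.
    print(binary_cross_entropy([1, 0], [0.5, 0.5]))
    # Confident, correct predictions cost much less.
    print(binary_cross_entropy([1, 0], [0.95, 0.05]))
```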
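Performance metrics follow the standard definitions in terms of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN):
- Accuracy = $(TP + TN) / (TP + TN + FP + FN)$
- Precision = $TP / (TP + FP)$
- Recall = $TP / (TP + FN)$
- F1-score = $2 \cdot \mathrm{Precision} \cdot \mathrm{Recall} / (\mathrm{Precision} + \mathrm{Recall})$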
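These four metrics can be computed directly from confusion-matrix counts, as in the sketch below. The example counts are made up for illustration and are not results from the FLAME experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
example = classification_metrics(tp=80, tn=90, fp=10, fn=20)

if __name__ == "__main__":
    for name, value in example.items():
        print(f"{name}: {value:.4f}")
```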
Empirically, the classifier achieved a training accuracy of 96.8%, a validation accuracy of 94.3%, and a test accuracy of 76.2%. The drop of about 18 percentage points from validation to test is associated with class imbalance (overrepresentation of the Fire class), high visual similarity between smoke or charred ground and flames, and failures under difficult illumination conditions. The model's lightweight footprint enables real-time deployment onboard edge GPUs.
3. Pixel-Level Segmentation Architecture
For precise delineation of fire extents within each frame, a customized U-Net convolutional network is used, addressing the fire segmentation problem as dense binary classification for each pixel.
Key elements of the U-Net configuration for FLAME:
- Input: 512×512×3 RGB, normalized to [0, 1].
- Encoder: Four blocks (Conv2D + ELU; Dropout; Conv2D + ELU; MaxPool2D).
- Bottleneck: Two Conv2D + ELU layers with dropouts.
- Decoder: Four up-convolution (Conv2DTranspose) blocks, each concatenated with corresponding encoder features, followed by two Conv2D + ELU + dropout layers.
- Output: 1×1 Conv2D + Sigmoid, producing per-pixel Fire/Background scores.
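The spatial bookkeeping of this encoder-decoder layout can be traced with a short sketch. It assumes "same"-padded convolutions, 2×2 max pooling, and stride-2 transposed convolutions, none of which the text states explicitly; filter counts are omitted because they are not given.

```python
def unet_spatial_trace(input_size=512, depth=4):
    """Return (encoder sizes, bottleneck size, decoder sizes) for a square input.

    Assumes convolutions preserve resolution, each pooling step halves it,
    and each up-convolution doubles it.
    """
    encoder = []
    size = input_size
    for _ in range(depth):
        if size % 2:
            raise ValueError(f"size {size} cannot be halved cleanly")
        encoder.append(size)  # resolution of the skip-connection features
        size //= 2            # effect of MaxPool2D
    bottleneck = size
    decoder = []
    for skip in reversed(encoder):
        size *= 2             # effect of Conv2DTranspose with stride 2
        if size != skip:
            raise ValueError("decoder and skip resolutions do not match")
        decoder.append(size)
    return encoder, bottleneck, decoder


encoder_sizes, bottleneck_size, decoder_sizes = unet_spatial_trace()

if __name__ == "__main__":
    print("encoder:", encoder_sizes)
    print("bottleneck:", bottleneck_size)
    print("decoder:", decoder_sizes)
```

Under these assumptions, a 512×512 input reaches a 32×32 bottleneck, and each decoder stage matches the resolution of the encoder features it is concatenated with.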
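The loss function is standard binary cross-entropy, optionally combined with a Dice loss, which in its common form is:
$$\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\sum_{i} p_i g_i}{\sum_{i} p_i + \sum_{i} g_i}$$
where $p_i$ is the predicted fire probability for pixel $i$ and $g_i \in \{0, 1\}$ is the corresponding ground-truth mask value.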
Training is conducted with Adam (lr=0.001), batch size 16, up to 30 epochs, employing early stopping. The data split allocates 85% for training and 15% for validation from the 2,003 annotated frames.
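The optional Dice term can be sketched in pure Python over flattened masks. The additive `smooth` constant is a common stabilizer and is an assumption here; the source does not say whether or how the original implementation smooths the ratio.

```python
def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss for flattened per-pixel probabilities and 0/1 targets.

    `smooth` (assumed, not from the source) avoids division by zero
    when both the prediction and the target are empty.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    intersection = sum(p * g for p, g in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


if __name__ == "__main__":
    print(dice_loss([1, 0, 1], [1, 0, 1]))        # perfect overlap
    print(dice_loss([1, 1, 0, 0], [0, 0, 1, 1]))  # no overlap
```

Because the Dice term is computed over the whole mask rather than per pixel, it is less sensitive to the large number of background pixels in a typical fire frame than cross-entropy alone.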
Segmentation results:
- Precision: 91.99%
- Recall: 83.88%
- F1-score: ≈87.75%
- Intersection over Union (IoU): 78.17%
Qualitative evaluation shows strong capture of dynamic flame shapes but occasional under-segmentation at the periphery in low-contrast scenes. The high precision reflects a low false-alarm rate; the more moderate recall indicates occasional under-detection of thin or low-intensity flame regions.
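The reported segmentation figures are mutually consistent, which can be verified from precision and recall alone. The sketch below uses the identity IoU = F1 / (2 − F1), which holds when all four metrics are computed from the same pooled pixel counts; whether the original evaluation pooled counts in this way is an assumption.

```python
# Values reported in the text above.
precision = 0.9199
recall = 0.8388

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# With TP, FP, FN taken from the same counts:
#   F1 = 2TP / (2TP + FP + FN) and IoU = TP / (TP + FP + FN),
# which gives IoU = F1 / (2 - F1).
iou_from_f1 = f1 / (2 - f1)

if __name__ == "__main__":
    print(f"F1  = {f1:.4f}")
    print(f"IoU = {iou_from_f1:.4f}")
```

Both derived values agree with the reported F1-score of about 87.75% and IoU of 78.17% to within rounding.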
4. Applications in Operational Fire Management
Automated pile burn detection from aerial imagery via FLAME’s approaches supports both real-time and retrospective fire management:
- Onboard real-time alerting: The small Xception classifier enables candidate-frame filtering directly on UAV edge processors, allowing for prompt flagging of suspected fire frames for human review or secondary algorithmic analysis.
- Geolocation and tasking: Binary segmentation masks facilitate rapid geolocation of active burning regions, enabling targeted dispatch of ground personnel or dynamic retasking of UAVs.
- Post-event forensic analysis: High-resolution fire masks support precise area-burned calculations, temporal tracking of pile fire growth, and refined estimates of fuel consumption.
A plausible implication is that integrating such pixel-level fire masks into incident management protocols could streamline resource allocation and improve firefighter safety.
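As a rough illustration of the area-burned use case, the following sketch converts a binary fire mask into an approximate ground area. It assumes a nadir (straight-down) view over flat terrain, square pixels, and that the quoted field of view applies across the image width; the altitude and mask values are hypothetical. Oblique views, which the dataset also contains, would need a proper camera model.

```python
import math


def ground_sample_distance(altitude_m, fov_deg, pixels_across):
    """Approximate metres per pixel for a nadir view over flat ground."""
    footprint_m = 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)
    return footprint_m / pixels_across


def burning_area_m2(mask, gsd_m):
    """Area covered by fire pixels (value 1) in a 2D mask, in square metres."""
    fire_pixels = sum(sum(row) for row in mask)
    return fire_pixels * gsd_m ** 2


if __name__ == "__main__":
    # Hypothetical flight: 30 m above ground, 94-degree FOV, 3840-pixel-wide frame.
    gsd = ground_sample_distance(altitude_m=30.0, fov_deg=94.0, pixels_across=3840)
    toy_mask = [[0, 1, 1],
                [0, 1, 0]]
    print(f"GSD: {gsd:.4f} m/pixel")
    print(f"Fire area: {burning_area_m2(toy_mask, gsd):.6f} m^2")
```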
5. Methodological Challenges and Prospects for Improvement
Observed limitations in the existing pipelines include moderate binary classification accuracy (76%) driven by class imbalance and high visual ambiguity between flame, smoke, and background features. Key findings indicate that some “Fire” frames are misclassified due to minimal or low-contrast flames; in segmentation, boundary accuracy is constrained for thin or transparent flame edges.
Proposed enhancements include:
- Expanding the training set with more heterogeneous backgrounds and lighting.
- Employing focal or class-weighted loss functions to better address class imbalance.
- Integrating thermal sensor data (multi-modal fusion) to exploit heat signatures for improved detection under adverse optical conditions (e.g., dense smoke, twilight).
- Temporal modeling via recurrent modules or 3D CNNs to utilize flame flicker and motion patterns, potentially improving frame-level detection reliability.
This suggests that future aerial wildfire detection frameworks may converge towards multi-modal, temporally-aware architectures trained on increasingly diverse fire imagery.
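Of the enhancements listed above, the focal loss is the simplest to sketch. The version below follows the commonly used binary formulation, with `gamma` down-weighting easy examples and `alpha` balancing the two classes; the default values are conventional choices and are assumptions, not settings evaluated on FLAME.

```python
import math


def focal_loss(labels, probs, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over paired labels (0/1) and probabilities."""
    if len(labels) != len(probs) or not labels:
        raise ValueError("labels and probs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if y == 1 else 1.0 - p              # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(labels)


if __name__ == "__main__":
    # A well-classified frame contributes far less than a borderline one.
    print(focal_loss([1], [0.9]))
    print(focal_loss([1], [0.6]))
```

With `gamma=0` and `alpha=0.5` the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check.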
6. Dataset and Benchmark Implications
The FLAME dataset is unique in simultaneously offering co-registered RGB and thermal aerial imagery of prescribed pile burns, facilitating both fire detection (frame-wise) and segmentation (pixel-wise) tasks. It serves as a primary resource for developing and benchmarking deep learning techniques for wildfire monitoring, supporting transfer learning for broader fire scenarios (e.g., broadcast-range wildfires). Its comprehensive annotations, multi-modal coverage, and high spatial resolution frame the current benchmark for aerial fire detection and segmentation (Shamsoshoara et al., 2020).
The FLAME baselines, a low-latency Xception-based classifier and a U-Net-based segmentation model, achieve 76% accuracy in frame-level detection and approximately 92% precision in mask segmentation. These results provide a starting point for deploying aerial fire monitoring systems and direct future research towards data augmentation, multi-modal fusion, and temporal modeling strategies.