PhysFire-WM: Unified Fire Spread Emulation Framework

Updated 1 June 2026
  • PhysFire-WM is a unified framework integrating combustion physics and machine learning to emulate fine-grained fire spread with physical realism.
  • It employs a three-module design—physical simulator, multimodal tokenizer, and diffusion transformer—to enforce spatiotemporal constraints on fire propagation.
  • Empirical evaluations show improved fire mask accuracy and IR fidelity via cross-task collaborative training and structured physics-based priors.

The PhysFire-WM (Physics-Informed World Model) framework is a unified modeling system designed for fine-grained fire spread emulation. It integrates explicit combustion physics through simulator-driven priors with a diffusion-based transformer backbone, jointly leveraging multimodal data such as infrared imagery and fire masks. The approach combines advances from both physics-driven simulation and machine learning-based emulators, aiming to capture the inherently multi-scale and spatiotemporal dynamics of fire propagation with both physical realism and geometrically accurate boundary delineation (Zhou et al., 19 Dec 2025).

1. Core Architecture and Components

PhysFire-WM comprises three primary interconnected modules:

  1. Physical Simulator ($P_{\phi}$): Numerically integrates a fire-energy PDE system to generate a sequence of physically informed prior masks. Inputs include historical fire boundary masks and environmental data (terrain elevation $z$, wind velocity field $\vec v$, and fuel maps), which together capture both deterministic and stochastic influences on fire evolution.
  2. Multimodal Tokenizer ($E_{\eta}$): Fuses diverse modalities, including IR video frames, prior masks, control masks, and user prompts, into a unified spatiotemporal context token stream. The tokenizer combines a pretrained variational autoencoder (VAE) for IR frame compression, a convolutional encoder for mask processing, and learned embeddings for additional controls and prompts.
  3. Diffusion Transformer ($G_{\psi}$): Employs a DiT (Diffusion Transformer) backbone that denoises latent tokens, conditioned on the multimodal token stream. Outputs include both predicted infrared (thermal) sequences and fire boundary masks.

A distinctive aspect is the method for integrating physics priors as both explicit (hard constraints) and implicit (feature-level) guidance within the generative pathway. The simulator output is coupled to the diffusion process via a Video Condition Unit (VCU), enabling hard enforcement of physically plausible fireline geometry and upwind propagation patterns during denoising.

2. Physical Simulations and Structured Priors

2.1 Governing PDE and Numerical Approximation

The simulator at the core of PhysFire-WM is governed by a thermal-energy balance PDE:

$$c\,\frac{\partial \mathcal{T}}{\partial t} = \nabla\!\cdot\!(k\,\nabla \mathcal{T}) - (\vec v + \gamma\,\nabla z)\cdot\nabla\mathcal{T} + A\,F\,r(\mathcal{T}) - C\,\Delta\mathcal{T}_{S(\mathcal{T})}$$

Here:

  • $\mathcal{T}(p,t)$ denotes the temperature or fire boundary indicator at spatial point $p$ and time $t$
  • $\vec v$ is wind velocity; $z$ is terrain height; $F$ is fuel availability
  • $c$, $k$, $\gamma$, $A$, $C$ are physical coefficients
  • $r(\mathcal{T})$ is a temperature-dependent combustion reaction term

The combustion source term is parametrized as a convex combination of historical temperature fields to ensure tractability and differentiability.

The PDE is solved via finite-difference discretization, yielding a sequence of prior masks that encode the expected boundary evolution under physical constraints.
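As a rough illustration of how such a finite-difference prior could be produced, the following NumPy sketch takes explicit Euler steps of the PDE on a regular grid. It is not the authors' solver. The function names, coefficient values, softmax weighting, edge-replicated boundaries, and the thresholding into masks are all assumptions made for illustration, and the final PDE term is replaced by a simple linear cooling term because its exact form is not specified here.

```python
import numpy as np

def laplacian(u, dx):
    """5-point Laplacian with edge-replicated (zero-flux) boundaries."""
    p = np.pad(u, 1, mode="edge")
    return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u) / dx**2

def convex_source(history, logits):
    """Convex combination of past temperature fields.

    Softmax weights are non-negative and sum to 1 (hypothetical parametrization).
    history has shape (n_hist, H, W); logits has shape (n_hist,).
    """
    w = np.exp(logits - np.max(logits))
    w = w / w.sum()
    return np.tensordot(w, history, axes=1)

def step(history, vx, vy, z, fuel, logits, dt=0.1, dx=1.0,
         c=1.0, k=0.5, gamma=0.1, A=0.2, C=0.05):
    """One explicit Euler step; all coefficient values are placeholders."""
    T = history[-1]
    dT_dy, dT_dx = np.gradient(T, dx)   # axis 0 is y (rows), axis 1 is x (columns)
    dz_dy, dz_dx = np.gradient(z, dx)
    # Transport by wind plus a slope-induced drift: (v + gamma * grad z) . grad T
    advection = (vx + gamma * dz_dx) * dT_dx + (vy + gamma * dz_dy) * dT_dy
    source = A * fuel * convex_source(history, logits)
    cooling = C * T                      # ASSUMPTION: stand-in for the last PDE term
    rhs = k * laplacian(T, dx) - advection + source - cooling  # constant k assumed
    return T + dt * rhs / c

def rollout_prior_masks(T0, vx, vy, z, fuel, n_steps, n_hist=3, threshold=0.5):
    """Roll the simulator forward and threshold each field into a binary prior mask."""
    history = np.repeat(T0[None], n_hist, axis=0)
    logits = np.zeros(n_hist)            # uniform convex weights
    masks = []
    for _ in range(n_steps):
        T_next = step(history, vx, vy, z, fuel, logits)
        history = np.concatenate([history[1:], T_next[None]], axis=0)
        masks.append(T_next > threshold)
    return np.stack(masks)
```

With a single hot cell and wind along the positive x direction, one step of this sketch heats the downwind neighbour more than the upwind one, which is the qualitative behaviour the advection term is meant to encode. The explicit scheme is only stable for sufficiently small `dt` relative to `dx**2 * c / k`.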

2.2 Explicit–Implicit Conditioning

The simulator outputs are injected into the DiT as temporally stacked frames, segregated by control masks. Real IR frames are labeled with "all-zero" masks (preserving content), while prior-mask frames are associated with "all-ones" masks (enforcing physical guidance). This dual-pathway acts as both hard and soft constraints on the diffusion process, biasing sampling toward physically admissible fire front predictions.
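The frame and control-mask pairing described above can be sketched as follows. This is a minimal illustration, assuming frames are stored as arrays of shape (time, height, width) and that real and prior frames are simply concatenated along the time axis. The function name `build_conditioning` is hypothetical, and the actual VCU input layout may differ.

```python
import numpy as np

def build_conditioning(ir_frames, prior_masks):
    """Stack real IR frames and simulator prior masks along the time axis.

    Returns (frames, control), where control is all-zero for real IR frames
    (content preserved) and all-ones for prior-mask frames (physical guidance).
    """
    ir_frames = np.asarray(ir_frames, dtype=np.float32)
    prior_masks = np.asarray(prior_masks, dtype=np.float32)
    frames = np.concatenate([ir_frames, prior_masks], axis=0)
    control = np.concatenate(
        [np.zeros_like(ir_frames), np.ones_like(prior_masks)], axis=0
    )
    return frames, control

# Usage: 17 observed IR frames followed by 17 simulated prior masks.
ir = np.random.default_rng(0).random((17, 32, 32))
prior = np.zeros((17, 32, 32))
frames, control = build_conditioning(ir, prior)
```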

3. Cross-task Collaborative Training (CC-Train)

The Cross-task Collaborative Training (CC-Train) strategy is central to PhysFire-WM's learning scheme. It addresses the informational sparsity of binary mask modeling (where gradients vanish in non-burning regions) by joint diffusion-based prediction of IR frames and fire masks. Key mechanisms include:

  • Parameter Sharing: Both IR and mask prediction tasks share the encoder, tokenizer, and transformer backbone, with a LoRA (Low-Rank Adaptation) layer ensuring co-adaptation of representations.
  • Gradient Coordination: The total loss aggregates a diffusion-based IR prediction loss (a velocity-field loss) and a pixelwise binary cross-entropy mask loss, with a weighting parameter that balances thermal and geometric fidelity.

  • Gradient Borrowing: The dense and global gradients from the IR task ameliorate the vanishing gradient problem in the mask prediction, driving shared representations toward features relevant for both thermal field reconstruction and spatial delineation, even in fire-absent zones.
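To make the loss aggregation concrete, here is a small NumPy sketch of a weighted sum of a velocity-field loss and a pixelwise binary cross-entropy. It is an illustration, not the published objective. The velocity target `noise - x0` follows one common rectified-flow convention and is an assumption, as are the function names and the weight `lam`, including the choice to apply it to the mask term.

```python
import numpy as np

def velocity_loss(v_pred, x0, noise):
    """Mean squared error between predicted and target velocity.

    ASSUMPTION: target velocity is (noise - x0), one common flow-matching choice.
    """
    target = noise - x0
    return float(np.mean((v_pred - target) ** 2))

def bce_loss(prob, mask, eps=1e-7):
    """Pixelwise binary cross-entropy between predicted probabilities and a 0/1 mask."""
    p = np.clip(prob, eps, 1.0 - eps)
    return float(np.mean(-(mask * np.log(p) + (1.0 - mask) * np.log(1.0 - p))))

def cc_train_loss(v_pred, x0, noise, prob, mask, lam=1.0):
    """Weighted sum of the IR (velocity) loss and the mask (BCE) loss.

    lam is a hypothetical name for the weighting parameter.
    """
    return velocity_loss(v_pred, x0, noise) + lam * bce_loss(prob, mask)
```

The IR term produces a non-zero gradient at every pixel, whereas the BCE term saturates where the mask is confidently zero, which is the motivation given above for training the two tasks jointly.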

Ablation results demonstrate that training the mask task alone significantly underperforms the coordinated setting, underscoring the efficacy of this cross-task signal sharing (Zhou et al., 19 Dec 2025).

4. Multimodal Data Flow and Representation

PhysFire-WM is designed for heterogeneous and temporally aligned data ingestion:

  • Inputs:
    • Infrared video, normalized and compressed via a pretrained VAE
    • Binary fire masks
    • Environmental maps, spatially matched to IR/mask resolution
    • Control masks and user prompts
  • Context Token Fusion:
    • Convolutional encoders project prior masks to latent tensors matching the VAE output
    • Embeddings represent control masks and prompts
    • All features are concatenated and linearly projected, forming the context token sequence for repeated cross-attention in the DiT

This architecture supports flexible conditioning, allowing the model to synthesize both physically consistent and data-driven spatiotemporal patterns.
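The concatenate-and-project fusion step can be sketched in a few lines of NumPy. The tensor layout (time, height, width, channels), the channel counts, the random projection weights, and the name `fuse_context` are all illustrative assumptions. In the actual model the projection would be learned and the latents would come from the VAE and the convolutional mask encoder.

```python
import numpy as np

def fuse_context(ir_latent, mask_latent, control_emb, prompt_emb, W, b):
    """Concatenate per-position features and linearly project them to context tokens.

    ir_latent, mask_latent, control_emb: arrays of shape (T, h, w, C_i)
    prompt_emb: vector of shape (C_p,), broadcast to every position
    W: (sum of channels, D) projection matrix; b: (D,) bias
    Returns tokens of shape (T * h * w, D).
    """
    T, h, w, _ = ir_latent.shape
    prompt = np.broadcast_to(prompt_emb, (T, h, w, prompt_emb.shape[-1]))
    feats = np.concatenate([ir_latent, mask_latent, control_emb, prompt], axis=-1)
    return feats.reshape(T * h * w, -1) @ W + b

# Usage with made-up sizes: 4 latent frames on an 8x8 grid, 64-dim tokens.
rng = np.random.default_rng(0)
ir_latent = rng.standard_normal((4, 8, 8, 16))
mask_latent = rng.standard_normal((4, 8, 8, 16))
control_emb = rng.standard_normal((4, 8, 8, 4))
prompt_emb = rng.standard_normal(12)
W = rng.standard_normal((16 + 16 + 4 + 12, 64)) * 0.02
b = np.zeros(64)
tokens = fuse_context(ir_latent, mask_latent, control_emb, prompt_emb, W, b)
```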

5. Specialized Loss Functions and Training Schedule

The total loss for PhysFire-WM incorporates three terms:

  • IR term: DiT velocity-field loss for IR prediction
  • Mask term: binary cross-entropy loss for mask prediction
  • Regularization term: L2 regularization of the convex weights in the combustion source parametrization

Training alternates mini-batches for each prediction task within each epoch, updating the shared parameters at every step. The AdamW optimizer is used, with LoRA rank 128 facilitating efficient adaptation on multi-GPU setups.
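One simple way to realize such an alternating schedule is to interleave the two task streams, as in the sketch below. The generator name `alternate_tasks` and the strict one-to-one interleaving are assumptions, since the exact ordering within an epoch is not specified here.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel so that a batch equal to None is still yielded

def alternate_tasks(ir_batches, mask_batches):
    """Yield (task_name, batch) pairs, alternating IR and mask mini-batches.

    If one stream is longer, its remaining batches are yielded on their own.
    """
    for ir_batch, mask_batch in zip_longest(ir_batches, mask_batches, fillvalue=_MISSING):
        if ir_batch is not _MISSING:
            yield ("ir", ir_batch)
        if mask_batch is not _MISSING:
            yield ("mask", mask_batch)

# Usage: each yielded pair would drive one optimizer step on the shared parameters.
schedule = list(alternate_tasks(["ir0", "ir1", "ir2"], ["m0", "m1"]))
```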

6. Empirical Evaluation and Performance

PhysFire-WM was validated on a drone-collected multimodal fire dataset (226 aligned IR + mask videos). Quantitative metrics for the single-region 17-in/17-out prediction benchmark showed:

Metric          PhysFire-WM value    Relative Δ vs. best prior
Mask AUPRC      0.89                 ↑6.8%
Mask IoU        0.89                 ↑15.1%
IR PSNR [dB]    23.62                ↑3.7%
IR SSIM         0.80                 ↑7.1%
LPIPS           0.09                 ↓27.4%
FVD             0.001                ↓83.3%

For cross-region generalization on unseen areas, Mask IoU was 0.81, IR PSNR reached 23.26, and FVD was 0.00. Key findings from ablations include that removing the physical prior degrades IR PSNR to 22.76 dB and AUPRC to 0.82; using mask-only training (with prior) yields AUPRC 0.85, while CC-Train raises it to 0.89. Visualizations indicate that fire front shapes produced by PhysFire-WM respect upwind propagation, energy diffusion, and elliptical spread patterns, outperforming purely data-driven or physics-agnostic models (Zhou et al., 19 Dec 2025).
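For reference, two of the reported metrics can be computed as in the sketch below, using their standard definitions. The function names, the convention that two empty masks have IoU 1.0, and the assumption that IR frames are scaled to a peak value of 1.0 are illustrative choices, not details taken from the evaluation protocol.

```python
import numpy as np

def mask_iou(pred, target):
    """Intersection over union of two binary masks (1.0 if both are empty)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB; max_val is the assumed peak intensity."""
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))

# Usage: one overlapping pixel out of two in the union gives IoU 0.5;
# a uniform error of 0.1 on a unit-range image gives 20 dB.
iou = mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
db = psnr(np.zeros((4, 4)), np.full((4, 4), 0.1))
```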

7. Relation to High-resolution Coupled Fire-Atmosphere Models

The PhysFire-WM design is informed by principles established in coupled atmosphere–wildland fire models such as WRF-Fire (Mandel et al., 2011), which use level-set methods for fireline evolution, semi-empirical spread-rate closure, and explicit physical coupling between fire surface fluxes and the atmospheric state. Both systems share core modeling primitives: level-set boundary representations, tile-callable parallelism, heterogeneous-fuel support, and explicit–implicit numerical schemes. However, PhysFire-WM introduces a differentiable, data-driven emulator framework capable of ingesting complex observational modalities and learning cross-domain correlations, while remaining anchored in explicit physical law through its simulator prior component.

A plausible implication is that PhysFire-WM provides a template for integrating simulation-based priors within generative ML architectures for other spatiotemporal dynamical systems, supporting high-resolution forecasting under explicit physical constraints.
