
Extreme Weather Expert (EWE)

Updated 3 December 2025
  • Extreme Weather Expert (EWE) is an intelligent agent framework that automates comprehensive diagnostic analyses of extreme weather events using knowledge-guided planning and multimodal visualization.
  • It integrates a specialized meteorological toolkit with closed-loop reasoning and auditing mechanisms to ensure scalable, physically consistent diagnostic workflows.
  • EWE addresses analytical bottlenecks in high-dimensional meteorological datasets, enabling rapid, reproducible post-event assessments critical for disaster mitigation.

The Extreme Weather Expert (EWE) is an intelligent agent framework designed for comprehensive, automated diagnostic analysis of extreme weather events. EWE integrates knowledge-guided planning, closed-loop reasoning, and a specialized meteorological toolkit, enabling it to emulate the workflows of expert meteorologists at scale. Driven by the imperative to address the rising frequency and impact of extreme weather under climate change, EWE tackles the analytical bottlenecks and complexities inherent to high-dimensional meteorological datasets and physical interpretation. This agentic system not only generates and interprets multimodal visualizations but also executes stepwise analytical plans, providing rigorous, physically-grounded diagnostics across a benchmark suite of high-impact events (Jiang et al., 26 Nov 2025).

1. Motivation and Analytical Bottleneck

The escalation in global risk from cyclones, heatwaves, floods, and related phenomena has exposed the inadequacy of conventional, human-centered post-event diagnostic workflows. Such diagnostics, which encompass identification of synoptic and mesoscale drivers, quantification of anomalies, and causal interpretation, require substantial domain expertise and are time-intensive. While recent advances have enabled machine-learning models to excel at numerical weather prediction (e.g., GraphCast, Pangu-Weather), these systems are not designed for the causal, physical reasoning necessary for post hoc scientific understanding and attribution. EWE was conceived to remove the expert-driven bottleneck and facilitate scalable, interpretable diagnostics for both research and real-world applications. This is especially critical for disaster mitigation in developing regions that lack specialist meteorological capacity.

2. Agentic Architecture and Component Design

2.1 Knowledge-Guided Planning

EWE formalizes the diagnostic workflow as a Partially Observable Markov Decision Process (POMDP):

  • State Space ($S$): Latent expert context, memory bank $M_k$, and completed analyses
  • Action Space ($A$): Meteorological toolkit invocation, code generation, visualization production
  • Observation Space ($O$): Numerical outputs, figures, NetCDF slices
  • Transition Model ($T$): Deterministic (code execution) with stochastic data queries
  • Observation Model ($Z$): Mapping code/tool output to data or images as agent observations
  • Reward ($R$): Stepwise scalar score based on code correctness, visualization clarity, and depth of physical interpretation

The planning objective is to maximize the expected cumulative discounted reward over policies $\pi$:

$$\pi^* = \arg\max_\pi \mathbb{E} \left[ \sum_{k=1}^N \gamma^{k-1} r_k \right],\;\; r_k = \mathcal{E}(e, a_k, o_k, i_k)$$
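As a small illustration of the objective above, the discounted return of a single trajectory can be computed as follows. This is a minimal sketch: the function name `discounted_return`, the sample rewards, and the discount value are hypothetical, since the source does not state a value for $\gamma$.

```python
def discounted_return(rewards, gamma):
    """Return sum_{k=1}^{N} gamma**(k-1) * r_k for one trajectory.

    `rewards` holds r_1..r_N in order; enumerate() starts at 0, which
    supplies the exponent k-1 directly.
    """
    return sum(gamma ** exponent * r for exponent, r in enumerate(rewards))


# Hypothetical per-step rewards and an assumed discount factor.
sample_rewards = [1.0, 0.5, 0.25]
sample_return = discounted_return(sample_rewards, gamma=0.5)
print(sample_return)
```

With `gamma=1.0` the function reduces to a plain sum of the step rewards.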

A directed acyclic planning graph $G_\text{plan}$ decomposes the event analysis into expert-defined subtasks (e.g., "Identify synoptic trough", "Compute vorticity"), with logical dependencies seeded by Chain-of-Thought exemplars.
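One way to picture such a planning graph is as a mapping from each subtask to its prerequisites, ordered topologically before execution. The sketch below uses Python's standard `graphlib` module; the subtask names beyond the two quoted above, and all of the dependency edges, are assumptions made for illustration rather than the paper's actual plan.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each key maps to the set of subtasks it depends on.
plan = {
    "Load ERA5 fields": set(),
    "Compute anomalies": {"Load ERA5 fields"},
    "Identify synoptic trough": {"Load ERA5 fields"},
    "Compute vorticity": {"Identify synoptic trough"},
    "Write final report": {"Compute anomalies", "Compute vorticity"},
}


def execution_order(graph):
    """Return one ordering in which every subtask follows its prerequisites.

    TopologicalSorter raises graphlib.CycleError if the graph is not acyclic.
    """
    return list(TopologicalSorter(graph).static_order())


order = execution_order(plan)
print(order)
```

Several orderings can be valid for the same graph; the only guarantee is that each subtask appears after everything it depends on.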

2.2 Self-Evolving Closed-Loop Reasoning

The agent iteratively cycles through Thought ($t$), Action ($a$), Observation ($o$), and Interpretation ($i$), with two auditors (CodeAuditor and FigureAuditor) enforcing procedural and perceptual correctness.

Pseudocode for the main loop:

Initialize event context e, plan P = Plan(e)
Initialize memory M, step k = 1
while k ≤ N_max and P not empty:
    t_k ← LLM.think(M, P.current_subtask)
    a_k ← generate_code(t_k)
    try:
        o_raw ← execute(a_k)
    except ExecutionError as err:
        a_k ← CodeAuditor.debug(a_k, err)
        o_raw ← execute(a_k)
    o_k ← FigureAuditor.refine(o_raw)
    i_k ← LLM.interpret(M, a_k, o_k)
    feedback ← {execution success, auditor flags}
    if feedback indicates subtask success:
        M ← Memory.update(M, a_k, o_k, i_k)
        P.advance()
    else:
        P.revise_current()
    k ← k + 1
report ← concatenate all i_k into final diagnostic
return report

Stopping criteria include completion of all subtasks or exceeding the maximum allowed steps.
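The pseudocode above can be exercised with a small runnable sketch in which every component is passed in as a plain callable. All names here (`run_closed_loop`, `toy_execute`, the lambda stand-ins) are hypothetical and replace the LLM, the auditors, and the executor with toy functions. Two simplifications are assumptions of this sketch: a failed subtask is simply retried rather than revised, and the report concatenates only the interpretations of accepted steps.

```python
class ExecutionError(Exception):
    """Raised by the stand-in executor when generated code fails."""


def run_closed_loop(subtasks, llm_think, generate_code, execute, debug,
                    refine, llm_interpret, succeeded, n_max=20):
    """Cycle Thought -> Action -> Observation -> Interpretation until done."""
    memory = []               # accepted (action, observation, interpretation)
    pending = list(subtasks)  # remaining plan, current subtask first
    k = 1
    while k <= n_max and pending:
        thought = llm_think(memory, pending[0])
        action = generate_code(thought)
        try:
            raw = execute(action)
        except ExecutionError as err:
            # One repair attempt, as in the pseudocode; a second failure
            # propagates to the caller.
            action = debug(action, err)
            raw = execute(action)
        observation = refine(raw)
        interpretation = llm_interpret(memory, action, observation)
        if succeeded(observation):
            memory.append((action, observation, interpretation))
            pending.pop(0)    # advance to the next subtask
        # Otherwise the subtask stays first and is retried (stand-in for
        # plan revision).
        k += 1
    return "\n".join(interp for _, _, interp in memory)


def toy_execute(code):
    """Toy executor: any code string starting with 'buggy' fails."""
    if code.startswith("buggy"):
        raise ExecutionError("simulated failure")
    return f"result({code})"


report = run_closed_loop(
    subtasks=["compute anomaly", "plot 500 hPa heights"],
    llm_think=lambda memory, subtask: subtask,
    generate_code=lambda thought: f"buggy {thought}",
    execute=toy_execute,
    debug=lambda code, err: code.replace("buggy", "fixed"),
    refine=lambda raw: raw,
    llm_interpret=lambda memory, action, obs: f"interpreted {obs}",
    succeeded=lambda obs: obs.startswith("result"),
)
print(report)
```

Passing the components as callables keeps the control flow testable without a model or real data; each toy function can be swapped for a real implementation independently.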

2.3 Domain-Tailored Meteorological Toolkit

The toolkit exclusively operates on NetCDF-formatted ERA5 reanalysis data and a 30-year climatology baseline:

  • Data acquisition: Spatiotemporal data fields via variable-specific queries
  • Anomaly detection: Computation of absolute and standardized anomalies
  • Front identification: Temperature gradient methods at 850 hPa
  • Vorticity ($\zeta$) and potential vorticity (PV):

$$\zeta = \frac{\partial v}{\partial x} - \frac{\partial u}{\partial y},\;\; \mathrm{PV} = \frac{(\zeta + f)\left(\frac{\partial \theta}{\partial p}\right)}{\rho},\;\; \theta = T\left(\frac{p_0}{p}\right)^{R/c_p}$$

  • Integrated Vapor Transport (IVT):

$$\mathrm{IVT} = \frac{1}{g} \int_{p_{\text{top}}}^{p_{\text{bot}}} q\,\mathbf{u}\, dp$$

  • Moisture and Q-Vector diagnostics:

$$\mathbf{Q} = -\frac{R p}{\sigma} \left(\nabla T \cdot \nabla \mathbf{u}\right),\;\; \nabla \cdot (q \mathbf{u})$$
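To make two of the formulas above concrete, the sketch below evaluates relative vorticity with centered finite differences and potential temperature from its closed form. It is a pure-Python illustration under stated assumptions: a uniform Cartesian grid (ERA5 fields on a latitude-longitude grid would need spherical metric terms), hypothetical function names, and $R/c_p \approx 0.2857$ for dry air. It is not the toolkit's implementation.

```python
def potential_temperature(temp_k, pressure_hpa, p0_hpa=1000.0, kappa=0.2857):
    """theta = T * (p0 / p) ** (R / c_p), with kappa = R / c_p for dry air."""
    return temp_k * (p0_hpa / pressure_hpa) ** kappa


def relative_vorticity(u, v, dx, dy):
    """zeta = dv/dx - du/dy at interior points via centered differences.

    u and v are lists of rows indexed [j][i] (j along y, i along x) on a
    uniform Cartesian grid with spacings dx and dy in metres. The result
    omits the one-point boundary ring.
    """
    ny, nx = len(u), len(u[0])
    zeta = [[0.0] * (nx - 2) for _ in range(ny - 2)]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            dv_dx = (v[j][i + 1] - v[j][i - 1]) / (2.0 * dx)
            du_dy = (u[j + 1][i] - u[j - 1][i]) / (2.0 * dy)
            zeta[j - 1][i - 1] = dv_dx - du_dy
    return zeta


# Check case: solid-body rotation u = -omega*y, v = omega*x has
# zeta = 2*omega everywhere.
omega, spacing = 1.0e-4, 1000.0
u_field = [[-omega * j * spacing for i in range(4)] for j in range(4)]
v_field = [[omega * i * spacing for i in range(4)] for j in range(4)]
zeta_field = relative_vorticity(u_field, v_field, spacing, spacing)
print(zeta_field)
print(potential_temperature(280.0, 850.0))
```

The solid-body rotation case is a convenient sanity check because centered differences are exact for linear wind fields.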

All diagnostic outputs feed into multimodal visualization routines for interpretive reporting.
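The IVT integral from the toolkit list can be approximated on discrete pressure levels with the trapezoidal rule. The following is a minimal sketch, assuming a single vertical column, levels ordered from the top of the layer to the bottom, pressure in pascals, and specific humidity in kg/kg; the function names and sample profile are hypothetical.

```python
import math

G = 9.80665  # standard gravity, m s^-2


def trapezoid(values, coords):
    """Integrate `values` over `coords` with the trapezoidal rule."""
    return sum(
        0.5 * (values[k] + values[k + 1]) * (coords[k + 1] - coords[k])
        for k in range(len(coords) - 1)
    )


def integrated_vapor_transport(q, u, v, pressure_pa):
    """Return (IVT_u, IVT_v, |IVT|) in kg m^-1 s^-1 for one column.

    Levels must run from p_top (lowest pressure) to p_bot (highest) so
    that dp is positive along the integration path.
    """
    qu = [qi * ui for qi, ui in zip(q, u)]
    qv = [qi * vi for qi, vi in zip(q, v)]
    ivt_u = trapezoid(qu, pressure_pa) / G
    ivt_v = trapezoid(qv, pressure_pa) / G
    return ivt_u, ivt_v, math.hypot(ivt_u, ivt_v)


# Hypothetical column: constant humidity and a purely zonal wind.
levels_pa = [30000.0, 50000.0, 70000.0, 85000.0, 100000.0]
q_profile = [0.01] * 5
u_profile = [10.0] * 5
v_profile = [0.0] * 5
ivt_u, ivt_v, ivt_mag = integrated_vapor_transport(
    q_profile, u_profile, v_profile, levels_pa
)
print(ivt_u, ivt_v, ivt_mag)
```

For this constant profile the integral is just q·u·Δp/g, which makes the result easy to verify by hand.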

3. Multimodal Visualization and Interpretation

EWE supports a broad suite of visualization routines tailored to classic meteorological analysis:

  • Data subsetting and smoothing: Gaussian filtering, regional extraction
  • Contour mapping: 500 hPa geopotential heights, filled anomaly maps
  • Vector field overlays: Wind arrows with auto-downsampling
  • Overlays: Vorticity, Q-vector, IVT plumes, temperature/precipitation anomalies
  • Cross-sections and vertical profiles: Thermodynamic diagrams along storm tracks
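The source does not describe how the wind-arrow auto-downsampling works. One plausible approach, sketched here purely as an assumption, is to pick a stride per axis so that no more than a target number of arrows is drawn along it. The names `auto_stride` and `downsample` and the default of 20 arrows are hypothetical.

```python
import math


def auto_stride(n_points, max_arrows):
    """Smallest stride that leaves at most `max_arrows` samples on one axis."""
    return max(1, math.ceil(n_points / max_arrows))


def downsample(field, max_arrows=20):
    """Subsample a 2-D list-of-rows field along both axes."""
    stride_y = auto_stride(len(field), max_arrows)
    stride_x = auto_stride(len(field[0]), max_arrows)
    return [row[::stride_x] for row in field[::stride_y]]


# Hypothetical 100 x 200 wind component, thinned before plotting.
u_wind = [[float(i + j) for i in range(200)] for j in range(100)]
u_thin = downsample(u_wind, max_arrows=20)
print(len(u_thin), len(u_thin[0]))
```

The same strides would be applied to both wind components and to the coordinate arrays so the arrows stay aligned with the grid.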

An MLLM reads visual outputs and diagnostic figures, parsing captions, units, and identifying key meteorological patterns (e.g., "southward-dipping trough axis, enhanced moisture advection").

4. Benchmark Dataset and Evaluation Metrics

4.1 Curated Benchmark Suite

The benchmark includes 103 high-impact events drawn from EM-DAT and WMO reports, encompassing heatwaves, cold waves, extreme precipitation, droughts, and storms across all major continents. Each event bundle contains time-tracked NetCDF fields (T, u, v, q, geopotential height, surface metrics) and climatology.

4.2 Stepwise Evaluation and Automated Scoring

For each analysis step $k$, the reward function is:

$$r_k = \alpha\,\mathrm{F}_{\rm code}(a_k) + \beta\,\mathrm{F}_{\rm viz}(o_k) + \gamma\,\mathrm{F}_{\rm phys}(i_k)$$

with code quality, visualization clarity, and physical interpretation measured by MLLMs (e.g., GPT-4.1, Gemini-2.5-Pro). Overall event scores average across all trajectory steps:

$$R_{\rm total} = \frac{1}{N}\sum_{k=1}^{N} r_k$$
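The two scoring formulas above amount to a weighted sum per step followed by a mean over the trajectory. The sketch below shows that arithmetic; the weights and component scores are hypothetical, since the source does not report values for $\alpha$, $\beta$, and $\gamma$, and in EWE the component scores come from MLLM graders rather than fixed numbers.

```python
def step_reward(f_code, f_viz, f_phys, alpha, beta, gamma):
    """r_k = alpha * F_code + beta * F_viz + gamma * F_phys."""
    return alpha * f_code + beta * f_viz + gamma * f_phys


def event_score(step_rewards):
    """R_total: the mean of r_k over all N steps of one trajectory."""
    if not step_rewards:
        raise ValueError("at least one step reward is required")
    return sum(step_rewards) / len(step_rewards)


# Hypothetical weights and grader scores for a two-step trajectory.
weights = dict(alpha=0.5, beta=0.25, gamma=0.25)
rewards = [
    step_reward(1.0, 0.5, 0.5, **weights),
    step_reward(0.5, 0.0, 0.0, **weights),
]
total = event_score(rewards)
print(rewards, total)
```

Note that $\gamma$ here is a reward weight and is distinct from the discount factor in the planning objective, although the source uses the same symbol for both.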

Experimental ablations show that the meteorological toolkit, the figure and code auditing routines, and chain-of-thought planning are each important for sustaining stepwise diagnostic quality (see the table and figures in Jiang et al., 26 Nov 2025).

5. Experimental Results and Case Studies

Across seven diagnostic stages (planning, data exploration, event identification, synoptic analysis, mesoscale analysis, thermodynamic analysis, final report), comparative grading protocols reliably distinguish model and pipeline variants. For example, Claude-4-Sonnet achieves a mean event report score of 0.95; omitting the toolkit decreases thermodynamic analysis scores by 20%; and removing chain-of-thought planning sharply degrades diagnostic fidelity (scores of 0.548 for synoptic and 0.467 for mesoscale analysis) (Jiang et al., 26 Nov 2025).

Case study (extreme precipitation):

  • Qualification: Code computes 95th percentile anomaly; filled-contour map visualizes persistence; interpretation links to event threshold.
  • Synoptic analysis: Automated plan produces geopotential height and wind vector maps, interpreted as mid-tropospheric trough transport.
  • Thermodynamic diagnosis: Computed $\theta$ and IVT support physically consistent attribution of convective potential.
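The qualification step in the case study relies on a percentile threshold and on anomalies relative to climatology. A minimal sketch of both calculations with Python's standard `statistics` module follows. The sample data, the function names, the use of the sample standard deviation, and the "inclusive" percentile method are all assumptions; the source does not say which conventions EWE uses.

```python
import statistics


def standardized_anomaly(value, climatology):
    """(value - climatological mean) / climatological standard deviation."""
    mean = statistics.fmean(climatology)
    std = statistics.stdev(climatology)  # sample standard deviation
    return (value - mean) / std


def percentile_threshold(samples, percent=95):
    """Return the `percent`-th percentile (integer 1..99) of `samples`."""
    cuts = statistics.quantiles(samples, n=100, method="inclusive")
    return cuts[percent - 1]


# Hypothetical climatological daily precipitation totals (mm).
climatology = [float(x) for x in range(1, 101)]
threshold = percentile_threshold(climatology, 95)
event_value = 120.0
is_extreme = event_value > threshold
z_score = standardized_anomaly(event_value, climatology)
print(threshold, is_extreme, z_score)
```

An event value is then flagged as extreme when it exceeds the threshold, and the standardized anomaly expresses how far it sits from the climatological mean in units of standard deviation.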

6. Broader Impacts, Limitations, and Future Directions

EWE provides scalable, automated diagnostic reasoning for dozens of events, potentially democratizing meteorological expertise for under-resourced regions and real-time disaster preparedness. Applications include post-event attribution, rapid drafting of academic case studies, and meteorological training with traceable reasoning chains. Limiting factors include susceptibility to LLM hallucination, data latency (ingestion is restricted to reanalysis data), and event-type coverage restricted to the benchmark categories. The future roadmap involves integration with physics-based models (WRF), remote-sensing data streams, and active learning from human corrections.

7. Significance and Position in the Field

EWE is distinguished by its explicit agentic design: knowledge-guided, stepwise planning over subtasks, multimodal expert-grade visualization, and closed-loop reasoning with automated auditing. The benchmark protocol and structured evaluation deliver transparent, interoperable performance assessment, establishing EWE as a reference architecture for subsequent automated diagnostic tools in extreme weather research (Jiang et al., 26 Nov 2025). By decoupling expert workflows from human bottlenecks, EWE advances the field toward real-time, reproducible, physically consistent diagnostic analytics and augments the global capacity to understand and respond to extreme weather hazards.
