EarthLink: A Self-Evolving AI Agent for Climate Science (2507.17311v2)

Published 23 Jul 2025 in cs.LG, cs.AI, and physics.ao-ph

Abstract: Modern Earth science is at an inflection point. The vast, fragmented, and complex nature of Earth system data, coupled with increasingly sophisticated analytical demands, creates a significant bottleneck for rapid scientific discovery. Here we introduce EarthLink, the first AI agent designed as an interactive copilot for Earth scientists. It automates the end-to-end research workflow, from planning and code generation to multi-scenario analysis. Unlike static diagnostic tools, EarthLink can learn from user interaction, continuously refining its capabilities through a dynamic feedback loop. We validated its performance on a number of core scientific tasks of climate change, ranging from model-observation comparisons to the diagnosis of complex phenomena. In a multi-expert evaluation, EarthLink produced scientifically sound analyses and demonstrated an analytical competency that was rated as comparable to specific aspects of a human junior researcher's workflow. Additionally, its transparent, auditable workflows and natural language interface empower scientists to shift from laborious manual execution to strategic oversight and hypothesis generation. EarthLink marks a pivotal step towards an efficient, trustworthy, and collaborative paradigm for Earth system research in an era of accelerating global change. The system is accessible at our website https://earthlink.intern-ai.org.cn.

Summary

The paper introduces an end-to-end AI platform that automates climate analysis by integrating planning, code generation, and multi-scenario synthesis.
It employs a multi-agent architecture with continuous feedback to refine workflows, and expert reviewers rated its output as comparable to specific aspects of a junior researcher's workflow.
The system demonstrates robust accuracy across diagnostic and complex tasks while emphasizing transparency, human oversight, and continuous improvement.

Introduction and Motivation

The exponential growth of Earth system data, particularly from CMIP6 and related observational archives, has outpaced the capacity of traditional, manual, and often fragmented analysis workflows in climate science. Existing diagnostic toolkits (e.g., ESMValTool, PCMDI metrics) provide standardized, reproducible analyses but are inherently rigid, requiring significant programming expertise for adaptation to novel scientific questions. The emergence of LLM-driven agents offers a new paradigm for automating and augmenting scientific workflows, but prior efforts in Earth sciences have been limited to domain-specific QA or tool integration, lacking full end-to-end automation and adaptability.

EarthLink is introduced as a multi-agent, LLM-driven platform designed to function as an interactive, self-evolving research copilot for climate science. It automates the entire research workflow—from planning and code generation to multi-scenario analysis and synthesis—while maintaining transparency and auditability. The system is architected to continuously improve through user interaction and feedback, with the explicit goal of transforming scientists from manual executors to strategic supervisors.

System Architecture and Workflow

EarthLink is structured into three core modules: the Planning Module, the Self-Evolving Scientific Lab, and the Multi-Scenario Analysis Module. Each module is underpinned by a resource library comprising a Knowledge Library (curated workflows, literature, expert knowledge), a Data Library (CMIP6, obs4MIPs, and other observational datasets), and a Tool Library (ESMValTool, PCMDI metrics, CDO, xarray, Iris, etc.).

The workflow is initiated by parsing user queries or scientific documents, extracting relevant concepts and goals, and generating candidate analysis workflows via planning agents. These workflows are iteratively refined with human oversight to ensure scientific rigor and alignment.

Figure 1: The EarthLink platform workflow for automated climate data analysis, illustrating intelligent planning, self-evolving laboratory execution, and multi-scenario synthesis.

The selected plan is then transformed into executable code, which autonomously manages data retrieval, preprocessing, analysis, and visualization. The system incorporates an autonomous feedback loop for error correction and output refinement, with successful scripts and workflows contributing back to the Knowledge and Tool Libraries, thus enabling self-evolution. The final stage synthesizes computational outputs and visualizations into structured, human-readable reports, providing scientific interpretations across domains and supporting policy-relevant insights.
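
The feedback loop described above can be sketched as a generate, execute, repair cycle. This is a minimal illustration, not EarthLink's implementation: `run_with_repair`, `toy_generator`, and the `library` list are hypothetical names, and the generator is a stub standing in for an LLM call.

```python
import traceback
from typing import Callable, Optional


def run_with_repair(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    max_attempts: int = 3,
) -> dict:
    """Generate code for `task`, run it, and feed any traceback back to the generator."""
    error: Optional[str] = None
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(task, error)
        namespace: dict = {}
        try:
            exec(code, namespace)  # assumption: a real system would sandbox this step
            return {"ok": True, "attempts": attempt, "code": code,
                    "result": namespace.get("result")}
        except Exception:
            error = traceback.format_exc()  # becomes the repair hint for the next draft
    return {"ok": False, "attempts": max_attempts, "code": code, "error": error}


def toy_generator(task: str, error: Optional[str]) -> str:
    """Hypothetical stand-in for an LLM: the first draft fails, the retry is fixed."""
    if error is None:
        return "result = sum(values) / len(values)"  # fails: `values` is undefined
    return "values = [14.1, 14.3, 14.6]\nresult = sum(values) / len(values)"


outcome = run_with_repair(toy_generator, "mean of three temperature values")

# Successful scripts are stored for reuse, mirroring the self-evolution idea.
library = []
if outcome["ok"]:
    library.append(outcome["code"])
```

In this toy run the first draft raises a `NameError`, the traceback is passed back to the generator, and the second draft succeeds and is added to `library`.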

Benchmarking and Evaluation

A hierarchical, multi-level evaluation framework was established to assess EarthLink's scientific capabilities, spanning from basic statistical analysis to complex, open-ended research tasks:

Level 1: Simple statistical analysis (e.g., climatological means, variability, model-observation comparison)
Level 2: Mechanistic diagnosis (e.g., ECS, TCR estimation)
Level 3: Complex scientific reasoning (e.g., ENSO diversity and periodicity analysis)
Level 4: Semi-open scientific problems (e.g., constrained future projections)
Level 5: Fully open scientific problems (not attempted in this paper)
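
To make the simplest tier concrete, a Level 1 model-observation comparison often reduces to an area-weighted bias. The sketch below assumes a regular latitude-longitude grid and uses made-up temperatures; `area_weighted_mean` is an illustrative helper, not part of any toolkit named in the paper.

```python
import math


def area_weighted_mean(field, lats_deg):
    """Mean of a [lat][lon] field, weighting each latitude row by cos(latitude)."""
    numerator = 0.0
    denominator = 0.0
    for row, lat in zip(field, lats_deg):
        weight = math.cos(math.radians(lat))
        numerator += weight * sum(row)
        denominator += weight * len(row)
    return numerator / denominator


# Synthetic near-surface temperatures in kelvin (illustrative values only).
lats = [-60.0, 0.0, 60.0]
model = [[271.0, 273.0], [300.0, 302.0], [275.0, 277.0]]
obs = [[270.0, 272.0], [299.0, 301.0], [277.0, 279.0]]

# Pointwise model-minus-observation difference, then a global weighted mean.
bias_field = [[m - o for m, o in zip(m_row, o_row)]
              for m_row, o_row in zip(model, obs)]
bias = area_weighted_mean(bias_field, lats)
unweighted = sum(sum(row) for row in bias_field) / 6
```

Here the unweighted mean bias is zero while the weighted bias is positive, because the tropical row represents more surface area than the high-latitude rows.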

EarthLink demonstrated robust performance across all levels attempted. In Level 1, it accurately executed standard diagnostic tasks, producing results and visualizations consistent with established literature. In Level 2, it correctly identified necessary experiments and applied appropriate statistical methods for ECS and TCR estimation, with outputs matching IPCC AR6 ranges. Notably, when prompted for alternative methods, EarthLink exhibited physical intuition by adopting simplified, literature-grounded approaches.
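
The summary does not say which statistical method EarthLink applied for ECS. One widely used option is the Gregory regression, sketched here under that assumption: annual-mean net top-of-atmosphere flux anomalies from an abrupt-4xCO2 run are regressed on surface temperature anomalies, and ECS is taken as half the temperature at which the fitted flux reaches zero. The data are synthetic and the function names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def gregory_ecs(delta_t, net_toa_flux):
    """Estimate ECS (K) from abrupt-4xCO2 anomalies relative to the control run."""
    feedback, forcing_4x = fit_line(delta_t, net_toa_flux)  # slope, intercept
    # Dividing by 2 assumes the 2xCO2 forcing is half the 4xCO2 forcing.
    return -forcing_4x / (2.0 * feedback)


# Synthetic anomalies: warming in K, net downward flux in W m^-2 (made up).
delta_t = [1.0, 2.0, 3.0, 4.0, 5.0]
net_flux = [6.4, 5.4, 4.4, 3.4, 2.4]

ecs = gregory_ecs(delta_t, net_flux)
```

With these values the fitted forcing is 7.4 W m^-2 and the feedback is -1.0 W m^-2 K^-1, so the estimate is 3.7 K.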

At Level 3, EarthLink successfully decomposed complex phenomena such as ENSO diversity, implementing established classification methods and generating custom code for periodicity analysis, demonstrating emergent chain-of-thought reasoning.
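
As an illustration of what a periodicity analysis involves, the sketch below computes a plain periodogram of a monthly index and reads off the dominant period. It is not the code EarthLink generated: the index is a synthetic sine wave, and a direct discrete Fourier transform is used to keep the example dependency-free (an FFT routine would be the usual choice for real data).

```python
import cmath
import math


def periodogram(series):
    """Return {k: power} for harmonics k = 1 .. n/2 of the mean-removed series."""
    n = len(series)
    mean = sum(series) / n
    anomalies = [x - mean for x in series]
    powers = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(a * cmath.exp(-2j * math.pi * k * t / n)
                    for t, a in enumerate(anomalies))
        powers[k] = abs(coeff) ** 2 / n
    return powers


# Synthetic monthly index: a 48-month cycle plus a weaker annual cycle.
n_months = 240
index = [math.sin(2 * math.pi * t / 48) + 0.3 * math.sin(2 * math.pi * t / 12)
         for t in range(n_months)]

powers = periodogram(index)
k_peak = max(powers, key=powers.get)            # harmonic with the most power
dominant_period_years = n_months / k_peak / 12  # period = n / k months
```

The peak falls at the fifth harmonic, a four-year period, which lies inside the two-to-seven-year band usually associated with ENSO.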

Figure 2: Multi-level evaluation of EarthLink on core climate analysis tasks, including statistical feature comparison, mechanistic diagnosis, physical process diagnosis, and a differentiated task scorecard.

A formal multi-expert review (five independent climate scientists) scored EarthLink's outputs on experimental planning, code correctness, and visualization quality. Of 36 benchmark tasks, 16 achieved a score of at least 4 out of 5, a threshold the reviewers deemed practically useful and comparable to a junior researcher's workflow. The strongest attribute was strategic planning, followed by code generation and visualization.

Open-Ended and Future-Oriented Applications

EarthLink was further evaluated on open-ended tasks, such as climate change detection, attribution, and future projections, where ground truth is unavailable. The system correctly distinguished between natural and anthropogenic forcings, processed multi-model CMIP6 simulations under various scenarios, and visualized ensemble means with inter-model spread.
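
The ensemble statistics mentioned above amount to a mean and a spread taken across models at each time step. A minimal sketch with invented warming values follows; the model names and numbers are placeholders, not CMIP6 output.

```python
from statistics import mean, stdev

# Hypothetical warming (K) relative to a baseline, one series per model.
years = [2020, 2040, 2060]
projections = {
    "model_a": [1.0, 1.5, 2.0],
    "model_b": [1.2, 1.9, 2.6],
    "model_c": [0.8, 1.4, 2.0],
}

# zip(*...) regroups the values so each tuple holds all models for one year.
per_year = list(zip(*projections.values()))
ensemble_mean = [mean(values) for values in per_year]
ensemble_spread = [stdev(values) for values in per_year]  # sample std across models

# A simple one-sigma envelope that a plot could shade around the mean.
envelope = [(m - s, m + s) for m, s in zip(ensemble_mean, ensemble_spread)]
```

Using the sample standard deviation is one common choice for spread; percentile ranges across models are another.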

For constrained regional projections (e.g., city-level temperatures under SSP2-4.5 for 2041–2060), EarthLink autonomously selected and implemented both hierarchical emergent constraints (HEC) and spatial aggregation methods, reducing projection uncertainty and refining risk assessments. The HEC code was generated de novo, consistent with literature-derived formulas and expert-developed scripts.
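
To show why an emergent constraint narrows a projection, here is a deliberately simplified single-level version: future change is regressed on an observable quantity across models and the fit is evaluated at the observed value. This is not the hierarchical (HEC) formulation the paper refers to, which adds further statistical structure that is not reproduced here. All numbers are synthetic.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Across-model pairs (made up): historical observable x, projected warming y in K.
x_models = [0.5, 1.0, 1.5, 2.0, 2.5]
y_models = [1.1, 1.9, 3.2, 3.8, 5.0]
x_observed = 1.2  # hypothetical observed value of the same quantity

slope, intercept = fit_line(x_models, y_models)
constrained = intercept + slope * x_observed

n = len(y_models)
mean_y = sum(y_models) / n
# Spread before the constraint: sample std of the raw model projections.
unconstrained_std = math.sqrt(sum((y - mean_y) ** 2 for y in y_models) / (n - 1))
# Spread after the constraint: std of residuals about the regression line.
residuals = [y - (intercept + slope * x) for x, y in zip(x_models, y_models)]
constrained_std = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
```

Because the observable explains most of the inter-model variance in this toy data, the residual spread is far smaller than the raw ensemble spread. The sketch ignores uncertainty in the observation itself.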

Figure 3: Application of EarthLink to open-ended climate research challenges, including detection/attribution, constrained projections, and sectoral impact synthesis.

EarthLink also demonstrated the ability to bridge quantitative projections with qualitative, policy-relevant narratives, generating sector-specific risk assessments for agriculture, energy, insurance, and environment. This cross-domain synthesis is achieved via a dynamic multi-agent system, where a "chair" agent coordinates domain-specific sub-expert agents to produce structured, multi-faceted reports.
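
The chair-and-experts pattern can be sketched as a coordinator that fans one projection out to sector-specific callables and assembles their answers. Everything here is hypothetical: the experts are stubs returning placeholder text where a real system would call an LLM with a sector-specific prompt.

```python
from typing import Callable, Dict

Projection = Dict[str, float]
Expert = Callable[[Projection], str]


def make_expert(sector: str) -> Expert:
    """Build a stub expert for one sector (stands in for an LLM-backed agent)."""
    def expert(projection: Projection) -> str:
        return (f"[{sector}] assessment for "
                f"+{projection['warming_c']:.1f} C (placeholder text)")
    return expert


def chair(projection: Projection, experts: Dict[str, Expert]) -> str:
    """Query every expert with the same projection and merge the sections."""
    sections = [f"{name.title()}\n{expert(projection)}"
                for name, expert in experts.items()]
    return "\n\n".join(sections)


sectors = ["agriculture", "energy", "insurance", "environment"]
experts = {sector: make_expert(sector) for sector in sectors}

report = chair({"warming_c": 1.8}, experts)
```

Keeping experts behind a common callable signature means a sector can be added or swapped without touching the coordinator.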

Implementation Details

EarthLink leverages GPT-4.1 and o4-mini as foundation LLMs, with each module tailored for its specific function. The Planning Module employs OCR (MinerU) for document parsing, vector-based knowledge retrieval, and stochastic sampling for plan diversity. The Self-Evolving Lab orchestrates data preprocessing (via ESMValTool), code generation (retrieval-augmented, template-based), and iterative debugging with feedback from validation agents. The Multi-Scenario Analysis Module utilizes LLM-based image interpretation and report synthesis.
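
Vector-based knowledge retrieval, as used in the Planning Module, ranks stored items by similarity to the query. The sketch below substitutes a bag-of-words vector for a learned embedding so that it runs without external models; `embed`, `cosine`, `retrieve`, and the three knowledge entries are illustrative assumptions.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a real system would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: list, top_k: int = 1) -> list:
    """Return the top_k documents most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vec, embed(d)),
                    reverse=True)
    return ranked[:top_k]


knowledge = [
    "estimate equilibrium climate sensitivity from abrupt-4xCO2 runs",
    "classify ENSO events with sea surface temperature indices",
    "plot seasonal precipitation climatology against observations",
]

best = retrieve("how to classify ENSO events", knowledge)[0]
```

The retrieved entry would then be supplied to the planning agent as context for drafting a workflow.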

The Data Library currently exceeds 1.5 PB, covering 33 CMIP6 experiments, >70 models, and a comprehensive suite of observational datasets. The Tool Library supports seamless integration of community-vetted routines and expert-validated scripts, with continuous expansion.

Performance, Limitations, and Trade-offs

EarthLink's performance is characterized by:

High accuracy in standard diagnostics and mechanistic tasks
Emergent reasoning in complex, multi-step analyses
Transparent, auditable workflows with all intermediate scripts and outputs exposed
Rapid iteration and self-improvement via feedback and expert validation

However, the system's reasoning is fundamentally interpolative, synthesizing existing knowledge and methods rather than generating novel physical theories. Its proficiency is contingent on the quality of its knowledge base and the specificity of user prompts. A key risk is the generation of "plausibly wrong" outputs—syntactically correct code yielding scientifically incorrect results due to subtle misinterpretation. Transparent workflows and human oversight are thus essential for trustworthy deployment.

Resource requirements are substantial, particularly for storage and compute (due to the scale of CMIP6 and the need for concurrent, multi-agent execution). Scaling considerations include distributed data access, parallel task execution, and efficient caching of validated workflows.

Implications and Future Directions

EarthLink represents a significant advance in the automation and augmentation of climate science workflows. Its composable, modular architecture enables flexible orchestration of established toolkits, transforming monolithic programs into callable, interoperable components. This approach fosters a sustainable, community-driven ecosystem that pairs the reliability of vetted toolkits with the agility of agent-driven orchestration.

The platform's natural language interface and cross-domain synthesis capabilities have the potential to break down data silos, harmonize heterogeneous datasets, and accelerate hypothesis-driven research. As EarthLink accumulates successful cross-domain workflows, it builds an internal semantic map of the climate data ecosystem, progressively enhancing its harmonization and reasoning efficiency.

Future developments should focus on:

Expanding the knowledge and tool libraries with community contributions
Integrating specialized impact models for quantitative risk assessment
Enhancing robustness against "plausibly wrong" outputs via advanced validation and uncertainty quantification
Extending to fully open scientific problems, enabling autonomous literature integration, experimental design, and frontier discovery

Conclusion

EarthLink establishes a new paradigm for AI-augmented climate science, automating the end-to-end research workflow while maintaining transparency, flexibility, and continuous self-improvement. Its demonstrated competency across a spectrum of tasks, from standard diagnostics to open-ended, policy-relevant synthesis, positions it as a valuable copilot for accelerating scientific discovery and supporting informed decision-making in the face of global change. The system's limitations underscore the necessity of human–AI partnership, with transparent, auditable workflows as a prerequisite for trustworthy scientific AI. The long-term vision is a global, open, and continuously learning platform that empowers the climate science community to address the challenges of a rapidly changing planet.


Authors (17)
