Link Climate: An Interoperable Knowledge Graph Platform for Climate Data (2210.16050v1)

Published 28 Oct 2022 in cs.DB

Abstract: Climate science has become more ambitious in recent years as global awareness about the environment has grown. To better understand climate, historical climate (e.g. archived meteorological variables such as temperature, wind, water, etc.) and climate-related data (e.g. geographical features and human activities) are widely used by today's climate research to derive models for an explainable climate change and its effects. However, such data sources are often dispersed across a multitude of disconnected data silos on the Web. Moreover, there is a lack of advanced climate data platforms to enable multi-source heterogeneous climate data analysis, therefore, researchers must face a stern challenge in collecting and analyzing multi-source data. In this paper, we address this problem by proposing a climate knowledge graph for the integration of multiple climate data and other data sources into one service, leveraging Web technologies (e.g. HTTP) for multi-source climate data analysis. The proposed knowledge graph is primarily composed of data from the National Oceanic and Atmospheric Administration's daily climate summaries, OpenStreetMap, and Wikidata, and it supports joint data queries on these widely used databases. This paper shows, with a use case in Ireland and the United Kingdom, how climate researchers could benefit from this platform as it allows them to easily integrate datasets from different domains and geographical locations.

Authors (4)
  1. Jiantao Wu (16 papers)
  2. Fabrizio Orlandi (12 papers)
  3. Declan O'Sullivan (14 papers)
  4. Soumyabrata Dev (86 papers)
Citations (23)

Summary

This paper introduces Link Climate, a platform designed to address the challenge of integrating dispersed and heterogeneous climate and climate-related data sources (Wu et al., 2022). Climate researchers often struggle to collect and analyze data from separate silos such as meteorological observations, geographical features, and human activity records. Link Climate proposes a Knowledge Graph (KG) approach that uses Semantic Web technologies to unify these diverse sources into a single, queryable service.

Core Idea & Implementation:

  1. Knowledge Graph Construction: The platform builds a KG primarily using:
    • NOAA Climate Data: Daily climate summaries (temperature, precipitation, etc.) retrieved via NOAA's REST APIs.
    • OpenStreetMap (OSM): Geographical information about weather station locations (counties, cities). OSM's reverse geocoding API is used to find administrative areas based on station coordinates.
    • Wikidata: Encyclopedic knowledge linked via OSM entities to provide richer context (e.g., nearby water bodies, geographical features).
  2. Ontology-Driven Structure: The KG's structure is defined by the Climate Analysis (CA) ontology, which extends the standard SOSA (Sensor, Observation, Sample, and Actuator) and SSN (Semantic Sensor Network) ontologies. CA models concepts like datasets, data categories, data types, locations, stations, and observations, aligning them with NOAA API structures and data fields. It also reuses terms from QUDT (for units) and WGS84 (for spatial data), and it aligns with the AEMET ontology (Spanish Meteorology Agency) using owl:sameAs for improved interoperability.
  3. Data Integration & Linking:
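    • NOAA data is fetched using a sliding window approach (4 weeks) to capture recent updates. Consecutive windows overlap, so the same records are fetched repeatedly and the resulting duplicates have to be managed when loading into the RDF triple store.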
    • Stations from NOAA data are linked to geographical entities in OSM based on their latitude/longitude coordinates.
    • OSM entities often contain Wikidata identifiers, enabling the integration of Wikidata's contextual information into the KG.
  4. Technology Stack & Publication:
    • The KG is stored in an Apache Jena Fuseki triple store.
    • Data is published following Linked Data principles, making each entity accessible via a unique, dereferenceable URI.
    • A SPARQL endpoint allows complex queries across the integrated data.
    • LodView is used for URI dereferencing, providing a user-friendly, graph-based view of the data.
    • The implementation relies on Python scripts, available on GitHub. The KG currently holds around 14 million RDF triples.
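
The sliding-window fetch described in step 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`sliding_window`, `merge_triples`), the `http://example.org/` IRIs, and the in-memory Python set standing in for the triple store are all hypothetical. The sketch relies on one general fact about RDF: a graph is a set of triples, so adding a triple that is already present has no effect.

```python
from datetime import date, timedelta
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]
WINDOW = timedelta(weeks=4)  # window length reported in the summary


def sliding_window(today: date, window: timedelta = WINDOW) -> Tuple[date, date]:
    """Return the (start, end) date range to re-fetch on a sync run."""
    return today - window, today


def merge_triples(store: Set[Triple], fetched: List[Triple]) -> int:
    """Add fetched triples to the store and return how many were new."""
    before = len(store)
    store.update(fetched)  # set semantics: exact duplicates are ignored
    return len(store) - before


# Hypothetical observation triples (subject, predicate, object).
EX = "http://example.org/"
SOSA = "http://www.w3.org/ns/sosa/"
run1 = [
    (EX + "obs/1", SOSA + "hasSimpleResult", "12.3"),
    (EX + "obs/2", SOSA + "hasSimpleResult", "11.8"),
]
# A later run re-fetches the overlapping window plus one new observation.
run2 = run1 + [(EX + "obs/3", SOSA + "hasSimpleResult", "9.4")]

store: Set[Triple] = set()
added_first = merge_triples(store, run1)
added_second = merge_triples(store, run2)
start, end = sliding_window(date(2022, 10, 28))
```

Set semantics removes only exact duplicates. If a re-fetched measurement has been corrected upstream, it produces a different triple, so a real pipeline would also need to delete the stale value before inserting the new one.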

Use Case & Evaluation:

  • The paper demonstrates the platform's utility with a use case involving data from Ireland and the United Kingdom.
  • Competency questions (e.g., "Find stations in a region," "Retrieve time series for multiple variables," "Find geographical context of a station") were used to evaluate the KG's ability to answer relevant climate research questions. SPARQL queries were formulated to answer these questions, demonstrating the KG's effectiveness. The need for future GeoSPARQL integration for complex spatial queries is noted.
  • A Web Interface provides documentation (Readme) and tutorials (Beginner's Guide) to help users, especially those unfamiliar with KGs or SPARQL, explore the platform.
  • Usability testing (31 participants) using a PSSUQ-based questionnaire showed positive results regarding the platform's understandability and perceived usefulness for climate researchers. Feedback suggested improvements like embedding query interfaces and providing clearer visual outputs.
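
A competency question such as "Retrieve time series for multiple variables" might translate into a SPARQL query along the lines below. This is a hedged sketch: the summary does not list the CA ontology's own terms, so the query uses only standard SOSA properties, and it assumes that a station plays the role of a SOSA sensor. The endpoint URL, the station IRI, and the function names are placeholders, not the platform's real addresses or API. The request URL follows the SPARQL 1.1 Protocol convention of passing the query in a `query` parameter.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"          # hypothetical endpoint URL
STATION = "http://example.org/station/EXAMPLE"  # hypothetical station IRI


def time_series_query(station_iri: str, limit: int = 100) -> str:
    """Build a SELECT query for time-ordered observations of one station."""
    return f"""
PREFIX sosa: <http://www.w3.org/ns/sosa/>
SELECT ?time ?property ?value WHERE {{
  ?obs a sosa:Observation ;
       sosa:madeBySensor <{station_iri}> ;
       sosa:observedProperty ?property ;
       sosa:resultTime ?time ;
       sosa:hasSimpleResult ?value .
}}
ORDER BY ?time
LIMIT {limit}
""".strip()


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request URL (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urlencode({"query": query})


query = time_series_query(STATION, limit=10)
url = request_url(ENDPOINT, query)
```

The resulting `url` could be fetched with any HTTP client, typically with an `Accept: application/sparql-results+json` header to request JSON results.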

Contributions:

  • An open KG for NOAA data, enhancing explainability.
  • Integration of heterogeneous sources (climate, geographic, encyclopedic) using Linked Data.
  • Automated synchronization mechanism.
  • A Web interface for user guidance and exploration.

Future Work:

  • Incorporate user feedback to improve the web interface.
  • Extend the CA ontology to integrate more data sources (e.g., remote sensing, air pollution, NetCDF data).
  • Explore semi-automatic ontology alignment methods.
  • Implement GeoSPARQL for enhanced geospatial querying.