FiresRu Dataset: Russian Wildfire Data
- FiresRu is an open-access, event-centric dataset capturing 26,681 fire incidents across Russia with detailed meteorological annotations for wildfire science.
- The dataset covers diverse ecosystems—from forests to tundra—with rigorous quality control and spatiotemporal sampling across all Russian federal districts.
- FiresRu supports predictive modeling and classification tasks, offering a clear CSV schema and Python workflows for fire-risk and seasonal analysis.
FiresRu is an open-access, event-centric dataset capturing fire incidents across the Russian Federation accompanied by collocated meteorological observations over a 13-month period from June 2020 through June 2021. Designed to facilitate research in wildfire dynamics specific to the diverse Eurasian ecosystem, FiresRu provides 26,681 point records with rich categorical and meteorological annotations suitable for spatiotemporal, predictive, and classification modeling paradigms relevant to fire science applications (Kriuk, 24 Feb 2025).
1. Scope and Geographic Coverage
FiresRu v1.0 encompasses fire events sampled from all federal districts of the Russian Federation, pairing each incident with meteorological variables at the point of detection. The dataset spans longitudes from 20.37°E to 175.87°E and latitudes from 41.71°N to 70.33°N. This coverage represents distinct land-cover classes (forests, tundra, steppe, and peatlands) across European Russia, Siberia, the Far East, and the North Caucasus, so analyses can compare ecologically different regions.
Temporally, the dataset provides irregular event-based sampling: records are generated on detection of new fire incidents rather than at predetermined intervals. The temporal window extends from June 1, 2020, to June 30, 2021, with each entry providing a UTC timestamp marking the event.
2. Data Schema and Feature Composition
FiresRu is organized as a single flat CSV file, FiresRu.csv, encapsulating the following ten features for each of the 26,681 records:
| Field Name | Type | Description |
|---|---|---|
| dt | string | ISO 8601 UTC timestamp (“2020-07-15T14:00:00Z”) of fire event |
| type_name | string | Categorical fire class: “forest_fire”, “natural_fire”, “controlled_burn”, “uncontrolled_burn”, “peat_fire” |
| type_id | integer | Fire category code (0–4, mapping to type_name) |
| lon | float64 | Longitude (degrees east; range 20.37–175.87) |
| lat | float64 | Latitude (degrees north; range 41.71–70.33) |
| temperature | float32 | Air temperature (°C; [–34.88, 31.94]) |
| precipitation | float32 | Total precipitation in previous 24h (mm; [0.00, 25.85]) |
| relative_humidity | float32 | Atmospheric relative humidity (%; [0, 100]) |
| wind_speed | float32 | Wind speed at event time (m/s; [0.0, 15.0]) |
| solar_radiation | float32 | Instantaneous solar radiation (W/m²; [0, 1000]) |
No explicit remote sensing features (e.g., MODIS counts, NDVI) or composite indices (e.g., Fire Weather Index) are present in v1.0. Modelling and principal component analyses use only the five meteorological fields. For normalization prior to machine learning and PCA, z-score standardization is employed:

$$z_j = \frac{x_j - \mu_j}{\sigma_j}$$

where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of meteorological field $j$, with principal components constructed as linear combinations of standardized meteorological inputs (e.g., $\mathrm{PC}_k = \sum_{j=1}^{5} w_{kj} z_j$) (Kriuk, 24 Feb 2025).
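The standardization and PCA steps can be sketched as follows. This is a minimal illustration, not the source publication's code: the helper names `standardize` and `pca` are hypothetical, the population standard deviation (`ddof=0`) is an assumption, and PCA is computed here through a plain NumPy SVD.

```python
import numpy as np
import pandas as pd

# Meteorological columns as named in the FiresRu schema table.
MET_COLS = ["temperature", "precipitation", "relative_humidity",
            "wind_speed", "solar_radiation"]


def standardize(df: pd.DataFrame, cols=MET_COLS) -> pd.DataFrame:
    """Return z-scores (x - mean) / std per column (population std, an assumption)."""
    x = df[cols].astype("float64")
    return (x - x.mean()) / x.std(ddof=0)


def pca(z: pd.DataFrame):
    """PCA via SVD of an already standardized (hence centered) matrix.

    Returns (scores, loadings, explained_variance_ratio).
    """
    u, s, vt = np.linalg.svd(z.to_numpy(), full_matrices=False)
    scores = u * s                    # coordinates of each record on the components
    loadings = vt                     # row k holds the weights w_kj of component k
    explained = s**2 / np.sum(s**2)   # share of total variance per component
    return scores, loadings, explained
```

With the real file, `pca(standardize(pd.read_csv("FiresRu.csv")))` would return one score row per fire record; rows of `loadings` show how strongly each meteorological field contributes to a component.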
3. Data Collection, Cleaning, and Preparation
Fire event records are sourced from national incident reports and open-access fire-detection feeds; meteorological variables are aggregated from multiple public weather API providers, leveraging both reanalysis products and station measurements. Each fire record is paired with meteorological data via nearest-neighbor spatial association at the time of the event (point snapshot). Precipitation is aggregated for the preceding 24 hours.
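Nearest-neighbor spatial association can be illustrated with a brute-force great-circle search. The sketch below is an assumption about how such pairing might look, not the dataset's actual pipeline: the function names and the idea of a flat list of meteorological source coordinates are hypothetical, and the haversine formula is used as the distance measure.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    # Clip guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_source(fire_lat, fire_lon, src_lat, src_lon):
    """For each fire, return (index of closest source, distance to it in km)."""
    f_lat = np.asarray(fire_lat, dtype=float)[:, None]
    f_lon = np.asarray(fire_lon, dtype=float)[:, None]
    s_lat = np.asarray(src_lat, dtype=float)[None, :]
    s_lon = np.asarray(src_lon, dtype=float)[None, :]
    dist = haversine_km(f_lat, f_lon, s_lat, s_lon)  # shape (n_fires, n_sources)
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(len(idx)), idx]
```

The full distance matrix costs memory proportional to fires times sources, which is acceptable for a few tens of thousands of events against a modest source list; a spatial index would be the usual choice at larger scale.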
Quality control involves:
- Removal of duplicate events—identical (dt, lon, lat, type_id).
- Outlier filtering: eliminating records with temperature less than –50 °C or exceeding 50 °C, and precipitation above 100 mm.
- Missing-value handling: records missing two or more meteorological readings are discarded (<0.5% of raw data); single-field gaps are imputed with the mean of the five nearest spatiotemporal neighbors.
No spatial interpolation, up/down-scaling, or rasterization is performed: each observation remains strictly point-based and event-centric, maintaining meteorological fidelity at the incident location and time.
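The deduplication, outlier, and missing-value rules listed above can be expressed as a short pandas filter. This is a sketch under assumptions: `quality_control` is a hypothetical helper, missing values are assumed to pass the range checks so that the missing-value rule alone decides their fate, and the neighbor-based imputation is left out because the source does not specify its distance metric.

```python
import pandas as pd

# Meteorological columns as named in the FiresRu schema table.
MET_COLS = ["temperature", "precipitation", "relative_humidity",
            "wind_speed", "solar_radiation"]


def quality_control(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the duplicate, outlier, and missing-value rules described in the text."""
    # 1. Duplicates: identical (dt, lon, lat, type_id).
    df = raw.drop_duplicates(subset=["dt", "lon", "lat", "type_id"])

    # 2. Outliers: temperature outside [-50, 50] °C or precipitation above 100 mm.
    #    NaN values are kept at this step (assumption) and handled in step 3.
    temp_ok = df["temperature"].isna() | df["temperature"].between(-50, 50)
    precip_ok = df["precipitation"].isna() | (df["precipitation"] <= 100)
    df = df[temp_ok & precip_ok]

    # 3. Missing values: discard records lacking two or more meteorological readings.
    df = df[df[MET_COLS].isna().sum(axis=1) < 2]
    return df.reset_index(drop=True)
```

Records that survive with a single missing field would still need the imputation step before modelling.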
4. Access, Structure, and Example Usage
FiresRu is openly available from its public GitHub repository (https://github.com/sparcus-technologies/FiresRu), with the following structure:
```
/
  README.md
  FiresRu.csv
  code/
    load_and_plot.py
```
The primary file, FiresRu.csv, can be obtained via git clone or direct download. v1.0 is released exclusively in CSV format; future versions may provide NetCDF or GeoTIFF gridded exports.
Supported Python-based workflows are illustrated via example scripts. A typical workflow involves loading the CSV with pandas, mapping fire event distributions using GeoPandas, and generating time-series counts:
```python
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt

# Load the flat CSV and parse the ISO 8601 timestamp column.
df = pd.read_csv("FiresRu.csv", parse_dates=["dt"])
df.info()  # info() prints its summary directly and returns None
print(df.describe())

# Build point geometries from lon/lat (WGS 84).
gdf = gpd.GeoDataFrame(
    df, geometry=gpd.points_from_xy(df.lon, df.lat), crs="EPSG:4326"
)

# Map of fire events colored by fire type.
fig, ax = plt.subplots(figsize=(8, 6))
gdf.plot(column="type_name", categorical=True, legend=True, alpha=0.6, ax=ax)
ax.set_title("FiresRu Events by Type")
plt.show()

# Daily event counts over the 13-month window.
daily = df.set_index("dt").resample("D").size()
daily.plot(title="Daily Fire Events (13 months)")
plt.ylabel("Count")
plt.show()
```
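For seasonal analysis, the same table can be reduced to monthly counts per fire type. The sketch below is illustrative only; `monthly_counts` is a hypothetical helper that relies on the `dt` and `type_name` columns from the schema.

```python
import pandas as pd


def monthly_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Count fire events per calendar month (UTC) and fire type."""
    ts = pd.to_datetime(df["dt"], utc=True)   # accepts ISO strings or datetimes
    month = ts.dt.strftime("%Y-%m")           # e.g. "2020-07"
    return pd.crosstab(month, df["type_name"])
```

Each row of the result is a month and each column a fire class, so `monthly_counts(df).plot(kind="bar", stacked=True)` would give a quick view of seasonal prevalence by type.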
5. Applications and Modeling Suitability
FiresRu is optimized for fire-risk modeling and classification tasks—specifically, predicting fire type from meteorological context as demonstrated in its source publication. The dataset is a suitable foundation for:
- Climatological analysis of fire–weather coupling, including seasonal prevalence and regime shifts.
- Spatial–temporal machine learning, such as hybrid neural and graph-based models, encompassing forecasting and severity assessment.
- Feature-importance quantification and early warning prototyping for hazard response within Russian administrative regions.
A plausible implication is that the inclusion of consistently structured meteorological variables for each fire event lends itself to both supervised (categorical prediction) and unsupervised (e.g., PCA, clustering) approaches, supporting multivariate statistical frameworks.
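As a concrete instance of the supervised case, the sketch below fits a nearest-centroid baseline on standardized features. This is a deliberately simple stand-in, not the model used in the source publication; the function names and the dictionary-based model are hypothetical.

```python
import numpy as np


def fit_centroids(X, y):
    """Fit a nearest-centroid classifier on z-scored features."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0            # avoid division by zero for constant columns
    Z = (X - mu) / sigma
    classes = np.unique(y)
    # One centroid per class: the mean standardized feature vector of its members.
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    return {"mu": mu, "sigma": sigma, "classes": classes, "centroids": centroids}


def predict(model, X):
    """Assign each row to the class whose centroid is closest (Euclidean)."""
    Z = (np.asarray(X, dtype=float) - model["mu"]) / model["sigma"]
    dist = np.linalg.norm(Z[:, None, :] - model["centroids"][None, :, :], axis=2)
    return model["classes"][dist.argmin(axis=1)]
```

On FiresRu, `X` would be the five meteorological columns and `y` the `type_id` column, with a held-out split used to judge accuracy. A baseline of this kind mainly serves as a reference point for stronger models.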
6. Strengths and Limitations
FiresRu’s strengths include broad geographic and event-type representation across the Russian Federation, event-level meteorological annotation, and clear data quality protocols (outlier handling, consistent schema). The CSV flat file format and open distribution scheme simplify integration in Python-centric data pipelines.
However, several caveats apply:
- Temporal scope is confined to 13 months, so multi-year trend analysis and interannual variability assessment are not possible.
- Categorical imbalance is present: some classes (e.g., uncontrolled burns) are rare, which makes them harder for classifiers to recognize.
- There is no information regarding fuel/vegetation indices (such as NDVI), terrain, or soil/substrate characteristics.
- No proprietary satellite-derived fire metrics or composite hazard indices are included.
- Meteorological fields are 24-hour aggregates or point-in-time values, limiting fine-scale process inference.
- Meteorological source density varies strongly by region (notably sparser in Siberia versus European Russia), which may induce regional data quality heterogeneity.
These limitations constrain certain high-resolution remote sensing analyses and temporal generalizability, although the dataset’s design remains robust for event classification, spatial and seasonal analysis, and initial algorithm development for Eurasian wildfire science (Kriuk, 24 Feb 2025).
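The class imbalance noted among the caveats can be quantified, and partly compensated for, with inverse-frequency class weights. The sketch below uses the common "balanced" heuristic, weight = n / (k × count); the helper name `class_weights` is hypothetical, and whether the source publication applied any reweighting is not stated.

```python
import pandas as pd


def class_weights(labels: pd.Series) -> pd.Series:
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    counts = labels.value_counts()
    return len(labels) / (len(counts) * counts)
```

Applied to the `type_name` column, rare classes receive weights above 1 and common classes weights below 1; these values can then be passed to any learner that accepts per-class or per-sample weights.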