AMERI-FAR25 Dataset for Carbon Flux Mapping
- AMERI-FAR25 is a dataset combining flux tower measurements with Landsat imagery to provide high-resolution carbon flux data across North American ecosystems.
- It comprises 7.7 million half-hour records from 209 AmeriFlux sites spanning 2013–2023, enabling detailed spatial and temporal analyses.
- Its use in the FAR deep-learning framework demonstrates an effective method for upscaling point measurements to pixel-level ecosystem carbon mapping.
The AMERI-FAR25 dataset is a publicly available resource constructed for high-resolution carbon flux prediction across diverse North American ecosystems. It pairs eddy-covariance tower flux measurements with co-located Landsat 8 and 9 imagery, forming the basis of the Footprint-Aware Regression (FAR) deep-learning framework for pixel-level ecosystem carbon flux estimation at 30 m spatial resolution (Searcy et al., 1 Dec 2025).
1. Dataset Composition and Scope
AMERI-FAR25 comprises data from 209 distinct AmeriFlux sites contributing a total of 439 site-years between 2013 and 2023. The geographical range includes the United States, Canada, Mexico, and Peru. The dataset contains 7,697,145 half-hourly records of net ecosystem carbon flux (FC), spanning a variety of ecosystem types as defined by IGBP codes:
- DBF (deciduous broadleaf forest)
- ENF (evergreen needleleaf forest)
- EBF (evergreen broadleaf forest)
- WET (wetlands)
- GRA (grasslands)
- CSH/OSH (shrublands)
- CRO (croplands)
- SAV (savannas)
- MF (mixed forests)
- CVM/BSV and additional minor classes
Each sample is spatially represented as a 128×128 pixel Landsat scene (≈3.84 km × 3.84 km), centered on the tower location.
| Dimension | Value | Notes |
|---|---|---|
| Sites | 209 | DBF, ENF, WET, GRA, CRO, etc. |
| Site-years | 439 | 2013–2023 (varies by site) |
| Flux records | 7,697,145 | Half-hour, spatially co-located |
2. Data Types and Variables
The dataset integrates flux, meteorological, and remote-sensing inputs:
Tower-Measured Variables:
- Net ecosystem exchange (FC) at half-hour intervals (Mg C ha⁻¹ half-hour⁻¹)
Meteorological/Environmental Drivers:
- For footprint modeling (X_footprint): wind direction (WD), wind speed (WS), friction velocity (USTAR), air temperature (TA), sensible heat flux (H), tower height
- For flux prediction (X_drivers): shortwave incoming radiation (SW_IN), air temperature (TA), relative humidity (RH)
- PRISM normals (800 m grid): daily TA, RH; monthly solar transmission; SW_IN estimated via pysolar
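To make the two input groups concrete, the following is a minimal sketch of how one half-hourly record might be held in memory and split into the footprint and driver vectors. The class name `TowerRecord`, its field names, and the sample values are hypothetical illustrations and are not taken from the dataset's actual schema.

```python
from dataclasses import dataclass


@dataclass
class TowerRecord:
    """One half-hourly tower record (hypothetical layout, for illustration only)."""
    fc: float            # carbon flux target (FC)
    wd: float            # wind direction
    ws: float            # wind speed
    ustar: float         # friction velocity
    ta: float            # air temperature
    h: float             # sensible heat flux
    tower_height: float  # measurement height
    sw_in: float         # shortwave incoming radiation
    rh: float            # relative humidity

    def x_footprint(self) -> list[float]:
        """Variables listed for footprint modeling."""
        return [self.wd, self.ws, self.ustar, self.ta, self.h, self.tower_height]

    def x_drivers(self) -> list[float]:
        """Variables listed for flux prediction."""
        return [self.sw_in, self.ta, self.rh]


# Made-up sample values, used only to exercise the two accessors.
record = TowerRecord(fc=-0.002, wd=225.0, ws=3.1, ustar=0.4, ta=18.5,
                     h=120.0, tower_height=30.0, sw_in=640.0, rh=55.0)
footprint_inputs = record.x_footprint()
driver_inputs = record.x_drivers()
```

Air temperature appears in both vectors, matching the two variable lists above.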
Satellite Inputs:
- Landsat 8 & 9 bands resampled to 30 m: coastal aerosol, blue, green, red, NIR, SWIR1, SWIR2, cirrus, TIRS1, TIRS2—excluding the panchromatic band.
- Ancillary: sun/sensor azimuth and zenith
- Cloud-filtered scenes: 45,124 valid patches
3. Spatial and Temporal Resolution
AMERI-FAR25 delivers high spatial precision:
- Pixel size: 30 m × 30 m (thermal bands resampled from native 100 m)
- Patch size: 128 × 128 pixels (≈3.84 km side length)
- Tower sampling: every 30 minutes
- Landsat revisit: nominally every 16 days; "most recent available" scene per flux record
- Aggregation for model evaluation: monthly and annual sums of flux
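The pairing of each flux record with a Landsat scene, and the monthly aggregation used for evaluation, can be sketched as below. This assumes that "most recent available" means the latest scene acquired at or before the record's timestamp, which the text does not state explicitly. The function names and sample values are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def most_recent_scene(scene_times, record_time):
    """Index of the latest scene acquired at or before record_time.

    scene_times must be sorted ascending. Returns None if no scene
    precedes the record.
    """
    idx = bisect_right(scene_times, record_time) - 1
    return idx if idx >= 0 else None


def monthly_sums(records):
    """Sum half-hourly flux values into (year, month) buckets."""
    totals = defaultdict(float)
    for timestamp, flux in records:
        totals[(timestamp.year, timestamp.month)] += flux
    return dict(totals)


# Three scenes at the nominal 16-day revisit and three made-up flux records.
scenes = [datetime(2020, 6, 1, 17), datetime(2020, 6, 17, 17), datetime(2020, 7, 3, 17)]
records = [
    (datetime(2020, 6, 10, 12, 0), -0.5),
    (datetime(2020, 6, 20, 12, 30), -0.25),
    (datetime(2020, 7, 4, 9, 0), 0.125),
]
matched = [most_recent_scene(scenes, t) for t, _ in records]
totals = monthly_sums(records)
```

A binary search keeps the lookup cheap when millions of half-hourly records share a few hundred scenes per site.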
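Footprint modeling leverages soft attention-based masks $M$, normalized such that $\sum_{i,j} M_{ij} = 1$. Pixel-level predictions $\hat{f}_{ij}$ for FC are aggregated using this mask:

$$\widehat{\mathrm{FC}} = \sum_{i,j} M_{ij}\,\hat{f}_{ij}$$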
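A normalized soft mask and the aggregation it drives can be illustrated with a small sketch. Using a softmax over attention logits is an assumption here: it is one common way to obtain non-negative weights that sum to one, and the text does not confirm that FAR normalizes its masks this way. The 2 × 2 grid stands in for the full 128 × 128 patch.

```python
import math


def softmax_mask(logits):
    """Turn a 2-D grid of logits into non-negative weights that sum to 1."""
    peak = max(max(row) for row in logits)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in logits]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def aggregate_flux(mask, pixel_flux):
    """Mask-weighted sum of per-pixel flux predictions."""
    return sum(m * f
               for mask_row, flux_row in zip(mask, pixel_flux)
               for m, f in zip(mask_row, flux_row))


# Made-up logits and pixel-level predictions.
logits = [[0.0, 1.0], [2.0, 0.0]]
pixel_flux = [[-1.0, -2.0], [-3.0, 0.5]]
mask = softmax_mask(logits)
tower_estimate = aggregate_flux(mask, pixel_flux)
```

Because the weights sum to one, the aggregated value is a weighted average of the pixel predictions, so it can be compared directly with the tower measurement.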
4. Preprocessing and Quality Control
Tower Data Cleaning:
- Inclusion: AmeriFlux BASE, CC-BY-4.0 license, required meteorological and flux variables
- SW_IN derived from PPFD_IN via linear scaling where necessary
- Metadata harmonized (highest sensor, tower height required)
- Outlier removal: records dropped when FC falls outside the 0.5th–99.5th percentile range, when SW_IN is negative, or when FC indicates nighttime drawdown (FC < 0 with SW_IN = 0)
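The three outlier filters can be sketched as follows. This is an illustration under stated assumptions: the percentile is computed with linear interpolation over all records passed in (the text does not say whether thresholds are set per site or globally), and the dictionary layout and function names are hypothetical.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of an ascending list."""
    if not sorted_values:
        raise ValueError("percentile of an empty list is undefined")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def clean_records(records, lower_q=0.5, upper_q=99.5):
    """Filter a list of {"FC": ..., "SW_IN": ...} dicts with the three rules."""
    fc_sorted = sorted(r["FC"] for r in records)
    fc_min = percentile(fc_sorted, lower_q)
    fc_max = percentile(fc_sorted, upper_q)
    kept = []
    for r in records:
        if not fc_min <= r["FC"] <= fc_max:
            continue  # FC outside the percentile window
        if r["SW_IN"] < 0:
            continue  # negative incoming radiation
        if r["FC"] < 0 and r["SW_IN"] == 0:
            continue  # apparent carbon uptake with no sunlight
        kept.append(r)
    return kept


# Made-up daytime records with FC spanning -500..499.
sample = [{"FC": float(v), "SW_IN": 100.0} for v in range(-500, 500)]
cleaned = clean_records(sample)
```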
Satellite Imagery Processing:
- Download via landsatxplore with custom reliability fixes
- QA/QC using PIXEL_QA flags (clear land/water: codes 21824, 21888, 21952)
- Missing/cloudy pixels filled via temporal back-fill from latest valid observation
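The QA filtering and temporal back-fill can be sketched for a single pixel's time series. The clear codes are the three values quoted above; the function name, the treatment of observations before the first clear value (left as `None`), and the sample data are assumptions made for illustration.

```python
CLEAR_CODES = {21824, 21888, 21952}  # clear land/water values quoted in the text


def backfill_series(values, qa_codes):
    """Replace non-clear observations with the latest earlier clear value.

    values and qa_codes are time-ordered and equally long. Observations
    that precede the first clear value stay None.
    """
    filled = []
    last_valid = None
    for value, code in zip(values, qa_codes):
        if code in CLEAR_CODES:
            last_valid = value
        filled.append(last_valid)
    return filled


# Made-up reflectances; any code outside CLEAR_CODES counts as invalid.
reflectance = [0.1, 0.2, 0.3, 0.4]
qa = [22280, 21824, 22280, 21952]
filled = backfill_series(reflectance, qa)
```

Applied per pixel across the scene stack, this carries the last clear observation forward over cloudy or missing acquisitions.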
Co-Registration:
- Patches centered on tower coordinates; resampled bands; orientation angles concatenated as four supplementary channels
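As a rough sketch of the co-registration step, the snippet below computes a 128 × 128 pixel window around the tower pixel and appends the four angle channels to the ten Landsat bands. Representing each angle as a constant-valued plane is an assumption; the text only says the angles are concatenated as four supplementary channels. All names are hypothetical.

```python
PATCH_SIZE = 128  # pixels per side
N_BANDS = 10      # Landsat bands listed in the text (panchromatic excluded)
N_ANGLES = 4      # sun/sensor azimuth and zenith


def patch_window(tower_row, tower_col, size=PATCH_SIZE):
    """Return (row_start, row_stop, col_start, col_stop) around the tower pixel.

    With an even size the tower lands at index size // 2 inside the patch.
    """
    half = size // 2
    return (tower_row - half, tower_row + half, tower_col - half, tower_col + half)


def stack_channels(bands, angles):
    """Append one constant-valued plane per angle to the list of band images."""
    rows, cols = len(bands[0]), len(bands[0][0])
    angle_planes = [[[angle] * cols for _ in range(rows)] for angle in angles]
    return list(bands) + angle_planes


window = patch_window(1000, 2000)
# Tiny 2 x 2 stand-ins for the band images, with made-up angle values.
bands = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(N_BANDS)]
stacked = stack_channels(bands, [150.0, 30.0, 98.0, 2.0])
```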
5. Data Splitting and Modeling Protocols
Spatial Splitting:
- Sites grouped by IGBP ecosystem class
- For classes with ≥10 sites: 40% withheld (20% validation, 20% test)
- Classes with <10 sites: used exclusively for training
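The class-stratified site split can be sketched as below. The rounding rule, the random seed, and the site identifiers are assumptions; the text specifies only the 10-site threshold and the 20%/20% holdout fractions.

```python
import random
from collections import defaultdict


def spatial_split(site_to_igbp, min_sites=10, holdout_frac=0.2, seed=0):
    """Assign each site to 'train', 'val', or 'test', stratified by IGBP class."""
    by_class = defaultdict(list)
    for site, igbp in sorted(site_to_igbp.items()):
        by_class[igbp].append(site)

    rng = random.Random(seed)
    assignment = {}
    for igbp, sites in by_class.items():
        if len(sites) < min_sites:
            for site in sites:
                assignment[site] = "train"  # small classes are training-only
            continue
        shuffled = sites[:]
        rng.shuffle(shuffled)
        n_hold = round(len(shuffled) * holdout_frac)
        for i, site in enumerate(shuffled):
            if i < n_hold:
                assignment[site] = "val"
            elif i < 2 * n_hold:
                assignment[site] = "test"
            else:
                assignment[site] = "train"
    return assignment


# Hypothetical site identifiers: 20 ENF sites and 5 WET sites.
sites = {f"ENF-{i:02d}": "ENF" for i in range(20)}
sites.update({f"WET-{i:02d}": "WET" for i in range(5)})
assignment = spatial_split(sites)
```

Splitting by site rather than by record keeps every held-out tower entirely unseen during training.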
Temporal Splitting:
- val_future/test_future: final year from multi-year sites, targeted at temporal-drift studies
- val: 20% random holdout of remaining half-hour records
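The temporal protocol can be sketched for a single site. The text does not say how final-year records are divided between val_future and test_future, so this sketch returns them as one `future` subset. The function name, seed, and record layout are hypothetical.

```python
import random


def temporal_split(records, val_frac=0.2, seed=0):
    """Split one site's (year, value) records into train, val, and future subsets."""
    years = sorted({year for year, _ in records})
    # Only multi-year sites give up their final year.
    final_year = years[-1] if len(years) > 1 else None
    future = [r for r in records if r[0] == final_year]
    remaining = [r for r in records if r[0] != final_year]

    rng = random.Random(seed)
    rng.shuffle(remaining)
    n_val = round(len(remaining) * val_frac)
    return {"val": remaining[:n_val], "train": remaining[n_val:], "future": future}


# Made-up records: ten per year for a three-year site.
site_records = [(year, float(i)) for year in (2019, 2020, 2021) for i in range(10)]
parts = temporal_split(site_records)
```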
| Split | Tower Records | Landsat Patches |
|---|---|---|
| Training (train) | ~5.6 million | ~35,000 |
| Validation/Test (site) | ~2.1 million | ~10,000 |
| val/test (future year) | ~0.5 million | ~5,000 |
These splits plausibly support studies of how well upscaling models generalize both across ecosystems and over time.
6. Metadata and Annotations
AMERI-FAR25 includes extensive metadata relevant for both scientific reproducibility and ecological interpretation:
- Land cover and ecosystem codes (IGBP type per site)
- Tower metadata: height, coordinates, operation years, documented disturbance events (e.g., clear-cut, fire scar)
- Sun/sensor azimuth and zenith per scene (enabling bidirectional reflectance modeling)
The data are available on Zenodo (DOI pending). The FAR code, model weights, and full site list are published at github.com/jsearcy1/FAR and in a corresponding archive.
7. Context, Intended Use, and Implications
AMERI-FAR25 underpins the Footprint-Aware Regression (FAR) framework, which is evaluated on monthly net ecosystem exchange prediction at holdout test sites. The dataset allows pixel-level upscaling of ground-validated fluxes, addressing the mismatch between tower and satellite spatial scales. Researchers may apply AMERI-FAR25 directly for landscape-scale, high-resolution carbon flux mapping in heterogeneous environments, facilitating cross-ecosystem analyses and method development for natural climate solutions (Searcy et al., 1 Dec 2025).