
TorchGeo: Geospatial Deep Learning

Updated 2 November 2025
  • TorchGeo is an open-source geospatial deep learning library that integrates raster and vector data for remote sensing and analysis.
  • It offers modular data loaders, geospatial samplers, and transforms that enable efficient preprocessing, reprojection, and resampling.
  • TorchGeo facilitates transfer learning with pre-trained multispectral models and standardized benchmarks for reproducible research.

TorchGeo is an open-source Python library designed to integrate geospatial data, including remotely sensed imagery, into the PyTorch deep learning ecosystem. It addresses the complexities specific to geospatial analytics: heterogeneous spectral bands, coordinate reference systems (CRS), spatial resolutions, and geospatial metadata alignment. It provides modular data loaders, composable datasets, geospatially aware samplers, transforms for multispectral and arbitrary-band imagery, and pre-trained models for transfer learning. The library is used in both foundational research and large-scale benchmarking, facilitating reproducible research and efficient deployment of deep learning models within remote sensing and geospatial analysis.

1. Library Architecture and Data Abstractions

TorchGeo provides two core dataset abstractions: GeoDataset (for uncurated raster/vector sources) and NonGeoDataset (for curated benchmarks). Internally, datasets index samples using geopandas GeoDataFrames, supporting spatial and temporal queries and enabling composition (intersection or union) of input data modalities. Automatic reprojection and resampling rely on rasterio and fiona, giving native support for arbitrary CRS and pixel alignments. Supported raster formats include GeoTIFF (including Cloud Optimized GeoTIFF), PNG, JPEG, Zarr, HDF5, and NetCDF; supported vector formats include Shapefile, GeoJSON, and GeoPackage. Benchmark datasets (>125 as of 2025) span land cover, crop mapping, biomass estimation, and disaster response.
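The idea of an index that answers spatial and temporal queries can be sketched in a few lines. The following is a toy, standard-library illustration only: the `Record` class and `query` function are hypothetical names, and a linear scan stands in for whatever indexing structure the library actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One indexed file: spatial bounds, acquisition window, and path."""
    minx: float
    miny: float
    maxx: float
    maxy: float
    mint: float  # start time, e.g. a POSIX timestamp
    maxt: float  # end time
    path: str


def query(index, minx, miny, maxx, maxy, mint, maxt):
    """Return records whose extent overlaps the query box in space and time."""
    hits = []
    for r in index:
        overlaps_x = r.minx <= maxx and r.maxx >= minx
        overlaps_y = r.miny <= maxy and r.maxy >= miny
        overlaps_t = r.mint <= maxt and r.maxt >= mint
        if overlaps_x and overlaps_y and overlaps_t:
            hits.append(r)
    return hits


# Hypothetical file names and extents, for illustration only.
index = [
    Record(0, 0, 100, 100, 0, 10, "scene_a.tif"),
    Record(50, 50, 150, 150, 5, 15, "scene_b.tif"),
    Record(200, 200, 300, 300, 0, 10, "scene_c.tif"),
]
paths = [r.path for r in query(index, 60, 60, 90, 90, 6, 8)]
```

Here the query box overlaps the first two scenes in both space and time, so only those two paths are returned; a real index would also need all records expressed in one CRS before such a comparison is meaningful.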

2. Data Loading, Sampling, and Preprocessing

TorchGeo implements configurable data loaders for both benchmark and generic datasets. For efficient spatial coverage of scenes, it offers multiple geospatial samplers:

  • RandomGeoSampler: Uniform sampling of bounding boxes within spatial extent.
  • RandomBatchGeoSampler: Scene-centric batching (cache-efficient).
  • GridGeoSampler: Exhaustive spatial sampling (optimally suited for inference).
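The difference between random and grid sampling can be illustrated with a standard-library sketch. The functions `grid_windows` and `random_windows` are hypothetical and are not the TorchGeo sampler classes; in particular, this sketch simply drops partial windows at the edge of the extent, which may differ from how the library treats boundaries.

```python
import random


def grid_windows(extent, size, stride):
    """Yield (minx, miny, maxx, maxy) windows covering `extent` row by row."""
    minx, miny, maxx, maxy = extent
    y = miny
    while y + size <= maxy:
        x = minx
        while x + size <= maxx:
            yield (x, y, x + size, y + size)
            x += stride
        y += stride


def random_windows(extent, size, length, seed=0):
    """Yield `length` windows with uniformly random lower-left corners."""
    minx, miny, maxx, maxy = extent
    rng = random.Random(seed)
    for _ in range(length):
        x = rng.uniform(minx, maxx - size)
        y = rng.uniform(miny, maxy - size)
        yield (x, y, x + size, y + size)


extent = (0.0, 0.0, 100.0, 60.0)
grid = list(grid_windows(extent, size=20.0, stride=20.0))  # 5 x 3 windows
rand = list(random_windows(extent, size=20.0, length=8))
```

Grid windows visit every location exactly once when the stride equals the window size, which is why this pattern suits inference; random windows give no coverage guarantee but provide varied training chips.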

On-the-fly preprocessing performs reprojection, resampling, and pixel alignment, removing the need for prior alignment in GIS software. The sampling strategy and preprocessing pipeline are tightly integrated with PyTorch DataLoader semantics, enabling mini-batch training on GPUs. For I/O throughput on petabyte-scale data, recent research emphasizes block- or tile-aligned reads and tuning of DataLoader worker and prefetch settings (Zaytar et al., 6 Jun 2025).
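To see why block- or tile-aligned reads matter, consider a raster stored in square internal tiles. The sketch below is an illustration under assumptions, not code from the cited work: the 512-pixel block size and the helper names `align_to_blocks` and `blocks_touched` are hypothetical.

```python
def align_to_blocks(col_off, row_off, width, height, block=512):
    """Expand a pixel window outward to the nearest block boundaries."""
    col0 = (col_off // block) * block
    row0 = (row_off // block) * block
    col1 = -(-(col_off + width) // block) * block   # ceiling division
    row1 = -(-(row_off + height) // block) * block
    return col0, row0, col1 - col0, row1 - row0


def blocks_touched(col_off, row_off, width, height, block=512):
    """Count the internal blocks a window read has to decode."""
    _, _, w, h = align_to_blocks(col_off, row_off, width, height, block)
    return (w // block) * (h // block)


# A 256-pixel chip straddling a tile corner decodes four blocks...
unaligned = blocks_touched(400, 400, 256, 256)
# ...while a full 512-pixel chip placed on the tile grid decodes one.
aligned = blocks_touched(512, 512, 512, 512)
```

In this toy case the unaligned chip reads four times as many blocks as the aligned one despite covering a quarter of the area, which is the kind of overhead that aligned sampling avoids.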

3. Multispectral Data Handling and Transformations

TorchGeo supports arbitrary-band imagery, overcoming the conventional 3-channel (RGB) constraint in computer vision frameworks. Transform pipelines, including spatial/spectral augmentation, are implemented using Kornia (channel-agnostic) and maintain pixel-wise alignment between images and labels. For input scenes $X \in \mathbb{R}^{H \times W \times C}$, any two aligned layers $X^1$ and $X^2$ have $X^1_{i, j}$ and $X^2_{i, j}$ mapping to the same earth location for all $i, j$.
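The alignment requirement means a spatial augmentation must be applied identically to the image and its label. A minimal sketch using nested lists follows; it assumes a bands-first layout (`[bands][rows][cols]`, as is common in PyTorch) rather than the $H \times W \times C$ order used in the notation above, and the function names are hypothetical rather than Kornia or TorchGeo APIs.

```python
import random


def hflip(layer):
    """Flip a [bands][rows][cols] nested list left to right."""
    return [[row[::-1] for row in band] for band in layer]


def random_hflip(sample, p=0.5, rng=random):
    """Flip image and mask together so pixel (i, j) stays co-located."""
    if rng.random() < p:
        return {"image": hflip(sample["image"]), "mask": hflip(sample["mask"])}
    return sample


# A 4-band 2x3 image and a 1-band mask on the same grid.
image = [[[b * 10 + c for c in range(3)] for _ in range(2)] for b in range(4)]
mask = [[[0, 1, 2], [0, 1, 2]]]
out = random_hflip({"image": image, "mask": mask}, p=1.0)
```

Because the flip decision is drawn once per sample and applied to both entries, every band of the image and the mask remain registered; nothing in the function depends on the number of bands.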

4. Foundation Models, Transfer Learning, and Benchmarks

TorchGeo is the first library to provide pre-trained weights for multispectral satellite imagery, including all bands from satellites such as Sentinel-2 and Landsat-8. Weights for extra bands not present in ImageNet are randomly initialized, while RGB bands leverage transferred knowledge. This enables transfer learning for remote sensing where labeled data is scarce, yielding strong generalization, particularly for cross-region scenarios (e.g., So2Sat, Chesapeake Land Cover). Experiments demonstrate that transfer learning from multispectral models significantly outperforms random initialization, especially in out-of-domain evaluations (e.g., So2Sat: 63.99% MSI accuracy with ImageNet(+random) initialization) (Stewart et al., 2021).
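The initialization scheme described above (pretrained kernels for RGB bands, random kernels for the rest) can be sketched with plain Python lists. This is an illustration under assumptions: the source only says the extra-band weights are randomly initialized, so the Gaussian distribution, the `std` value, the band ordering, and the function name `expand_first_conv` are all hypothetical choices.

```python
import random


def expand_first_conv(rgb_weights, in_bands, rgb_index=(0, 1, 2), std=0.01, seed=0):
    """Build first-layer weights for `in_bands` inputs from 3-band weights.

    `rgb_weights[o][c]` is the kernel for output channel o and input
    channel c (c in 0..2). Bands listed in `rgb_index` receive the
    pretrained R, G, B kernels; all other bands get small Gaussian noise.
    """
    rng = random.Random(seed)
    position = {band: c for c, band in enumerate(rgb_index)}
    expanded = []
    for per_output in rgb_weights:
        kh, kw = len(per_output[0]), len(per_output[0][0])
        kernels = []
        for band in range(in_bands):
            if band in position:
                # Copy the pretrained kernel for this colour channel.
                kernels.append([row[:] for row in per_output[position[band]]])
            else:
                # No pretrained counterpart: draw a random kernel.
                kernels.append([[rng.gauss(0.0, std) for _ in range(kw)]
                                for _ in range(kh)])
        expanded.append(kernels)
    return expanded


# Two output channels, 3 input channels, 2x2 kernels (toy sizes).
pretrained = [[[[float(o + c)] * 2] * 2 for c in range(3)] for o in range(2)]
# Hypothetical 5-band sensor where bands 3, 2, 1 hold red, green, blue.
weights = expand_first_conv(pretrained, in_bands=5, rgb_index=(3, 2, 1))
```

In a real network the same idea would be applied to the first convolution's weight tensor before fine-tuning; only that layer changes shape, so the remaining pretrained layers can be loaded unchanged.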

Benchmarks are standardized on 8+ datasets, covering classification, segmentation, regression, instance segmentation, and change detection. Results are reproducible: experiment configs, hyperparameter grids, 10-seed repetitions, and torchmetrics reporting are provided for fair comparison.

| Dataset | Model | Initialization | Bands | Metric |
|---|---|---|---|---|
| RESISC45 | ResNet50 | ImageNet | RGB | 95.42% Acc |
| So2Sat | ResNet50 | ImageNet(+R) | MSI | 63.99% Acc |
| LandCover.ai | U-Net/Res50 | ImageNet | RGB | 84.81% mIoU |
| Chesapeake | U-Net/Res50 | ImageNet(+R) | MSI | 69.4% mIoU |

(+R: Random weight initialization for non-RGB bands)
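Reporting over repeated seeds, as in the 10-seed protocol mentioned above, amounts to running the same configuration several times and summarizing the spread. The sketch below shows only the bookkeeping: `run_experiment` is a hypothetical stand-in that returns a made-up number, not a real training run, and its values have no relation to the results in the table.

```python
import random
import statistics


def run_experiment(seed):
    """Stand-in for one training run; returns a fake accuracy in [0, 1]."""
    rng = random.Random(seed)
    return 0.90 + rng.uniform(-0.01, 0.01)


def summarize(seeds):
    """Run once per seed and report mean and sample standard deviation."""
    scores = [run_experiment(s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


mean_acc, std_acc = summarize(range(10))
```

Fixing the seed inside each run makes the summary itself repeatable, so two people running the same configuration should obtain the same mean and standard deviation.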

5. Integration with ML Ecosystem and Reproducibility

TorchGeo adheres to PyTorch Dataset/DataLoader paradigms and integrates seamlessly with PyTorch Lightning, powering experiments with high reproducibility. Dataset composition operators (intersection, union) support geospatial data fusion, multimodal learning, and label sampling only where spatial/temporal overlap occurs. All modules are plug-and-play; training and evaluation can be orchestrated on arbitrary geospatial datasets via LightningCLI for configuration-based experimentation.
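The effect of an intersection operator, sampling only where an image layer and a label layer overlap, can be sketched on bounding boxes alone. This toy version is spatial only (it ignores time and CRS), and the names `intersect` and `intersection_index` as well as the tile names are hypothetical rather than part of the library.

```python
def intersect(a, b):
    """Return the overlap of two (minx, miny, maxx, maxy) boxes, or None."""
    minx, miny = max(a[0], b[0]), max(a[1], b[1])
    maxx, maxy = min(a[2], b[2]), min(a[3], b[3])
    if minx >= maxx or miny >= maxy:
        return None  # boxes are disjoint or touch only along an edge
    return (minx, miny, maxx, maxy)


def intersection_index(images, labels):
    """Pair every image tile with every label tile that overlaps it."""
    pairs = []
    for img_name, img_box in images.items():
        for lbl_name, lbl_box in labels.items():
            overlap = intersect(img_box, lbl_box)
            if overlap is not None:
                pairs.append((img_name, lbl_name, overlap))
    return pairs


images = {"s2_tile": (0, 0, 100, 100), "s2_far": (500, 500, 600, 600)}
labels = {"crops": (40, 40, 140, 140)}
pairs = intersection_index(images, labels)
```

Only the first image tile overlaps the label layer, so the combined index contains a single entry restricted to the shared region; a sampler drawing from that index can never return a chip without labels.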

Self-supervised learning pipelines and semantic segmentation tasks, such as those realized in SSL4EO-L (5M+ Landsat patches), are enabled by this tight integration. TorchGeo hosts both data and pre-trained model weights/public cards via Hugging Face, streamlining download, caching, and version management (Stewart et al., 2023).

6. Applications, Case Studies, and Ecosystem Position

TorchGeo is employed in precision agriculture, disaster monitoring, urban planning, and foundational Earth observation research. Applications range from crop type mapping (EuroCrops/Sentinel-2) with on-the-fly vector rasterization and chipping, to cloud segmentation (L7 Irish, L8 Biome), to fusion of multimodal satellite and vector data for land cover and change detection (Stewart et al., 2 Oct 2025).

| Application Area | Example Workflow | TorchGeo Feature |
|---|---|---|
| Crop Type Mapping | Sentinel-2 & EuroCrops composition | On-the-fly rasterization |
| Semantic Segmentation | U-Net/ResNet50 pretraining | Multispectral weights |
| Large-Scale Inference | GridGeoSampler, caching | Geospatial samplers |
| Foundation Model Training | SSL4EO-L, self-supervised | Datamodule, LightningCLI |
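On-the-fly rasterization, as used in the crop type mapping workflow, turns a vector polygon into a label mask on the image grid at sampling time. The following standard-library sketch shows the principle with even-odd ray casting on pixel centers; it is not the routine the library relies on, and the function names, parcel geometry, and class value are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting test for a point against a list of vertices."""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, bounds, res, value=1):
    """Burn `value` into every pixel whose center falls inside `polygon`."""
    minx, miny, maxx, maxy = bounds
    cols = int(round((maxx - minx) / res))
    rows = int(round((maxy - miny) / res))
    mask = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        y = maxy - (r + 0.5) * res  # row 0 is the northern edge
        for c in range(cols):
            x = minx + (c + 0.5) * res
            if point_in_polygon(x, y, polygon):
                mask[r][c] = value
    return mask


field = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]  # a square parcel
mask = rasterize(field, bounds=(0.0, 0.0, 4.0, 4.0), res=1.0, value=7)
```

Because the mask is produced for exactly the bounds and resolution of the requested chip, it lines up pixel for pixel with imagery sampled over the same window.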

7. Open Source Ethos, Best Practices, and Future Directions

TorchGeo is distributed under a permissive MIT license, enabling broad adoption and ecosystem synergy. The library features 100% test coverage, open governance (Technical Steering Committee), and rigorous documentation. Best practices include use of Cloud Optimized GeoTIFF, publicly hosted data/models, clear licensing, and reproducibility (seed control, configuration files).

Open issues include reproducibility under stochasticity, scalability (I/O bottlenecks for massive datasets), and extension to foundation models and embeddings as new sensor modalities and tasks arise. Anticipated directions include low-code GUIs, GPU-accelerated geospatial operations, and deeper integration with benchmarking and optimization toolkits such as TerraTorch (Gomes et al., 26 Mar 2025).

Summary Table

| Component | Description |
|---|---|
| Data Loaders | Supports benchmarks and fusion of raster/vector geospatial datasets |
| Samplers | Grid, random, batch; spatial/temporal bounding-box-based |
| Transforms | Arbitrary-band, spatial/spectral, preserving pixel alignment |
| Pre-trained Models | Multispectral weights; transfer learning; first-of-kind in geospatial domain |
| PyTorch Integration | Native Dataset/DataLoader, Lightning trainers, CLI/YAML pipelines |
| Preprocessing | On-the-fly reprojection/resampling/alignment; platform-agnostic |
| Reproducibility | Deterministic configs, torchmetrics, public code/results |
| License, Governance | MIT, open contributor model, TSC |

TorchGeo constitutes a cornerstone in the geospatial machine learning landscape, enabling rigorous, scalable, and reproducible deep learning on heterogeneous earth observation data.
