
Urban Street Tree Datasets: Methods & Impact

Updated 12 December 2025
  • Urban street tree datasets are organized, multi-modal collections designed for detecting, classifying, and assessing tree attributes in urban environments.
  • They integrate diverse protocols from mobile imaging to satellite and LiDAR to enable precise measurements and indexing of urban forest features.
  • These datasets drive advances in computer vision, urban ecology, and policy, supporting automated inventories and environmental impact studies.

Urban street tree datasets are structured corpora designed to support quantitative, algorithmic, and planning work on the detection, classification, and assessment of street trees. Such datasets vary widely in spatial scale, modality, geographic diversity, annotation detail, and licensing. The major publicly documented corpora have underpinned advances in computer vision, urban ecology, geoinformatics, and environmental policy. This article surveys the major dataset designs, collection and annotation schemes, evaluation protocols, and modeling workflows, along with their implications for research and urban asset management, referencing primary studies and dataset releases.

1. Data Acquisition Protocols and Modalities

Urban street tree datasets leverage a variety of imaging and sensing approaches:

  • Mobile imaging (egocentric): Motorcycle- or bicycle-mounted action cameras (e.g., GoPro Hero), as in the Urban Street Tree Dataset collected in Hyderabad and Delhi (Bahety et al., 2022). The data comprise 1920×1080 MP4 video (30 fps), with stills extracted at 1 fps for annotation.
  • Street-level panoramas: Google Street View acquisitions provide cylindrical panoramas with spatial metadata, as in Treepedia and other large-scale AI-driven systems (Cai et al., 2018, Branson et al., 2019, Laumer et al., 2020, Abuhani et al., 19 Aug 2025).
  • Aerial/satellite imagery: High-resolution RGB or multispectral imagery (10–60 cm GSD), sourced from NAIP, OpenAerialMap, or Google Maps, supports direct crown detection as in (Waters et al., 2021, Ventura et al., 2022, Veitch-Michaelis et al., 16 Jul 2024).
  • Mobile Mapping System (MMS) LiDAR + panoramic imaging: Synchronized 3D point clouds and high-res 360° imagery as in WHU-STree enable precise trunk/crown geometry, species discrimination, and cross-modal learning (Ding et al., 16 Sep 2025).
  • Smartphone RGB stills: Paired perspective images are acquired for direct morphological measurement and trunk segmentation, as in the Dubai paper (Khan et al., 2 Jan 2024).
  • Multi-modal/3D simulation-ready: Diffusion-guided 3D mesh construction from Street View crops, genus labels, and procedural growth priors, yielding detailed assets for simulation (Lee et al., 14 Jul 2024).

Each collection paradigm imposes specific requirements on annotation, downstream modeling, and interpretability.
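For example, the 1 fps still extraction from 30 fps video in the motorcycle-mounted protocol reduces to sampling every Nth frame index. A minimal sketch (the function name and defaults are illustrative, not taken from the dataset release):

```python
def sample_frame_indices(total_frames: int, video_fps: float,
                         target_fps: float = 1.0) -> list[int]:
    """Indices of frames to keep when downsampling a video stream
    from `video_fps` to `target_fps` (e.g., 30 fps -> 1 fps stills)."""
    step = max(1, round(video_fps / target_fps))
    return list(range(0, total_frames, step))

# A 10-second clip at 30 fps yields ten stills for annotation.
indices = sample_frame_indices(total_frames=300, video_fps=30.0)
assert len(indices) == 10 and indices[1] == 30
```

In practice the selected indices would be passed to a video decoder (e.g., OpenCV or ffmpeg) to export the stills.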

2. Annotation Procedures and Label Schemes

Protocols have evolved in response to occlusion, species diversity, and intended downstream use. Annotation tools include CVAT for 2D imagery, RoboFlow for segmentation masks, and custom point-cloud browsers for 3D/MMS data. Consistency is enforced through professional review, spot checks, and periodic inter-annotator agreement statistics.
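Inter-annotator agreement on categorical labels such as species is commonly summarized with Cohen's kappa. A minimal sketch of the standard statistic, assuming two annotators label the same set of instances (the worked example is illustrative):

```python
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa: observed agreement between two annotators,
    corrected for agreement expected by chance."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

# Two annotators disagree on one of four species labels.
kappa = cohens_kappa(["oak", "oak", "pine", "pine"],
                     ["oak", "pine", "pine", "pine"])
```

Values near 1 indicate near-perfect agreement; values near 0 indicate agreement no better than chance.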

3. Dataset Composition, Statistics, and Structure

Urban street tree datasets vary in their scale, coverage, and schema:

| Dataset | # Trees / Instances | Modality | Attributes / Labels | Cities / Regions |
|---|---|---|---|---|
| Urban Street Tree (IN) | 2,265 (train), 302 (test) | Motorcycle video | 2D trunk boxes | Hyderabad, Delhi |
| NYC Tree Census 2015 | 652,169 | Field survey | Species, DBH, health, GPS | NYC (5 boroughs) |
| NAIP Multispectral Urban | 95,972 | Aerial (4-band + NDVI) | Trunk/canopy points | 8 CA cities |
| WHU-STree | 21,007 | LiDAR + 360° imagery | 3D trunk/crown, species, DBH | Nanjing, Shenyang |
| Treepedia | 500 | GSV panorama | Per-pixel vertical vegetation mask | 5 world cities |
| Google Maps Registree | >80,000 | Aerial / Street View | 2D location, species, time flag | Pasadena, CA |
| OAM-TCD | 280,000+ | Aerial (10 cm RGB) | 2D polygon, canopy group | Global, 206 cells |
| Mobile Phone DBH (Dubai) | 400 | Smartphone | Trunk segmentation mask, DBH | Dubai |
| Tree-D Fusion | 600,000 | GSV + diffusion | 3D mesh, genus, metadata | 23 N.A. cities |
| Street-level Embeddings | 1.77M | GSV + embeddings | Visual/spatial features, cluster | 8 N.A. cities |

File structures adhere to standard image formats (JPEG, PNG, GeoTIFF), point cloud standards (.las), and hierarchical foldering. Metadata CSVs and JSON objects encode per-instance and per-segment attributes.
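Such per-instance metadata can be loaded with standard tooling. A minimal sketch using only the standard library, with hypothetical column names (no specific dataset release is assumed):

```python
import csv
import io

# Hypothetical per-instance metadata; column names are illustrative.
RAW = """tree_id,species,dbh_cm,lat,lon
T001,Ficus religiosa,42.5,17.385,78.486
T002,Azadirachta indica,31.0,17.391,78.480
"""

def load_metadata(text: str) -> dict[str, dict]:
    """Index metadata rows by tree_id, casting numeric columns."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        row["dbh_cm"] = float(row["dbh_cm"])
        row["lat"], row["lon"] = float(row["lat"]), float(row["lon"])
        rows[row["tree_id"]] = row
    return rows

meta = load_metadata(RAW)
assert meta["T001"]["species"] == "Ficus religiosa"
```

Real releases typically pair such tables with image or point-cloud files referenced by the per-instance identifier.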

4. Evaluation Metrics and Validation Protocols

Detection, counting, segmentation, and classification performance are quantitatively assessed using widely adopted metrics, several of which are specialized for these datasets.

Reference models report, for example, mAP = 83.74% for detection on Indian road scenes (Bahety et al., 2022), IoU = 0.876 for SegFormer on OAM-TCD (Veitch-Michaelis et al., 16 Jul 2024), and species classification OA = 88% for PTv2 on WHU-STree (Ding et al., 16 Sep 2025).
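Box-level IoU, which underlies the detection figures above, is straightforward to compute for axis-aligned boxes. A minimal sketch of the standard metric (the function itself is illustrative):

```python
def box_iou(a: tuple[float, float, float, float],
            b: tuple[float, float, float, float]) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two unit-overlap boxes: intersection 1, union 7.
assert abs(box_iou((0, 0, 2, 2), (1, 1, 3, 3)) - 1 / 7) < 1e-12
```

mAP then aggregates precision over recall levels after matching predictions to ground truth at one or more IoU thresholds.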

5. Visualization and Density Mapping Techniques

Advanced urban street tree datasets support geospatial visualization for urban planning and environmental assessment:

  • Category Map: Route-level coloring by discrete tree-density classes (e.g., <20, 20–30, …, >50 trees/km, mapped to black, red, blue, green, and dark green), facilitating rapid inspection of “tree-starved” vs. “tree-rich” streets (Bahety et al., 2022).
  • Kernel Density Ranking (KDR): Non-parametric kernel density estimators (bandwidth h) over tree point locations yield a smoothed density estimate $\hat p$. The density ranking metric

$$\hat\alpha(x) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\left[\hat p(X_i) \leq \hat p(x)\right]$$

maps local density to [0,1] for heatmap visualization.

  • Biodiversity Maps: Spatial aggregation of unsupervised clusters or genus labels to grid cells, assigning local Shannon/Simpson indices for spatial diversity mapping (Abuhani et al., 19 Aug 2025).
  • Urban filtering and extraction: Geo-referencing and intersection with street centerlines (buffered at 5–10 m) extract street-facing trees from global/urban canopy datasets (Veitch-Michaelis et al., 16 Jul 2024).
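The density-ranking statistic can be computed directly from the estimated densities at the sample points. A pure-Python sketch, using a Gaussian kernel estimator as a generic stand-in for the estimators in the cited work (function names and the bandwidth choice are illustrative):

```python
import math

def gaussian_kde(points: list[tuple[float, float]], h: float):
    """2-D Gaussian kernel density estimate with bandwidth h
    (up to a normalizing constant, which ranking does not need)."""
    def p_hat(x: float, y: float) -> float:
        return sum(
            math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * h * h))
            for px, py in points
        ) / len(points)
    return p_hat

def density_rank(points, p_hat, x: float, y: float) -> float:
    """alpha_hat(x): fraction of sample points whose estimated
    density is at most the density at (x, y); lies in [0, 1]."""
    px = p_hat(x, y)
    return sum(p_hat(a, b) <= px for a, b in points) / len(points)

# Two clustered trees and one isolated tree: the cluster ranks higher.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
p = gaussian_kde(pts, h=1.0)
assert density_rank(pts, p, 0.0, 0.0) > density_rank(pts, p, 5.0, 5.0)
```

Evaluating the rank on a regular grid of (x, y) locations then yields the smoothed heatmap described above.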

6. Practical and Scientific Impact

Urban street tree datasets have empirically advanced:

  • Automated asset inventories: By replacing manual fieldwork, these datasets underpin rapid and repeatable urban infrastructure audits (e.g., (Bahety et al., 2022, Ding et al., 16 Sep 2025)).
  • Ecological modeling and environmental justice: Integration with air-quality, health, and demographic layers supports policy interventions and impact studies (Lai et al., 2017, Abuhani et al., 19 Aug 2025).
  • Benchmarking and model generalization: The breadth of modalities and geography in WHU-STree, OAM-TCD, and Tree-D Fusion enables robust testing of domain adaptation, multi-modal fusion, and open-vocabulary recognition (Ding et al., 16 Sep 2025, Lee et al., 14 Jul 2024, Veitch-Michaelis et al., 16 Jul 2024).
  • Simulation and AR/VR: 3D simulation-ready assets (Tree-D Fusion) provide infrastructural “digital twins” for VFX, urban microclimate analysis, and collision-aware planning (Lee et al., 14 Jul 2024).
  • Monitoring and change detection: CNN-powered pipelines support temporal tracking of planting/removal, invasion by undesirable species, and canopy decline (Branson et al., 2019).
  • Data integration across vintages: Historic inventories, retrofitted with GPS via image-based geocoding, enable longitudinal study of urban forest dynamics (Laumer et al., 2020).

7. Limitations, Challenges, and Future Directions

Open issues and technical bottlenecks discussed in primary sources:

  • Generalizability: Single-city or single-device collections (e.g., Dubai, Indian motorcycle video) limit inference to other imaging conditions, planting regimes, and species (Bahety et al., 2022, Khan et al., 2 Jan 2024).
  • Cross-domain robustness: Transfer across cities with differing species, LiDAR densities, and canonically regularized spacing remains under-explored (Ding et al., 16 Sep 2025).
  • Occlusion management: Even with multi-view fusion, heavy occlusion can suppress instance yields, especially in visually complex environments (Bahety et al., 2022, Laumer et al., 2020).
  • Multi-modal data alignment: Synchronization of LiDAR/imagery or handling GPS error requires sophisticated alignment techniques (Ding et al., 16 Sep 2025).
  • Labeling bottlenecks and open-data constraints: Not all datasets are publicly licensed or permanently archived (e.g., (Khan et al., 2 Jan 2024)). Variance in annotation protocol/quality further impedes downstream fusion and benchmarking.
  • Spatial context and topological priors: Current approaches seldom incorporate planting interval, adjacency, or regulatory clustering in model structure (Ding et al., 16 Sep 2025).
  • Dynamic asset management: The rise of multi-modal large language models is proposed as a next step for end-to-end asset management querying and recommendation (Ding et al., 16 Sep 2025).

A plausible implication is that future datasets will require even richer cross-modal, cross-city coverage—including health, risk, and maintenance endpoints—and algorithmic commons for both modeling and asset management.


The cited corpus forms the foundation for method development and urban forestry research, supporting detection, quantification, planning, and simulation with rigorously annotated, geo-enabled samples at multiple scales and modalities (Bahety et al., 2022, Lai et al., 2017, Ventura et al., 2022, Cai et al., 2018, Khan et al., 2 Jan 2024, Lee et al., 14 Jul 2024, Ding et al., 16 Sep 2025, Branson et al., 2019, Waters et al., 2021, Abuhani et al., 19 Aug 2025, Veitch-Michaelis et al., 16 Jul 2024, Laumer et al., 2020).
