GlobalUrbanNet Dataset Overview
- GlobalUrbanNet is a comprehensive, open-access repository of street network graphs and urban morphology indicators for 8,914 cities worldwide.
- The dataset employs a robust OSMnx workflow to extract and simplify drivable street networks, computing metrics such as circuity, intersection density, and grade.
- It supports large-scale, reproducible research in urban planning, transport, and sustainability through multiple export formats and standardized geodata.
The GlobalUrbanNet dataset is a comprehensive, openly accessible resource of street network graphs and form indicators for urban areas worldwide. It covers 8,914 cities in 178 countries, capturing the topology, geometry, and infrastructure characteristics of drivable street networks as modeled from OpenStreetMap and defined by the Global Human Settlement Layer’s Urban Centre Database (UCD). The dataset enables large-scale, reproducible urban morphology analysis, transport planning studies, and sustainability research through standardized, ready-to-use geodata and computed indicators at the urban-area scale (Boeing, 2020).
1. Dataset Scope and Urban Boundary Definition
GlobalUrbanNet models the street network for every urban area classified as a "true positive" in the UCD v2019a (using DEGURBA methodology):
- Urban centres are contiguous clusters of 1 km² grid cells, each with ≥1,500 inhabitants/km² and aggregate population ≥50,000.
- Included sites must have at least 1 km² built-up area and at least 3 drivable-street OpenStreetMap (OSM) nodes.
The dataset encompasses:
| Statistic | Value |
|---|---|
| Urban areas modeled | 8,914 |
| Countries covered | 178 |
| Raw OSM nodes/edges | ≈160M nodes/320M edges |
| Post-simplification (nodes/edges) | ≈37M nodes/53M edges |
The urban boundaries are strictly defined by v2019a UCD polygons, which serve as the spatial units for topological extraction and all subsequent indicator calculation.
2. Data Acquisition, Processing, and Graph Construction
2.1 OSMnx Workflow
Street networks are modeled for each boundary using the OSMnx Python package:
- For each UCD area, the boundary polygon is slightly buffered to reduce periphery distortion.
- OSMnx's `graph_from_polygon` function is invoked with `network_type='drive'` to create a directed, nonplanar multigraph. Non-drivable ways (e.g., service roads and pedestrian-only ways) are excluded; living streets, woonerfs, and similar shared streets are included.
Example OSMnx code:
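```python
import osmnx as ox

# `boundary` is assumed to be a shapely Polygon: the buffered UCD boundary in EPSG:4326
G = ox.graph_from_polygon(boundary, network_type='drive', retain_all=True, simplify=False)
Gs = ox.simplify_graph(G)  # topological simplification (see 2.2)
```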
2.2 Topological Simplification
Raw OSM street representations include every vertex of each way's polyline as a node. The workflow simplifies these graphs:
- Only true intersections and dead-ends are retained as nodes.
- Edges preserve full geometry (shape) as LineString objects.
This reduces graph complexity and makes the networks better suited to graph-theoretic analysis.
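To make the idea concrete, here is a minimal pure-Python sketch of interstitial-node removal. It assumes an undirected graph without parallel edges, given as a hypothetical `(u, v, length)` edge list, and merges each chain of degree-2 nodes into one edge whose length is the sum of its segments. It is an illustration only: OSMnx's `simplify_graph` operates on directed multigraphs, keeps edge geometries, and handles more edge cases.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (u, v, length in metres)


def simplify(edges: List[Edge]) -> List[Edge]:
    """Collapse chains of degree-2 (interstitial) nodes into single edges.

    Assumes an undirected graph without parallel edges. Isolated rings made only
    of degree-2 nodes have no endpoint and are dropped in this sketch.
    """
    adj: Dict[str, Dict[str, float]] = {}
    for u, v, length in edges:
        adj.setdefault(u, {})[v] = length
        adj.setdefault(v, {})[u] = length

    # Nodes to keep: true intersections (degree > 2) and dead-ends (degree 1).
    endpoints = {n for n, nbrs in adj.items() if len(nbrs) != 2}

    simplified: List[Edge] = []
    visited: Set[Tuple[str, str]] = set()  # directed first/last steps already walked
    for start in sorted(endpoints):
        for first in sorted(adj[start]):
            if (start, first) in visited:
                continue
            visited.add((start, first))
            prev, cur, total = start, first, adj[start][first]
            # Follow the chain through interstitial nodes to the next endpoint.
            while cur not in endpoints:
                nxt = next(n for n in adj[cur] if n != prev)
                total += adj[cur][nxt]
                prev, cur = cur, nxt
            # Mark the reverse walk so the same chain is not emitted twice.
            visited.add((cur, prev))
            simplified.append((start, cur, total))
    return simplified


raw = [("A", "b", 10.0), ("b", "c", 5.0), ("c", "D", 20.0), ("D", "E", 7.0), ("D", "F", 3.0)]
print(simplify(raw))
```

Here nodes `b` and `c` are interstitial, so the chain A-b-c-D becomes a single 35 m edge while the dead-ends E and F stay attached to intersection D.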
2.3 Elevation and Grade Assignment
Each graph node is assigned elevation using two digital elevation models (DEM):
- ASTER v2 (≈30 m, higher noise)
- CGIAR-processed SRTM (≈90 m, lower noise)
At each node, DEM readings are compared with the (closed) Google Maps Elevation API as a reference; the DEM value closest to Google’s value is selected. Edge grade is then computed:
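$$g_{uv} = \frac{z_v - z_u}{\ell_{uv}}$$
where $\ell_{uv}$ is the length of edge $(u,v)$ and $z_u$, $z_v$ are the elevations of its endpoint nodes $u$ and $v$.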
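The per-node DEM selection and the grade calculation (elevation change from u to v divided by edge length) can be sketched as follows. The function names, the tie-break in favour of ASTER, and the sample elevations are all hypothetical; the sketch assumes the three elevation readings have already been sampled at each node.

```python
from typing import Dict, Tuple


def select_elevation(aster: float, srtm: float, reference: float) -> float:
    """Return whichever DEM reading is closer to the reference value (ties go to ASTER)."""
    return aster if abs(aster - reference) <= abs(srtm - reference) else srtm


def edge_grade(z_u: float, z_v: float, length_m: float) -> float:
    """Grade = (z_v - z_u) / length; positive means uphill from u to v."""
    return (z_v - z_u) / length_m


# Hypothetical per-node samples: node -> (aster_m, srtm_m, reference_m)
samples: Dict[str, Tuple[float, float, float]] = {
    "u": (102.0, 98.0, 99.0),
    "v": (110.0, 115.0, 111.0),
}
elev = {n: select_elevation(*vals) for n, vals in samples.items()}
g = edge_grade(elev["u"], elev["v"], length_m=200.0)
print(elev, round(g, 4))  # prints {'u': 98.0, 'v': 110.0} 0.06
```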
2.4 Export Formats
For every urban area, outputs include:
- GraphML (.graphml) for network analysis
- GeoPackage (.gpkg) for GIS workflows
- Node and edge CSVs
3. Data Structure, Attributes, and Repository Organization
3.1 Graph Representation
Networks are modeled as NetworkX MultiDiGraphs with per-node and per-edge attributes:
- Node attributes: x, y (coordinates), elevation, osmid, street_count
- Edge attributes: length, grade, OSM highway tag, geometry, oneway flag, node IDs
3.2 Repositories and File Organization
The data are deposited in multiple open repositories (Harvard Dataverse):
| Repository Name | DOI | Content Type |
|---|---|---|
| Indicators | 10.7910/DVN/ZTFPTB | Urban area indicator tables |
| Metadata | 10.7910/DVN/WMPPF9 | Reference metadata |
| GraphML models | 10.7910/DVN/KA5HJ3 | GraphML files by country |
| GeoPackages | 10.7910/DVN/E5TPDQ | GIS geodatabases by country |
| Node/Edge CSVs | 10.7910/DVN/DC7U0A | Node and edge lists (CSV) |
4. Street Network Indicators and Methodology
Merged with UCD metadata (population, GDP), the dataset provides a globally comparable suite of graph-theoretic and morphological indicators for each urban area.
4.1 Circuity and Straightness
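GlobalUrbanNet quantifies circuity, the degree to which street paths deviate from straight lines:
$$C = \frac{\sum_{(u,v) \in E} \ell_{uv}}{\sum_{(u,v) \in E} d_{uv}}$$
where $\ell_{uv}$ is the real-world length of edge $(u,v)$ and $d_{uv}$ is the Euclidean distance between its endpoint nodes. Straightness is defined as $1/C$.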
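A minimal sketch of the circuity ratio (total network length over total straight-line endpoint distance), assuming node coordinates in a projected CRS in metres so that Euclidean distance is meaningful. The toy coordinates and edge list are hypothetical, and this sketch simply skips self-loops, which have no straight-line distance; the dataset's own handling of them may differ.

```python
import math
from typing import Dict, List, Tuple


def circuity(coords: Dict[str, Tuple[float, float]],
             edges: List[Tuple[str, str, float]]) -> float:
    """Sum of network edge lengths divided by sum of straight-line endpoint distances."""
    total_length = 0.0
    total_straight = 0.0
    for u, v, length in edges:
        (xu, yu), (xv, yv) = coords[u], coords[v]
        d = math.hypot(xv - xu, yv - yu)
        if d == 0:  # self-loop: no straight-line distance, so skip it in this sketch
            continue
        total_length += length
        total_straight += d
    return total_length / total_straight


coords = {"a": (0.0, 0.0), "b": (300.0, 400.0), "c": (300.0, 0.0)}
edges = [("a", "b", 550.0), ("a", "c", 300.0)]  # a-b curves; a-c is perfectly straight
C = circuity(coords, edges)
print(round(C, 4), round(1 / C, 4))  # circuity and straightness
```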
4.2 Intersection Density and Node Degree
Three main node indicators:
- intersect_count: nodes of degree >2
- intersect_count_clean: merges nodes within 10m Euclidean distance
- intersect_count_clean_topo: merges only along network (avoiding overpasses) within 10m
Additional degree-based proportions: prop_4way (degree = 4), prop_3way (degree = 3), and prop_deadend (degree = 1).
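The basic degree-based indicators can be derived from per-node street counts, as in this sketch. It assumes the counts are available as a plain dictionary and that intersection density is the intersection count divided by the urban area's size in km²; the output key `intersect_density_km2` and the toy input are hypothetical. The 10 m clean and topological merges are not reproduced here, since they require spatial consolidation of nearby nodes.

```python
from collections import Counter
from typing import Dict


def degree_indicators(street_count: Dict[int, int], area_km2: float) -> Dict[str, float]:
    """Intersection count/density and degree-class proportions from per-node street counts."""
    n = len(street_count)
    hist = Counter(street_count.values())  # degree -> number of nodes
    intersect_count = sum(c for k, c in hist.items() if k > 2)  # nodes of degree > 2
    return {
        "intersect_count": intersect_count,
        "intersect_density_km2": intersect_count / area_km2,
        "prop_4way": hist[4] / n,
        "prop_3way": hist[3] / n,
        "prop_deadend": hist[1] / n,
    }


# Hypothetical street counts for 8 nodes in a 0.5 km² area
counts = {1: 4, 2: 3, 3: 3, 4: 1, 5: 1, 6: 4, 7: 3, 8: 1}
print(degree_indicators(counts, area_km2=0.5))
```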
4.3 Connectivity, Clustering, and Hierarchy
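Mean node degree ($\bar{k}$), the self-loop proportion, and several clustering coefficients (undirected, directed, weighted) are provided. The local (undirected) clustering coefficient of node $i$ is
$$c_i = \frac{2\,T(i)}{k_i\,(k_i - 1)}$$
where $T(i)$ is the number of triangles at node $i$ and $k_i$ is its degree.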
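For the undirected case, the local clustering coefficient (twice the number of triangles at a node divided by $k_i(k_i-1)$) can be computed directly from adjacency sets. This sketch assumes a simple undirected graph, so a street multigraph would first need to be collapsed; nodes of degree below 2 are assigned 0, the same convention NetworkX's `clustering` uses. The toy graph is hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def local_clustering(adj: Dict[str, Set[str]], i: str) -> float:
    """c_i = 2 T(i) / (k_i (k_i - 1)) for a simple undirected graph."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined coefficient reported as 0
    # T(i): number of links among i's neighbours, i.e., triangles through i
    t = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * t / (k * (k - 1))


def average_clustering(adj: Dict[str, Set[str]]) -> float:
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


# Triangle a-b-c plus a dead-end node d attached to a
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(local_clustering(adj, "a"), average_clustering(adj))
```

Street networks are close to planar and contain few triangles, so citywide averages of this coefficient tend to be small.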
PageRank is computed for hierarchical centrality, with pagerank_max as the maximum value per city.
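PageRank is commonly computed by power iteration. The sketch below is a generic implementation on a small, hypothetical directed graph, using a damping factor of 0.85 (NetworkX's default, assumed here) and spreading the rank of dangling nodes uniformly. It is not necessarily how the dataset's values were produced; parallel edges in a multigraph would need to be collapsed or weighted first.

```python
from typing import Dict, List


def pagerank(out_links: Dict[str, List[str]], alpha: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Power-iteration PageRank; every link target must also be a key of out_links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is redistributed uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - alpha) / n + alpha * dangling / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            for w in targets:
                new[w] += alpha * rank[v] / len(targets)
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical one-way street graph
g = {"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["c"]}
pr = pagerank(g)
print(max(pr.values()))  # analogous to a city's pagerank_max
```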
4.4 Orientation Entropy and Gridness
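Adopting indicators from Boeing (2019), orientation entropy ($H_o$) quantifies the regularity of street bearings:
$$H_o = -\sum_{i=1}^{n} P(o_i) \ln P(o_i)$$
where $P(o_i)$ is the share of street segments in the $i$-th of $n$ bearing bins. Orientation order is
$$\varphi = 1 - \left(\frac{H_o - H_g}{H_{\max} - H_g}\right)^2$$
where $H_{\max} = \ln n$ is the entropy of uniformly distributed bearings and $H_g = \ln 4$ is the entropy of a perfect four-way grid, so $\varphi = 1$ for a perfect grid and $\varphi = 0$ for uniformly distributed bearings.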
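Orientation entropy and orientation order can be sketched in a few lines. Following Boeing (2019), this assumes 36 ten-degree bins centred on multiples of 10°, counts each undirected street in both directions (its bearing and the reciprocal), and normalizes entropy between a perfect four-way grid (ln 4) and a uniform distribution (ln 36). Unweighted counts are an assumption; length-weighted variants also exist. The bearing lists are hypothetical.

```python
import math
from typing import Iterable


def orientation_entropy(bearings: Iterable[float], num_bins: int = 36) -> float:
    """Shannon entropy (natural log) of street bearings in equal-width bins.

    Each bearing and its reciprocal are counted, so undirected streets are
    symmetric. Bins are centred on multiples of 360/num_bins degrees.
    """
    width = 360.0 / num_bins
    counts = [0] * num_bins
    for b in bearings:
        for angle in (b % 360.0, (b + 180.0) % 360.0):
            counts[int(((angle + width / 2) % 360.0) // width)] += 1
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c)


def orientation_order(h_o: float, num_bins: int = 36) -> float:
    """phi = 1 - ((H_o - H_g) / (H_max - H_g))^2, with H_g = ln 4 for a perfect grid."""
    h_max = math.log(num_bins)
    h_g = math.log(4)
    return 1.0 - ((h_o - h_g) / (h_max - h_g)) ** 2


grid = [0.0, 90.0] * 50                 # perfect north-south / east-west grid
spread = [i * 10.0 for i in range(36)]  # one street in every 10-degree bin
for name, bearings in (("grid", grid), ("spread", spread)):
    h = orientation_entropy(bearings)
    print(name, round(h, 3), round(orientation_order(h), 3))
```

The perfect grid should yield an order of 1 and the evenly spread bearings an order of 0, bracketing real cities between the two extremes.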
4.5 Elevation and Grade Distribution
Summaries of node elevations and edge grades (mean, median, std, interquartile range, range) are included per urban area and can be directly linked to urban form and potential accessibility constraints.
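The per-area summaries can be reproduced with the standard library. This sketch assumes the sample standard deviation and Python's default (exclusive) quartile method; the dataset may use different conventions, so values may differ slightly from the published ones. The grade values are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Mean, median, sample std, interquartile range, and range of elevations or grades."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method='exclusive'
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }


grades = [0.01, -0.02, 0.03, 0.0, 0.05, -0.01, 0.02, 0.04]  # hypothetical edge grades
print(summarize(grades))
```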
5. Access, Example Workflows, and Use Cases
Data are accessible via Harvard Dataverse; browsing and scripted downloads are supported.
Sample code for loading GraphML into NetworkX:
```python
import networkx as nx

# Attribute values may load as strings here; osmnx.load_graphml converts them to typed values
G = nx.read_graphml("beijing-10687.graphml")
print("nodes:", G.number_of_nodes())
print("edges:", G.number_of_edges())
```
Querying the indicator CSVs with pandas:
```python
import pandas as pd

df = pd.read_csv("GlobalUrbanStreetNetworks-Indicators.csv")
# Ten urban areas with the most topologically cleaned intersections
top10 = df.sort_values("intersect_count_clean_topo", ascending=False).head(10)
print(top10[["core_city", "country_iso", "intersect_count_clean_topo"]])
```
Open-source modeling and analysis code is provided at https://github.com/gboeing/street-network-models, with additional tutorials at https://github.com/gboeing/osmnx-examples.
6. Synthesized Findings and Empirical Patterns
Analysis of the assembled indicators produces several global empirical findings:
- Regional Circuity and Intersection Density: Southern Africa and Melanesia demonstrate the highest circuity (≈8.6%); Northern Africa and South America the lowest (≈3.7%). Absolute intersection densities range from ~284/km² (Northern Africa) to ~39/km² (Eastern Europe).
- Scaling Laws: Total street length and intersection count are strongly linearly related across cities worldwide (≈178.6 m of street per intersection). Both street length and node count scale sublinearly with population, growing proportionally more slowly than population itself.
- GDP vs. Road Provision: Mean per capita street length is 2.13 m; each additional USD 10,000 of GDP per capita predicts an increase of 1.13 m (±0.03 m) of street length per person.
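Sublinear scaling means that in a power law $Y = aN^{\beta}$, with $N$ the population and $Y$ street length or node count, the exponent satisfies $\beta < 1$. One common estimator, assumed here because the source does not name one, is ordinary least squares on log-transformed values. The numbers below are synthetic, generated with $\beta = 0.8$, not dataset values.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of log(y) = log(a) + beta * log(x); returns (a, beta)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((p - mx) * (q - my) for p, q in zip(lx, ly))
    sxx = sum((p - mx) ** 2 for p in lx)
    beta = sxy / sxx
    return math.exp(my - beta * mx), beta


# Synthetic (not dataset) values generated with a = 3.0, beta = 0.8
population = [5e4, 2e5, 1e6, 5e6, 2e7]
street_km = [3.0 * p ** 0.8 for p in population]
a, beta = fit_power_law(population, street_km)
print(round(a, 3), round(beta, 3))
```

A fitted $\beta$ below 1 indicates that larger cities need less street length (or fewer nodes) per resident.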
Elevation assignments are validated against Google Maps data, and node-by-node errors after optimal DEM selection are typically within 0.32 m. Urban-area average elevations match UCD means to within 0.16 m median error.
7. Limitations and Prospective Enhancements
- Data Completeness and Accuracy: OSM data completeness varies by country (notably, China's OSM street coverage was estimated to be below 100% complete in 2016), limiting precision in some regions. Only drivable networks are included; pedestrian, bicycle, and transit layers are omitted.
- Spatial Extent: UCD boundaries can exclude peri-urban sprawl lying outside built-up thresholds.
- Elevation Models: DEM inputs (ASTER, SRTM) have tradeoffs in resolution and noise; per-node DEM selection reduces, but does not eliminate, errors.
Anticipated extensions include incorporating multimodal networks (bike, pedestrian, transit), longitudinal temporal analysis through historical OSM snapshots, finer-scale DEM/LiDAR integration, accessibility and vulnerability computation, and fusion with building footprints and land-use datasets.
GlobalUrbanNet constitutes the first systematic, globally consistent, and open-source repository of detailed street network graphs and morphology indicators for urban areas worldwide, supporting large-scale comparative research in urban science, transport planning, and sustainability (Boeing, 2020).