
GlobalUrbanNet Dataset Overview

Updated 28 November 2025
  • GlobalUrbanNet is a comprehensive, open-access repository of street network graphs and urban morphology indicators for 8,914 cities worldwide.
  • The dataset employs a robust OSMnx workflow to extract and simplify drivable street networks, computing metrics such as circuity, intersection density, and grade.
  • It supports large-scale, reproducible research in urban planning, transport, and sustainability through multiple export formats and standardized geodata.

The GlobalUrbanNet dataset is a comprehensive, openly accessible resource of street network graphs and form indicators for urban areas worldwide. It covers 8,914 cities in 178 countries, capturing the topology, geometry, and infrastructure characteristics of drivable street networks as modeled from OpenStreetMap and defined by the Global Human Settlement Layer’s Urban Centre Database (UCD). The dataset enables large-scale, reproducible urban morphology analysis, transport planning studies, and sustainability research through standardized, ready-to-use geodata and computed indicators at the urban-area scale (Boeing, 2020).

1. Dataset Scope and Urban Boundary Definition

GlobalUrbanNet models the street network for every urban area classified as a "true positive" in the UCD v2019a (using DEGURBA methodology):

  • Urban centres are contiguous clusters of 1 km² grid cells, each with ≥1,500 inhabitants/km² and aggregate population ≥50,000.
  • Included sites must have at least 1 km² built-up area and at least 3 drivable-street OpenStreetMap (OSM) nodes.
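The two inclusion thresholds can be read as a simple filter over candidate urban centres. The sketch below is illustrative only: the `UrbanCentre` record, its field names, and the sample values are hypothetical and do not come from the UCD schema.

```python
from dataclasses import dataclass


@dataclass
class UrbanCentre:
    name: str
    built_up_km2: float   # built-up area inside the boundary (hypothetical field)
    drivable_nodes: int   # OSM nodes on drivable streets (hypothetical field)


def is_included(uc, min_built_up_km2=1.0, min_nodes=3):
    """Apply the two inclusion thresholds described above."""
    return uc.built_up_km2 >= min_built_up_km2 and uc.drivable_nodes >= min_nodes


# Made-up candidates, for illustration only.
candidates = [
    UrbanCentre("A", 12.5, 4210),
    UrbanCentre("B", 0.6, 150),   # fails the built-up area threshold
    UrbanCentre("C", 3.2, 2),     # fails the node-count threshold
]
kept = [uc.name for uc in candidates if is_included(uc)]
print(kept)
```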

The dataset encompasses:

| Statistic | Value |
|---|---|
| Urban areas modeled | 8,914 |
| Countries covered | 178 |
| Raw OSM nodes/edges | ≈160M nodes / 320M edges |
| Post-simplification nodes/edges | ≈37M nodes / 53M edges |

The urban boundaries are strictly defined by v2019a UCD polygons, which serve as the spatial units for topological extraction and all subsequent indicator calculation.

2. Data Acquisition, Processing, and Graph Construction

2.1 OSMnx Workflow

Street networks are modeled for each boundary using the OSMnx Python package:

  • For each UCD area, the boundary polygon is slightly buffered to reduce periphery distortion.
  • OSMnx's graph_from_polygon method is invoked with network_type='drive' to create a directed, nonplanar multigraph. Non-drivable ways (service, pedestrian-only) are excluded; living streets, woonerfs, etc., are included.

Example OSMnx code:

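import osmnx as ox
# boundary: the buffered urban-centre polygon (a shapely geometry in lat/lon coordinates)
G = ox.graph_from_polygon(boundary, network_type='drive', retain_all=True, simplify=False)
# Collapse interstitial nodes so that only intersections and dead-ends remain
Gs = ox.simplify_graph(G)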

2.2 Topological Simplification

Raw OSM street representations include every geometric vertex along a way as a node, including points that merely trace a curve. The workflow simplifies these graphs:

  • Only true intersections and dead-ends are retained as nodes.
  • Edges preserve full geometry (shape) as LineString objects.

This reduces graph size and makes the networks better suited to graph-theoretic analysis.
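The idea behind simplification can be sketched in a few lines of pure Python. This is not the OSMnx implementation: it assumes a simple undirected graph (no parallel edges), ignores one-way rules and isolated cycles, and records the traversed node sequence as a stand-in for the preserved edge geometry. The function name `simplify` and the toy edge list are illustrative.

```python
from collections import defaultdict


def simplify(edges):
    """Collapse chains of degree-2 nodes in a simple undirected graph.

    edges: iterable of (u, v, length).
    Returns a list of (u, v, total_length, path), where u and v are retained
    nodes (degree != 2) and path lists the original nodes traversed.
    """
    adj = defaultdict(dict)
    for u, v, length in edges:
        adj[u][v] = length
        adj[v][u] = length

    # Intersections and dead-ends are the nodes whose degree is not 2.
    endpoints = {n for n in adj if len(adj[n]) != 2}
    simplified, seen = [], set()
    for start in endpoints:
        for nxt in adj[start]:
            if (start, nxt) in seen:
                continue
            path, total = [start, nxt], adj[start][nxt]
            prev, cur = start, nxt
            # Walk through interstitial nodes until the next endpoint.
            while cur not in endpoints:
                step = next(n for n in adj[cur] if n != prev)
                total += adj[cur][step]
                prev, cur = cur, step
                path.append(cur)
            # Mark both traversal directions so the chain is emitted once.
            seen.add((path[0], path[1]))
            seen.add((path[-1], path[-2]))
            simplified.append((start, cur, total, path))
    return simplified


# Node 2 only traces the shape of the street between 1 and 3.
edges = [(1, 2, 10.0), (2, 3, 15.0), (3, 4, 20.0), (3, 5, 5.0)]
result = simplify(edges)
for u, v, length, path in sorted(result, key=lambda r: r[2]):
    print(u, v, length, path)
```

In the toy input, the four raw edges become three simplified edges, and the merged edge between nodes 1 and 3 carries the summed length of its two raw segments.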

2.3 Elevation and Grade Assignment

Each graph node is assigned elevation using two digital elevation models (DEM):

  • ASTER v2 (≈30 m, higher noise)
  • CGIAR-processed SRTM (≈90 m, lower noise)

At each node, both DEM readings are compared with the value returned by the proprietary Google Maps Elevation API, which serves as the reference; the DEM value closest to Google’s value is selected. Edge grade is then computed:

$$\mathrm{grade}_e = \frac{|\Delta h|}{\ell_e} \times 100\%$$

where $\ell_e$ is the length of edge $e$ and $\Delta h = h(v) - h(u)$ for edge $e = (u, v)$.
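A minimal sketch of the per-node DEM selection and the grade formula follows. The function names, the sample elevations, and the tie-breaking rule (ASTER wins on an exact tie) are assumptions made for illustration, not details taken from the dataset's code.

```python
def select_elevation(aster_m, srtm_m, reference_m):
    """Return whichever DEM reading is closer to the reference elevation.

    Ties go to the ASTER value; that choice is an assumption of this sketch.
    """
    if abs(aster_m - reference_m) <= abs(srtm_m - reference_m):
        return aster_m
    return srtm_m


def edge_grade_pct(h_u, h_v, length_m):
    """Absolute grade of edge (u, v) in percent: |h(v) - h(u)| / length * 100."""
    if length_m <= 0:
        raise ValueError("edge length must be positive")
    return abs(h_v - h_u) / length_m * 100.0


# Made-up readings in metres for the two end nodes of one edge.
h_u = select_elevation(aster_m=103.0, srtm_m=101.0, reference_m=100.5)  # SRTM is closer
h_v = select_elevation(aster_m=106.0, srtm_m=109.0, reference_m=105.8)  # ASTER is closer
grade = edge_grade_pct(h_u, h_v, length_m=200.0)
print(h_u, h_v, grade)
```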

2.4 Export Formats

For every urban area, outputs include:

  • GraphML (.graphml) for network analysis
  • GeoPackage (.gpkg) for GIS workflows
  • Node and edge CSVs

3. Data Structure, Attributes, and Repository Organization

3.1 Graph Representation

Networks are stored as NetworkX graphs with per-node and per-edge attributes:

  • Node attributes: x, y (coordinates), elevation, osmid, street_count
  • Edge attributes: length, grade, OSM highway tag, geometry, oneway flag, node IDs

3.2 Repositories and File Organization

The data are deposited in several open repositories on the Harvard Dataverse:

| Repository Name | DOI | Content Type |
|---|---|---|
| Indicators | 10.7910/DVN/ZTFPTB | Urban area indicator tables |
| Metadata | 10.7910/DVN/WMPPF9 | Reference metadata |
| GraphML models | 10.7910/DVN/KA5HJ3 | GraphML files by country |
| GeoPackages | 10.7910/DVN/E5TPDQ | GIS geodatabases by country |
| Node/Edge CSVs | 10.7910/DVN/DC7U0A | Node and edge lists (CSV) |

4. Street Network Indicators and Methodology

Merged with UCD metadata (population, GDP), the dataset provides a globally comparable suite of graph-theoretic and morphological indicators for each urban area.

4.1 Circuity and Straightness

GlobalUrbanNet quantifies circuity—how much street paths deviate from straight lines:

$$C = \frac{\sum_{e\in E} \ell_e}{\sum_{e\in E} d_\mathrm{euclid}(u_e, v_e)}$$

where $\ell_e$ is the real-world length of edge $e$ and $d_\mathrm{euclid}(u_e, v_e)$ is the Euclidean distance between the edge's end nodes. Straightness is defined as $1/C$.
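The ratio can be computed directly from edge lengths and end-node coordinates. The sketch below assumes coordinates are already projected to metres so that plain Euclidean distance is meaningful; the function name and sample edges are illustrative.

```python
import math


def circuity(edges):
    """Ratio of summed edge lengths to summed straight-line distances.

    edges: sequence of ((x_u, y_u), (x_v, y_v), length_m), with coordinates
    in a projected CRS measured in metres (an assumption of this sketch).
    """
    total_length = sum(length for _, _, length in edges)
    total_straight = sum(math.dist(p, q) for p, q, _ in edges)
    return total_length / total_straight


# Two made-up edges: one curved (550 m along a 500 m chord), one straight.
edges = [
    ((0.0, 0.0), (300.0, 400.0), 550.0),
    ((300.0, 400.0), (300.0, 900.0), 500.0),
]
c = circuity(edges)
straightness = 1.0 / c
print(c, straightness)
```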

4.2 Intersection Density and Node Degree

Three main node indicators:

  • intersect_count: nodes of degree >2
  • intersect_count_clean: merges nodes within 10 m Euclidean distance
  • intersect_count_clean_topo: merges nodes within 10 m only along the network (so overpasses are not merged with the streets beneath them)

Additional degree-based proportions: prop_4way (degree=4), prop_3way, prop_deadend.
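As a rough illustration, the raw count and the degree-based proportions can be derived from the per-node street_count attribute. The sketch follows the definitions given in the text (intersections are nodes of degree greater than 2); the function name and sample values are made up, and the 10 m merging step is not reproduced here.

```python
from collections import Counter


def degree_indicators(street_counts):
    """Compute intersection count and degree proportions from street counts."""
    counts = Counter(street_counts)
    n = len(street_counts)
    return {
        "intersect_count": sum(c for k, c in counts.items() if k > 2),
        "prop_deadend": counts[1] / n,
        "prop_3way": counts[3] / n,
        "prop_4way": counts[4] / n,
    }


# Made-up street counts for ten nodes.
indicators = degree_indicators([1, 1, 3, 3, 3, 4, 4, 4, 4, 5])
print(indicators)
```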

4.3 Connectivity, Clustering, and Hierarchy

Mean node degree ($k_\text{avg}$), self-loop proportion, and several clustering coefficients (undirected, directed, weighted) are provided:

$$C_\mathrm{avg,undir} = \frac{1}{N} \sum_i \frac{2 t_i}{k_i (k_i - 1)}$$

where $t_i$ is the number of triangles at node $i$, $k_i$ is its degree, and $N$ is the number of nodes.
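The undirected average clustering coefficient can be evaluated by hand on a small graph. The sketch below assumes a simple undirected graph stored as a neighbour-set dictionary and treats nodes with fewer than two neighbours as contributing zero, which is a common convention but an assumption here.

```python
from itertools import combinations


def average_clustering(adj):
    """adj: dict mapping node -> set of neighbours (undirected, no self-loops)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # local coefficient taken as 0 when k < 2
        # A triangle at this node is a pair of neighbours that are adjacent.
        triangles = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * triangles / (k * (k - 1))
    return total / len(adj)


# A triangle (1, 2, 3) with a dead-end node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
c_avg = average_clustering(adj)
print(c_avg)
```

Nodes 1 and 2 each have a local coefficient of 1, node 3 has 1/3, and node 4 contributes 0, giving an average of 7/12.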

PageRank is computed for hierarchical centrality, with pagerank_max as the maximum value per city.
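For readers unfamiliar with PageRank, a basic power-iteration sketch is shown below. The damping factor of 0.85, the uniform handling of dangling nodes, and the toy graph are assumptions of this illustration; the text does not state the parameters used for the dataset.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph.

    out_links: dict mapping node -> list of successor nodes; every node
    must appear as a key, even if its list is empty.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new[w] += damping * rank[v] / len(out_links[v])
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Toy directed graph; "d" has no incoming links.
ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"], "d": ["c"]})
pagerank_max = max(ranks.values())
print(ranks, pagerank_max)
```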

4.4 Orientation Entropy and Gridness

Adopting indicators from Boeing (2019), orientation entropy ($H$) quantifies regularity of street bearings:

$$H = -\sum_{j=1}^{B} p_j \log_2 p_j$$

where $p_j$ is the share of street segments in each of $B$ bearing bins. Orientation order is $1 - H/H_\text{max}$.
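The entropy and the normalised order can be sketched from a histogram of bearing counts. The code follows the formulas exactly as written in this section, with $H_\text{max} = \log_2 B$ for a uniform distribution; the cited paper should be consulted for the precise normalisation used in the published indicator. Function names and the four-bin toy histograms are illustrative.

```python
import math


def orientation_entropy(bin_counts):
    """Shannon entropy (bits) of the distribution of bearings over bins."""
    total = sum(bin_counts)
    shares = [c / total for c in bin_counts if c > 0]  # empty bins add nothing
    return -sum(p * math.log2(p) for p in shares)


def orientation_order(bin_counts):
    """1 - H / H_max, as defined in the text, with H_max = log2(number of bins)."""
    h_max = math.log2(len(bin_counts))
    return 1.0 - orientation_entropy(bin_counts) / h_max


uniform = [25, 25, 25, 25]     # bearings spread evenly: maximum entropy
single = [100, 0, 0, 0]        # all streets share one bearing bin
print(orientation_entropy(uniform), orientation_order(uniform))
print(orientation_entropy(single), orientation_order(single))
```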

4.5 Elevation and Grade Distribution

Summaries of node elevations and edge grades (mean, median, std, interquartile range, range) are included per urban area and can be directly linked to urban form and potential accessibility constraints.
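The listed summary statistics can be reproduced with the standard library. This sketch assumes the sample standard deviation and Python's default quartile method; the text does not say which estimators the dataset uses, and the grade values are made up.

```python
import statistics


def summarize(values):
    """Mean, median, standard deviation, interquartile range, and range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),   # sample standard deviation (assumption)
        "iqr": q3 - q1,
        "range": max(values) - min(values),
    }


# Made-up edge grades in percent for one urban area.
summary = summarize([0.5, 1.0, 1.5, 2.0, 8.0])
print(summary)
```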

5. Access, Example Workflows, and Use Cases

Data are accessible via Harvard Dataverse; browsing and scripted downloads are supported.
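For scripted access, one possible starting point is the Dataverse native API, which exposes dataset metadata by persistent identifier. The sketch below only builds the request URL for the Indicators repository and performs no network call; the helper name is hypothetical, and the endpoint path should be checked against the current Dataverse API guide before use.

```python
from urllib.parse import urlencode

DATAVERSE = "https://dataverse.harvard.edu"


def dataset_metadata_url(doi):
    """Build the metadata URL for a dataset identified by its DOI."""
    query = urlencode({"persistentId": f"doi:{doi}"})
    return f"{DATAVERSE}/api/datasets/:persistentId/?{query}"


# DOI of the Indicators repository listed in Section 3.2.
url = dataset_metadata_url("10.7910/DVN/ZTFPTB")
print(url)
```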

Sample code for loading GraphML into NetworkX:

import networkx as nx
# Note: a plain GraphML read may leave OSM-derived attributes as strings
G = nx.read_graphml("beijing-10687.graphml")
print("nodes:", G.number_of_nodes())
print("edges:", G.number_of_edges())

Querying the indicator CSVs with pandas:

import pandas as pd
df = pd.read_csv("GlobalUrbanStreetNetworks-Indicators.csv")
top10 = df.sort_values("intersect_count_clean_topo", ascending=False).head(10)
print(top10[["core_city", "country_iso", "intersect_count_clean_topo"]])

Open-source modeling and analysis code is provided at https://github.com/gboeing/street-network-models, with additional tutorials at https://github.com/gboeing/osmnx-examples.

6. Synthesized Findings and Empirical Patterns

Analysis of the assembled indicators produces several global empirical findings:

  • Regional Circuity and Intersection Density: Southern Africa and Melanesia have the highest circuity (streets ≈8.6% longer than straight-line distances); Northern Africa and South America the lowest (≈3.7%). Absolute intersection densities range from ~284/km² (Northern Africa) to ~39/km² (Eastern Europe).
  • Scaling Laws: There is a strong global linear relationship between total street length and intersection count ($R^2 = 0.91$, ≈178.6 m per intersection). Both street length and node count scale sublinearly with population ($P^{0.90}$ and $P^{0.95}$ respectively).
  • GDP vs. Road Provision: Mean per capita street length is 2.13 m; an increment of USD 10,000 in GDP per capita predicts an increase of 1.13 m (±0.03 m) per person in road length.
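A sublinear scaling exponent of this kind is typically estimated by ordinary least squares on log-transformed values. The sketch below shows that calculation on synthetic data generated from an exact power law with exponent 0.90; the data, the coefficient 0.003, and the function name are fabricated for illustration and are not drawn from the dataset.

```python
import math


def fit_power_law(x, y):
    """Fit y = a * x**b by least squares on log y = log a + b * log x."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = math.exp(my - b * mx)
    return a, b


# Synthetic populations and street lengths (km) from an exact power law.
populations = [5e4, 1e5, 5e5, 1e6, 5e6]
street_km = [0.003 * p ** 0.90 for p in populations]
a, b = fit_power_law(populations, street_km)
print(a, b)
```

An exponent below 1 means that doubling population is associated with less than a doubling of street length.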

Elevation assignments are validated against Google Maps data, and node-by-node errors after optimal DEM selection are typically within 0.32 m. Urban-area average elevations match UCD means to within 0.16 m median error.

7. Limitations and Prospective Enhancements

  • Data Completeness and Accuracy: OSM data completeness varies by region (notably, coverage of China was below 100% in 2016), which limits precision in some areas. Only drivable networks are included; pedestrian, bicycle, and transit layers are omitted.
  • Spatial Extent: UCD boundaries can exclude peri-urban sprawl lying outside built-up thresholds.
  • Elevation Models: DEM inputs (ASTER, SRTM) trade off resolution against noise; per-node DEM selection reduces, but does not eliminate, elevation errors.

Anticipated extensions include incorporating multimodal networks (bike, pedestrian, transit), longitudinal temporal analysis through historical OSM snapshots, finer-scale DEM/LiDAR integration, accessibility and vulnerability computation, and fusion with building footprints and land-use datasets.


GlobalUrbanNet constitutes the first systematic, globally consistent, and open-source repository of detailed street network graphs and morphology indicators at the world’s urban scale, supporting large-scale comparative research in urban science, transport planning, and sustainability (Boeing, 2020).
