Shenzhen UrbanEV Dataset

Updated 8 February 2026

The Shenzhen UrbanEV dataset is a high-resolution, city-wide spatio-temporal panel of EV charging demand, infrastructure capacity, and grid stress.
It comprises 1,194,600 zone-hour observations across 275 zones, recording requested energy, occupancy, and mean charging session duration.
It supports forecasting, resilience planning, and physics-informed machine learning for urban EV-grid dynamics.

The Shenzhen UrbanEV dataset is a city-wide, high-resolution spatio-temporal panel curated to support research on electric-vehicle (EV) charging demand, infrastructure performance, and associated grid stress in urban environments. Compiled by Li et al. (2025), it encompasses detailed zone-hour records from September 1, 2022, to February 28, 2023, across the Shenzhen metropolitan area. The dataset is designed to address the need for granular, context-rich data enabling the study of micro-level charging processes as well as city-scale planning and resilience assessment (Wang, 1 Feb 2026).

1. Dataset Composition and Structure

The Shenzhen UrbanEV dataset consists of $N = 1,194,600$ zone-hour observations, derived from aggregation over $Z = 275$ spatial zones and $T = 4,344$ contiguous hourly intervals. Each record captures both dynamic and static attributes relevant to EV charging phenomena:

Temporal Indexing: $t \in \{1, \dots, T\}$ (hourly granularity)
Spatial Indexing: $z \in \{1, \dots, Z\}$; each zone covers a city subarea (~1–2 km per side)
Zone Centroid Coordinates: $(x_z, y_z)$ in a projected coordinate system
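The panel dimensions quoted above can be cross-checked against the stated coverage window. The following is a minimal sketch using only the dates and zone count given in the text; the hour-major row layout in `record_index` is a hypothetical convention for illustration, not something the dataset documents.

```python
from datetime import datetime, timedelta

# Coverage window from the text; the end bound is exclusive
# (midnight following 28 February 2023).
START = datetime(2022, 9, 1)
END_EXCLUSIVE = datetime(2023, 3, 1)
NUM_ZONES = 275  # Z in the text


def hours_in_window(start, end_exclusive):
    """Number of whole hourly intervals in [start, end_exclusive)."""
    return int((end_exclusive - start) / timedelta(hours=1))


num_hours = hours_in_window(START, END_EXCLUSIVE)  # T
num_records = NUM_ZONES * num_hours                # N = Z * T


def record_index(t, z, num_zones=NUM_ZONES):
    """Hypothetical hour-major layout: 1-based (t, z) -> 0-based row number."""
    return (t - 1) * num_zones + (z - 1)


print(num_hours, num_records)
```

Under these assumptions the 181 days of coverage give 4,344 hourly intervals, and multiplying by 275 zones reproduces the stated record count.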

The core fields in each record are as follows:

Field	Description	Units/Type
$V_{t,z}$	Total charging energy requested	kWh
$occ_{t,z}$	Occupancy statistic (fraction/count of plugs used)	% or count
$dur_{t,z}$	Mean charging session duration	minutes
$T_{t,z}$	Ambient temperature at nearest station	°C
$C_z$	Installed charging capacity	kW (time-invariant)
$n_z$	Number of chargers (plugs) in zone	Integer (time-invariant)

Additionally, a pressure proxy $s_{raw,t,z} = V_{t,z} / (C_z \cdot \Delta t)$, with $\Delta t = 1$ h for the hourly panel, is derived as a dimensionless indicator of hourly demand relative to installed capacity.
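As a rough illustration of the pressure proxy, the sketch below evaluates the formula for a single zone-hour. The function name, the handling of zero capacity, and the example numbers are assumptions made here, not part of the dataset.

```python
def pressure_proxy(energy_kwh, capacity_kw, dt_hours=1.0):
    """s_raw = V / (C * dt): requested energy relative to deliverable energy.

    Returns None when the zone has no installed capacity (an assumed
    convention; the ratio is undefined in that case).
    """
    if capacity_kw <= 0:
        return None
    return energy_kwh / (capacity_kw * dt_hours)


# Illustrative values only: 150 kWh requested in one hour against 240 kW installed.
example = pressure_proxy(150.0, 240.0)
print(example)
```

A value above 1 would mean that more energy was requested within the hour than the installed chargers could deliver at full rated power.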

Beyond tabular data, the dataset provides a spatial adjacency graph $G = (V, E)$, where nodes represent zones and edges connect zone pairs whose centroids lie within a Euclidean distance of 5 km. This graph structure supports the deployment of spatio-temporal learning algorithms.
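The distance-threshold rule for the adjacency graph can be sketched as follows. This assumes the projected centroid coordinates are expressed in metres, which the text does not state; the zone identifiers and coordinates are toy values.

```python
import math
from itertools import combinations


def build_adjacency(centroids, max_dist_m=5000.0):
    """Connect every pair of zones whose centroids are within max_dist_m.

    centroids: dict mapping zone_id -> (x, y), assumed to be in metres.
    Returns a dict mapping zone_id -> set of neighbouring zone_ids (undirected).
    """
    neighbours = {zone: set() for zone in centroids}
    for a, b in combinations(centroids, 2):
        (xa, ya), (xb, yb) = centroids[a], centroids[b]
        if math.hypot(xa - xb, ya - yb) <= max_dist_m:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


# Toy layout: zones 1 and 2 are about 4.2 km apart, zone 3 is far from both.
toy = {1: (0.0, 0.0), 2: (3000.0, 3000.0), 3: (9000.0, 0.0)}
graph = build_adjacency(toy)
print(graph)
```

The pairwise loop is quadratic in the number of zones, which is unproblematic for 275 nodes; a spatial index would only matter for much larger partitions.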

2. Data Sources, Aggregation, and Temporal Coverage

Data aggregation is performed from raw logs generated at hundreds of public and semi-public EV charging stations throughout Shenzhen. Each station-level event is mapped to the appropriate spatial zone and hourly interval for aggregation, with the following characteristics:

Period: September 2022 – February 2023 (six months)
Observations: $N = 1,194,600$ (275 zones $\times$ 4,344 hours)
Underlying events: Aggregated from varying station counts across zones and time, with zone-wise plug count $n_z$ capturing spatial heterogeneity and infrastructure growth dynamics.

Each observation timestamp uses China Standard Time (UTC+8). Zones with zero installed chargers are excluded from forecasting analyses.
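The mapping from station-level events to zone-hour cells might look like the sketch below. The event tuple layout is hypothetical, and assigning a whole session to the hour in which it starts is a simplification chosen here; the text does not describe how sessions spanning several hours are apportioned.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8))  # China Standard Time, UTC+8


def aggregate_zone_hour(events):
    """Aggregate session events into zone-hour cells keyed by CST hour start.

    events: iterable of (start: timezone-aware datetime, zone_id,
            energy_kwh, duration_min). This layout is an assumption.
    Returns {(hour_start_cst, zone_id): {"V", "dur", "sessions"}}.
    """
    totals = defaultdict(lambda: {"V": 0.0, "dur_sum": 0.0, "sessions": 0})
    for start, zone, kwh, minutes in events:
        local = start.astimezone(CST)
        hour = local.replace(minute=0, second=0, microsecond=0)
        cell = totals[(hour, zone)]
        cell["V"] += kwh            # total energy requested in the cell
        cell["dur_sum"] += minutes  # accumulated to form the mean duration
        cell["sessions"] += 1
    return {
        key: {"V": c["V"], "dur": c["dur_sum"] / c["sessions"], "sessions": c["sessions"]}
        for key, c in totals.items()
    }


# Toy events logged in UTC; the first two fall in the same CST hour (08:00).
events = [
    (datetime(2022, 9, 1, 0, 10, tzinfo=timezone.utc), 7, 20.0, 30.0),
    (datetime(2022, 9, 1, 0, 50, tzinfo=timezone.utc), 7, 10.0, 50.0),
    (datetime(2022, 9, 1, 1, 5, tzinfo=timezone.utc), 7, 5.0, 15.0),
]
panel = aggregate_zone_hour(events)
```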

3. Features, Preprocessing, and Cleaning

Each zone-hour record furnishes numerical and categorical variables for modeling charging load, infrastructure utilization, ambient conditions, and spatial topology. The main features and any derived attributes include:

Charging Demand: $V_{t,z}$ (hourly energy requested), $occ_{t,z}$ (utilization proportion/count), $dur_{t,z}$ (average duration)
Weather Context: $T_{t,z}$ (nearest-station temperature)
Infrastructure Capacity: $C_z$ (kW total), $n_z$ (charger count)
Spatial Metadata: $(x_z, y_z)$ (centroid), administrative metadata
Pressure Proxy: $s_{raw,t,z}$, later used in downstream resilience analytics

Data preprocessing comprises a chronological split into train/validation/test sets (70%/15%/15% by contiguous time), standardization to zero mean and unit variance for dynamic features (fit only on the training subset), and imputation of missing temperature values by forward-filling from co-located meteorological data. No explicit removal of outliers in $V_{t,z}$ or $occ_{t,z}$ is performed; extreme $s_{raw}$ values are handled by downstream clipping at $s_{max} = 3.0$ during the physics-informed modeling stage.
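The preprocessing steps described above can be sketched with the standard library as follows. The helper names are hypothetical, the rounding of split boundaries (truncation) is an assumption, and the use of the population standard deviation is one possible choice; the text fixes only the 70/15/15 proportions, the training-only fit, the forward fill, and $s_{max} = 3.0$.

```python
from statistics import fmean, pstdev


def chronological_split(num_hours, train=0.70, val=0.15):
    """Contiguous (train, val, test) hour-index ranges; the remainder goes to test."""
    n_train = int(num_hours * train)
    n_val = int(num_hours * val)
    return (range(0, n_train),
            range(n_train, n_train + n_val),
            range(n_train + n_val, num_hours))


def forward_fill(values):
    """Replace None with the most recent observed value (leading gaps stay None)."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def fit_scaler(train_values):
    """Mean and standard deviation estimated on training hours only."""
    mu = fmean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against a constant series
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply the training-set statistics to any split."""
    return [(v - mu) / sigma for v in values]


def clip_pressure(s_raw, s_max=3.0):
    """Downstream clipping of extreme pressure-proxy values."""
    return min(s_raw, s_max)


train_idx, val_idx, test_idx = chronological_split(4344)
print(len(train_idx), len(val_idx), len(test_idx))
```

Fitting the scaler on the training range and reusing its statistics for the validation and test ranges avoids leaking information from later hours into the model inputs.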

Static attributes $C_z$ and $n_z$ are checked for temporal consistency and treated as time-invariant after this verification.

4. Descriptive Analytics and Statistical Properties

While the dataset paper does not furnish exhaustive summary statistics, several key distributional facts are reported or implied:

Mean hourly demand per zone: $\mathbb{E}[V_{t,z}]$ is on the order of 100–200 kWh, exhibiting a heavy-tailed distribution across both time and space.
Overload prevalence: Fewer than 0.1% of zone-hour records have $s_{raw} > 1$, indicating that hourly demand rarely exceeds installed charging capacity.
Temperature variation: $T_{t,z}$ spans approximately 10°C to 30°C, with a cross-sample standard deviation of about 5°C.
Charger density: $n_z$ ranges from single digits (micro-hubs) to many tens (large hubs), with $C_z$ varying proportionately from several kW to several hundred kW.

A plausible implication is that the dataset presents low intrinsic overload rates; however, stress regimes may nonetheless arise in outlier intervals or as modeled in scenario-based forecasting.
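To put the overload figure in absolute terms, the sketch below computes the share of records above a threshold and the record count implied by the 0.1% bound. The function name and the toy values are assumptions for illustration.

```python
def overload_share(s_raw_values, threshold=1.0):
    """Fraction of zone-hour records whose pressure proxy exceeds the threshold.

    None entries (for example, zones without capacity) are ignored.
    """
    values = [s for s in s_raw_values if s is not None]
    if not values:
        return 0.0
    return sum(s > threshold for s in values) / len(values)


# "Fewer than 0.1%" of 1,194,600 records means at most this many overloaded cells.
upper_bound = 1_194_600 // 1000

# Toy series: one of four valid records exceeds capacity.
share = overload_share([0.2, 0.5, 1.2, None, 0.9])
print(upper_bound, share)
```

The bound works out to fewer than about 1,200 overloaded zone-hours over the full six-month panel, which illustrates how sparse the observed stress regime is.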

5. Spatial Graph, Data Format, and Access

The spatial topology is encoded as an undirected graph $G = (V, E)$ over the set of 275 zone nodes, with edges based on centroid proximity (≤ 5 km). This facilitates neighborhood-based propagation in graph neural networks and related approaches for both forecasting and resilience simulation.

Data and ancillary files are delivered in the following formats:

Main Panel: Tabular (CSV), one row per zone-hour record
Ancillary: “zones.csv” for static zone metadata; “graph.npz” or “adjacency.csv” for the adjacency matrix
Repository: https://doi.org/10.1038/s41597-025-04874-4 (Scientific Data, Li et al. 2025), accompanied by code examples and ingestion scripts for reproducing and manipulating the panel; a GitHub link is to be added upon publication (Wang, 1 Feb 2026).
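A minimal sketch of ingesting the main panel with the standard library is given below. The column names (`zone_id`, `timestamp`, `energy_kwh`, `temperature_c`) and the timestamp format are hypothetical, since the actual CSV schema is not specified here; an in-memory sample stands in for the real file.

```python
import csv
import io


def load_panel(handle, zone_col="zone_id", time_col="timestamp"):
    """Index zone-hour rows by (zone, timestamp).

    Remaining columns are parsed as floats; empty cells become None.
    The column names are assumptions, not the dataset's documented schema.
    """
    panel = {}
    for row in csv.DictReader(handle):
        key = (int(row.pop(zone_col)), row.pop(time_col))
        panel[key] = {
            name: (float(value) if value != "" else None)
            for name, value in row.items()
        }
    return panel


# In-memory stand-in for the main panel file, with one missing temperature.
sample = io.StringIO(
    "zone_id,timestamp,energy_kwh,temperature_c\n"
    "1,2022-09-01T00:00+08:00,120.5,28.0\n"
    "1,2022-09-01T01:00+08:00,98.0,\n"
)
panel = load_panel(sample)
```

With a real file, the same function could be called on an open file handle, for example `load_panel(open(path, newline=""))`, after adjusting the column names to the published schema.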

6. Benchmarking and Research Applications

The Shenzhen UrbanEV dataset serves a dual role as a modeling benchmark and an analytical foundation for studies of EV-grid coupling, urban resilience, and demand forecasting. Its integration of high-resolution temporal granularity, spatial adjacency, infrastructure statistics, and meteorological context enables the deployment and assessment of advanced scientific machine learning frameworks.

Its utility has been demonstrated in the context of a five-stage scientific machine learning pipeline, including dual-head spatio-temporal graph neural networks, backlog dynamics simulation, and coupling to transformer loading analysis. Validation results show that physics-informed machine learning approaches restore monotone stress-to-risk responses (Spearman correlation coefficient +1.0 versus -0.8 for purely data-driven baselines) and yield robust improvements in forecasting accuracy. Policy interventions designed on this dataset can achieve substantial recovery efficiencies, such as a 79.1% reduction of backlog and restoration of full service within the study horizon under simulated demand shocks, with grid stress limited to only 2 additional hours (Wang, 1 Feb 2026).
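The monotonicity check reported above relies on the Spearman rank correlation, which equals +1 exactly when risk increases strictly with stress. The sketch below computes it from scratch; the stress and risk values are invented toy numbers, chosen so that their ranks happen to give the same coefficients as those quoted, and they are not data from the study.

```python
def ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Synthetic illustration only.
stress = [1.0, 1.2, 1.4, 1.6, 1.8]
monotone_risk = [0.05, 0.10, 0.30, 0.55, 0.90]  # rises with stress
inverted_risk = [0.90, 0.50, 0.70, 0.10, 0.20]  # mostly falls with stress

rho_monotone = spearman(stress, monotone_risk)
rho_inverted = spearman(stress, inverted_risk)
```

A negative coefficient, as in the second toy series, corresponds to the non-physical behavior in which predicted risk tends to fall as stress grows.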

A derived resilience threshold $m_{crit}(\epsilon) \approx 1.7 - 1.0 \epsilon$ links latent demand flexibility to maximum grid-absorbable stress, supporting risk-aware emergency planning under extreme events.
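The linear approximation can be evaluated directly. The sketch below simply tabulates the quoted formula; the sample values of $\epsilon$ are arbitrary, and the valid range and precise definition of $\epsilon$ are not specified in this article.

```python
def m_crit(epsilon):
    """Linear approximation quoted in the text: m_crit(eps) ~ 1.7 - 1.0 * eps."""
    return 1.7 - 1.0 * epsilon


# Arbitrary sample points, chosen only to show the slope of -1.0.
table = {eps: round(m_crit(eps), 2) for eps in (0.0, 0.2, 0.5)}
print(table)
```

Each increase of 0.1 in $\epsilon$ lowers the approximate threshold by 0.1, starting from about 1.7 at $\epsilon = 0$.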

7. Limitations and Interpretive Considerations

The dataset provides an hourly-aggregated, zone-level view and does not include minute-resolution charging event records, vehicle-level trajectories, or real grid measurements beyond transformer capacity proxies. Purely data-driven models trained on this dataset may manifest non-physical behavior, especially under modeled high-stress conditions, if not augmented by physics-informed constraints. Additionally, plug counts, capacities, and ambient temperature are treated as static or quasi-static within the study period, which may mask temporal infrastructure changes outside the initial validation checks. All observed insights and conclusions pertain to the covered period (September 2022–February 2023) and spatial partitioning, and extrapolation beyond Shenzhen or this interval requires independent assessment (Wang, 1 Feb 2026).


References

Wang (2026). Scientific Machine Learning for Resilient EV-Grid Planning and Decision Support Under Extreme Events.
