VisitHGNN: Urban Mobility Graph Learning
- VisitHGNN is a heterogeneous graph neural network designed to predict neighborhood-to-destination visit probabilities using multimodal urban data.
 - It integrates spatial, temporal, functional, and socio-demographic features through relation-specific message passing, capturing complex urban dynamics.
 - Real-world evaluations in Fulton County, GA demonstrate its superior performance over baselines, with metrics like a KL divergence of 0.287 and a top-1 accuracy of 0.853.
 
VisitHGNN is a heterogeneous, relation-specific graph neural network that models neighborhood-to-destination visit probabilities in urban settings. It operates over a multimodal graph that integrates spatial, temporal, functional, and socio-demographic inputs, and it predicts the distribution of visits from census block groups (CBGs) to individual points of interest (POIs). By combining multi-format node features, heterogeneous edge types, and a masked Kullback–Leibler (KL) divergence loss, VisitHGNN closely reproduces empirical visit patterns on real-world mobility data from Fulton County, Georgia, and outperforms pairwise MLP and distance-only baselines (Pang et al., 3 Oct 2025).
1. Heterogeneous Graph Construction and Relation Encoding
The VisitHGNN graph is constructed with explicit heterogeneity in both node and edge types. The node set comprises:
- Points of Interest (POIs): Urban venues characterized by numerical attributes (e.g., polygon area, raw visit counts), structured JSON-derived features (e.g., weekday opening hours, bucketed dwell times), and text-derived embeddings (e.g., names and categories embedded via BERT).
 - Census Block Groups (CBGs): Geographic regions described by 72 socio-demographic features (e.g., population, education, commute modes) augmented by spatial centroid coordinates.
 
The edge set is defined as follows:
- POI–POI Edges: These encode three complementary urban relationships:
  - Geospatial proximity: Each POI is linked to its nearest neighbors using the great-circle (haversine) distance and directional bearing.
  - Temporal similarity: Edges are established between POIs with similar hourly visitation profiles, evaluated via cosine similarity over L1-normalized 168-hour activity vectors.
  - Functional (brand) affinity: Edges connect venues with shared brand identity, empirically co-visited venues, and similar categorical labels to capture demand substitutability.
- CBG–CBG Edges: Represent spatial contiguity among neighboring CBGs.
- POI–CBG Edges: Two types:
  - Belong edge: Links each POI to its administrative CBG.
  - Spatial KNN edge: Connects each POI to its closest CBGs (candidate origins), annotated with the Euclidean centroid distance.
 
All edges can carry vector or scalar attributes (e.g., distance) and are processed by relation-specific message-passing layers to preserve their semantics.
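As an illustration of the POI–POI edge attributes described above, the following is a minimal sketch of the haversine distance, the directional bearing, and the cosine similarity over L1-normalized hourly profiles. The function names, the Earth-radius constant, and the handling of all-zero profiles are assumptions made for this example; the source does not specify them.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed constant, not from the source)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing from point 1 to point 2, in degrees within [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return math.degrees(math.atan2(y, x)) % 360.0

def l1_normalize(v):
    """Scale a vector so its absolute values sum to 1 (all-zero input is returned as-is)."""
    total = sum(abs(x) for x in v)
    return [x / total for x in v] if total > 0 else list(v)

def cosine_similarity(u, v):
    """Cosine similarity; defined as 0.0 here when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def temporal_similarity(hourly_a, hourly_b):
    """Cosine similarity between two L1-normalized 168-hour activity vectors."""
    return cosine_similarity(l1_normalize(hourly_a), l1_normalize(hourly_b))
```

In a pipeline of this kind, a temporal edge would be added when `temporal_similarity` exceeds a chosen threshold, and a geospatial edge would carry the `(distance, bearing)` pair as its attribute vector. Both the threshold and the attribute layout are design choices the source leaves open.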
2. Node and Edge Feature Encoding, Multimodal Fusion
POI encoders process input via multiple parallel modules:
- Numerical Encoder: Processes polygon areas, visit counts, visitor counts, median dwell times, and spatial distances through a multilayer perceptron (MLP).
 - JSON-derived Feature Encoder: Extracts and aggregates structured features like daily open/close times and normalized dwell-time histograms.
 - Text Encoder: Concatenates names, brands, and categorical labels, then encodes the result with a pre-trained BERT-based language model to obtain a dense vector.
 
CBG encoders implement a two-layer GraphSAGE architecture, jointly consuming 72 demographic variables and centroid coordinates, using message passing over CBG–CBG adjacency.
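To make the CBG encoder concrete, here is a minimal sketch of one GraphSAGE-style layer with mean aggregation, written in plain Python. The mean aggregator, the ReLU nonlinearity, the separate self and neighbor weight matrices, and all names are assumptions for illustration; the source states only that a two-layer GraphSAGE is used over CBG–CBG adjacency.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sage_mean_layer(features, neighbors, W_self, W_neigh):
    """One GraphSAGE-style layer: ReLU(W_self @ h_v + W_neigh @ mean(h_u for u in N(v))).

    features:  dict mapping node id -> feature vector (list of floats)
    neighbors: dict mapping node id -> list of adjacent node ids
    """
    out = {}
    for v, h_v in features.items():
        nbrs = neighbors.get(v, [])
        if nbrs:
            mean = [sum(features[u][i] for u in nbrs) / len(nbrs) for i in range(len(h_v))]
        else:
            mean = [0.0] * len(h_v)  # isolated node: no neighbor signal
        z = [a + b for a, b in zip(matvec(W_self, h_v), matvec(W_neigh, mean))]
        out[v] = [max(0.0, zi) for zi in z]  # ReLU
    return out

# Toy usage: three CBGs with 2-dimensional features and identity weights.
identity = [[1.0, 0.0], [0.0, 1.0]]
feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
hidden = sage_mean_layer(feats, adj, identity, identity)
# A second call stacks the layers, giving each CBG a two-hop receptive field.
embedded = sage_mean_layer(hidden, adj, identity, identity)
```

In the model described here, the input vector for each CBG would hold the 72 demographic variables plus the centroid coordinates, and the weights would be learned rather than fixed.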
Relation-specific message passing is achieved using GATv2 kernels for each POI–POI edge type, with each relation receiving an independent transformation. Outputs of all relations are merged through a gated residual mechanism, ensuring that non-redundant relational signals propagate into the final embedding. For POI–CBG fusion, GraphSAGE is applied to “belong” edges, and GATv2 is used on spatial KNN edges to facilitate information exchange across heterogeneous node types.
All embeddings are normalized via GraphNorm and projected (via MLPs or linear layers) into a shared latent space of dimension $d$.
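The gated residual merge of relation-specific outputs can take several forms, and the source does not give its exact equations. The sketch below shows one plausible variant, assumed here for illustration only: each relation's message is scaled by a scalar sigmoid gate computed from the node's base embedding and that message, then added to the base embedding. All names and the gate parameterization are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_residual_merge(h, messages, gate_weights, gate_biases):
    """Merge per-relation messages into a base embedding h (assumed formulation).

    h:            base node embedding (list of floats)
    messages:     dict relation name -> message vector, same length as h
    gate_weights: dict relation name -> weight vector of length 2 * len(h)
    gate_biases:  dict relation name -> scalar bias
    Returns (merged embedding, dict of gate values).
    """
    out = list(h)
    gates = {}
    for rel, m in messages.items():
        concat = list(h) + list(m)
        score = sum(w * x for w, x in zip(gate_weights[rel], concat)) + gate_biases[rel]
        g = sigmoid(score)  # gate in (0, 1): how much of this relation passes through
        gates[rel] = g
        out = [o + g * mi for o, mi in zip(out, m)]  # residual accumulation
    return out, gates

# Toy usage with zero gate parameters, so every gate equals 0.5.
h = [1.0, 1.0]
msgs = {"spatial": [2.0, 0.0], "temporal": [0.0, 4.0]}
weights = {rel: [0.0] * 4 for rel in msgs}
biases = {rel: 0.0 for rel in msgs}
merged, gates = gated_residual_merge(h, msgs, weights, biases)
```

The residual path keeps the node's own embedding intact when a gate closes, which is one way a relation carrying little information can be suppressed without erasing the base representation.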
3. Candidate Set Inference, Scoring, and Probability Calibration
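For each POI $p$, VisitHGNN defines a set of candidate origin CBGs, $\mathcal{C}(p)$, as the $K$ nearest CBGs in Euclidean space. For each candidate $c \in \mathcal{C}(p)$, the POI and CBG embeddings are concatenated to form the pairwise feature vector $[\mathbf{h}_p \,\Vert\, \mathbf{h}_c]$.
A shared MLP prediction head with dropout and ReLU activations generates logits:
$$s_{p,c} = \mathrm{MLP}\big([\mathbf{h}_p \,\Vert\, \mathbf{h}_c]\big)$$
These logits are passed through a masked softmax to obtain a proper probability distribution over the candidate set:
$$\hat{y}_{p,c} = \frac{m_{p,c}\,\exp(s_{p,c})}{\sum_{c' \in \mathcal{C}(p)} m_{p,c'}\,\exp(s_{p,c'})}$$
where $m_{p,c} \in \{0, 1\}$ masks invalid candidate slots.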
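The candidate-set and masked-softmax steps can be sketched as follows. This is a minimal illustration under stated assumptions: candidates are padded to a fixed number of slots with `None`, the mask is a 0/1 list, and the function names are invented for this example. The logits would come from the shared MLP head, which is not reproduced here.

```python
import math

def k_nearest_candidates(poi_xy, cbg_centroids, k):
    """Return (ids, mask) for the k CBG centroids nearest to a POI in Euclidean space.

    cbg_centroids: dict CBG id -> (x, y). If fewer than k CBGs exist, the id list
    is padded with None and the mask marks padded slots with 0.
    """
    ranked = sorted(cbg_centroids, key=lambda cid: math.dist(poi_xy, cbg_centroids[cid]))
    ids = ranked[:k]
    pad = k - len(ids)
    return ids + [None] * pad, [1] * len(ids) + [0] * pad

def masked_softmax(logits, mask):
    """Softmax restricted to slots where mask is 1; masked slots get probability 0."""
    valid = [s for s, m in zip(logits, mask) if m]
    if not valid:
        return [0.0] * len(logits)
    peak = max(valid)  # subtract the max valid logit for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]

# Toy usage: three CBGs, four candidate slots, so the last slot is padding.
centroids = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 5.0)}
ids, mask = k_nearest_candidates((0.1, 0.0), centroids, k=4)
probs = masked_softmax([0.0, 0.0, -2.0, 100.0], mask)
```

Because the padded slot is excluded before normalization, its large logit has no effect, and the remaining probabilities still sum to one over the valid candidates.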
4. Training Objective and Loss Function
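Supervision is imposed only over legitimate POI–CBG candidate pairs for which visit data exists. The ground-truth distribution $y_{p,c}$ for POI $p$ (the fractional share of observed visits from CBG $c$) is estimated from aggregated origin-destination (OD) flows. The model is optimized via a masked Kullback–Leibler (KL) divergence between $y_{p,c}$ and the predicted distribution $\hat{y}_{p,c}$, of the form:
$$\mathcal{L}_{\mathrm{KL}} = \frac{1}{|\mathcal{P}|} \sum_{p \in \mathcal{P}} \sum_{c \in \mathcal{C}(p)} m_{p,c}\, y_{p,c} \log \frac{y_{p,c} + \epsilon}{\hat{y}_{p,c} + \epsilon}$$
where $\mathcal{P}$ is the set of supervised POIs, $\mathcal{C}(p)$ is the candidate set of POI $p$, $m_{p,c} \in \{0, 1\}$ is the candidate mask, and $\epsilon$ is a small positive constant that provides numerical stability. This objective ensures full normalization within the candidate set and enforces distributional calibration rather than only pointwise or ranking accuracy.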
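A minimal sketch of a masked KL divergence for a single POI, and its mean over a batch of POIs, is given below. The placement of the stabilizing constant inside the logarithm and its default value of `1e-8` are assumptions for this example; the source does not state the value it uses.

```python
import math

def masked_kl(y_true, y_pred, mask, eps=1e-8):
    """KL(y_true || y_pred) summed over candidate slots where mask is 1.

    eps is an assumed stabilizer that keeps the logarithm finite when a
    probability is exactly zero.
    """
    return sum(
        y * math.log((y + eps) / (q + eps))
        for y, q, m in zip(y_true, y_pred, mask)
        if m
    )

def mean_masked_kl(batch, eps=1e-8):
    """Average the per-POI masked KL over a list of (y_true, y_pred, mask) triples."""
    return sum(masked_kl(y, q, m, eps) for y, q, m in batch) / len(batch)

# Toy usage: one perfectly predicted POI and one imperfect prediction,
# each with a padded third slot that the mask excludes.
perfect = ([0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [1, 1, 0])
imperfect = ([0.5, 0.5, 0.0], [0.25, 0.75, 0.0], [1, 1, 0])
loss = mean_masked_kl([perfect, imperfect])
```

Slots with zero observed share contribute nothing to the sum, so the loss penalizes a prediction only where observed visits exist, while the softmax normalization upstream still forces probability mass placed elsewhere to come at the expense of those slots.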
5. Model Performance and Evaluation Metrics
VisitHGNN demonstrates high predictive fidelity on large-scale weekly mobility data from Fulton County, GA. Performance metrics include:
- Mean KL divergence: $0.287$
 - Mean Absolute Error (MAE): $0.008$
 - Top-1 accuracy: $0.853$ (the share of POIs for which the most likely predicted origin matches the most frequently observed origin)
 - Coefficient of determination ($R^2$): $0.892$
 - NDCG@50 (Normalized Discounted Cumulative Gain): $0.966$
 - Recall@5: $0.611$
 
All metrics are averaged over POIs with test observations. These scores substantially surpass those of pairwise MLP and distance-only baselines, indicating that VisitHGNN captures the underlying structure of urban mobility with high fidelity. NDCG@50 in particular demonstrates effective probability ranking, while the high top-1 accuracy and $R^2$ reflect prediction sharpness and explained variance, respectively.
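For readers who want to compute comparable per-POI scores, the sketch below implements common conventions for top-1 agreement, NDCG@k, and Recall@k on a pair of observed and predicted distributions. These definitions are assumptions: the source does not spell out its exact formulas. In particular, Recall@k is taken here as the overlap between the observed and predicted top-k sets, and NDCG uses the observed visit shares as gains.

```python
import math

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def top1_match(y_true, y_pred):
    """1 if the most likely predicted origin is also the most observed origin, else 0."""
    return int(argmax(y_true) == argmax(y_pred))

def top_k_indices(values, k):
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]

def ndcg_at_k(y_true, y_pred, k):
    """NDCG@k with observed shares as gains and predicted probabilities as the ranking."""
    dcg = sum(y_true[i] / math.log2(rank + 2)
              for rank, i in enumerate(top_k_indices(y_pred, k)))
    idcg = sum(g / math.log2(rank + 2)
               for rank, g in enumerate(sorted(y_true, reverse=True)[:k]))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(y_true, y_pred, k):
    """Fraction of the observed top-k origins that appear in the predicted top-k."""
    true_top = set(top_k_indices(y_true, k))
    pred_top = set(top_k_indices(y_pred, k))
    return len(true_top & pred_top) / len(true_top)

# Toy usage for a single POI with four candidate CBGs.
observed = [0.6, 0.3, 0.1, 0.0]
predicted = [0.5, 0.15, 0.3, 0.05]
scores = (top1_match(observed, predicted),
          ndcg_at_k(observed, predicted, 4),
          recall_at_k(observed, predicted, 2))
```

Averaging each score over all POIs with test observations would then give dataset-level figures in the same spirit as those listed above.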
6. Urban Analytics Applications and Domain Implications
The capacity of VisitHGNN to recover accurate, well-calibrated origin distributions for arbitrary POIs has several direct implications:
- Urban Planning and Land Use: Origin probability maps by POI can inform siting, accessibility analyses, intra-urban equity studies, and investment prioritization by clarifying neighborhood-level contributions to specific destinations.
 - Transportation Policy and Multimodal Design: Accurate origin distributions enable better demand modeling for public transit routing, mobility hub placement, and vehicular/pedestrian flow forecasting.
 - Public Health: Fine-grained estimation of neighborhood contributions to venues allows for targeted exposure/risk assessments, resource allocation, and tailored interventions in epidemiological planning.
 
A plausible implication is that such high-fidelity, distributional models can directly support scenario analysis, accessibility policy, and real-time mobility system management.
7. Position within the Graph Learning Landscape
VisitHGNN exemplifies an end-to-end, relation-specific heterogeneous graph neural network that combines multimodal feature fusion, attention across multiple edge types, and a distributional training objective. Its approach aligns with a broader trend in urban and location-based AI: moving from descriptive analytics to actionable probabilistic inference. Its results indicate the value of integrating heterogeneous node and edge features, explicit spatial, temporal, and functional relations, and a calibration-oriented objective within urban graph analytics.
The model advances beyond simple pairwise distance- or attribute-based approaches to spatial mobility modeling by applying heterogeneous GNN principles tailored to the urban domain and by producing probabilistic outputs validated against observed visit data.