
GeoUNet: Topology-Aware Trajectory Synthesis

Updated 6 February 2026
  • GeoUNet is a UNet architecture that leverages geo-aware attention and RoadMAE embeddings to condition trajectory synthesis on road topology.
  • The network combines multi-scale convolution, residual blocks, and cross-attention to ensure generated trajectories obey real-world geographic constraints.
  • Empirical results show that GeoUNet outperforms baseline models in metrics like density, trip, and length errors, demonstrating superior fidelity and controllability.

GeoUNet is a UNet-shaped denoiser network tailored for conditional diffusion-based generation of geographic trajectories. It is the core component of ControlTraj, a controllable trajectory synthesis framework that incorporates road network topology and trip attributes to generate high-fidelity, human-directed trajectory data. GeoUNet integrates multi-scale convolutional features, geo-aware attention mechanisms, and residual connections, conditioned on embeddings of road topology (learned by a masked autoencoder, RoadMAE) and trip attributes, to guide reverse diffusion sampling and ensure that synthesized trajectories obey real-world geographic and topological constraints (Zhu et al., 2024).

1. Architecture of GeoUNet

GeoUNet adopts a symmetric UNet architecture characterized by a down-sampling path (encoder) and an up-sampling path (decoder), each comprising four hierarchical blocks:

  • Down-Sampling Path: Consists of 4 Geo-Down blocks. Each block implements:
    • Two ResNet sub-blocks (with GroupNorm, convolution, nonlinearity, and skip connection);
    • Geo-self-attention and geo-cross-attention layers for feature fusion;
    • Max-pooling for resolution reduction.
  • Up-Sampling Path: Comprises 4 Geo-Up blocks. Each block performs:
    • Nearest-neighbor or linear upsampling;
    • Two ResNet sub-blocks;
    • Geo-self and geo-cross-attention;
    • Skip connections from the corresponding down-sampling block, preserving multi-scale context.
  • Channel Progression: The channel dimension follows $\{C, 2C, 4C, 8C\}$ (with $C$ typically 32 or 64) along the encoder and is mirrored during decoding.
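The block layout above can be traced with a few lines of Python. This is only a sketch: it assumes each Geo-Down block halves the sequence length and each Geo-Up block doubles it (the pooling stride is not stated here), and the function name `geounet_shapes` is hypothetical.

```python
def geounet_shapes(seq_len, base_channels=32, num_blocks=4):
    """Trace (channels, length) after each encoder and decoder block.

    Assumption: max-pooling halves the length and upsampling doubles it.
    """
    encoder = []
    length = seq_len
    for i in range(num_blocks):
        channels = base_channels * 2 ** i   # C, 2C, 4C, 8C
        length //= 2                        # assumed stride-2 pooling
        encoder.append((channels, length))

    decoder = []
    for channels, _ in reversed(encoder):   # mirrored channel progression
        length *= 2                         # assumed 2x upsampling
        decoder.append((channels, length))
    return encoder, decoder


encoder, decoder = geounet_shapes(seq_len=256, base_channels=32)
for name, stages in (("down", encoder), ("up", decoder)):
    for channels, length in stages:
        print(f"{name}: channels={channels}, length={length}")
```

With a length that is a multiple of 16, the decoder returns to the input resolution; other lengths would need padding or cropping at the skip connections.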

GeoUNet’s distinctive feature is geo-attention fusion at every block. For each block, feature maps $h^i \in \mathbb{R}^{L \times d}$ are updated via combined geo-self-attention (intra-feature) and geo-cross-attention (interacting with the control vector $c$). The control vector $c$ concatenates the RoadMAE topological embedding $z_L$ and the Wide-and-Deep trip attribute embedding $z_{\rm attr}$:

$$c = [z_{\rm attr};\, z_L].$$

The resulting attention outputs $\tilde h$ are computed as

$$\tilde h = \mathrm{softmax}\left(\frac{Q_s K_s^\top}{\sqrt d}\right)V_s + \mathrm{softmax}\left(\frac{Q_c K_c^\top}{\sqrt d}\right)V_c,$$

where $Q_s, K_s, V_s$ are the self-attention projections and $Q_c, K_c, V_c$ the cross-attention projections. This hierarchical fusion enables multi-scale contextual reasoning and direct topology injection at every resolution.
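A minimal NumPy sketch of this fusion may help make the shapes concrete. It is not the authors' implementation: it uses a single head, assumes the attribute embedding is one token stacked on top of the RoadMAE tokens, assumes queries for the cross-attention come from the block features while keys and values come from $c$, and the weight names in `w` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d)) @ v


def geo_attention_fusion(h, c, w):
    """Sum of self-attention over h and cross-attention from h to c.

    h: (L, d) block features; c: (M, d) control tokens;
    w: dict of (d, d) projection matrices (hypothetical names).
    """
    self_out = attention(h @ w["q_s"], h @ w["k_s"], h @ w["v_s"])
    cross_out = attention(h @ w["q_c"], c @ w["k_c"], c @ w["v_c"])
    return self_out + cross_out


rng = np.random.default_rng(0)
L, N, d = 16, 5, 8
h = rng.standard_normal((L, d))
z_attr = rng.standard_normal((1, d))       # assumed: a single attribute token
z_L = rng.standard_normal((N, d))          # one topology token per road patch
c = np.concatenate([z_attr, z_L], axis=0)  # c = [z_attr; z_L], shape (N + 1, d)
w = {name: rng.standard_normal((d, d)) / np.sqrt(d)
     for name in ("q_s", "k_s", "v_s", "q_c", "k_c", "v_c")}
h_tilde = geo_attention_fusion(h, c, w)
print(h_tilde.shape)
```

Because both attention terms return an $L \times d$ array, they can be added directly, which is what lets the control signal enter at every resolution without changing the feature shape.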

2. Topological Context Encoding via RoadMAE

GeoUNet leverages road network information through embeddings generated by RoadMAE, a masked transformer autoencoder trained on sequences of raw GPS points representing road segments. The processing pipeline includes:

  • Patchifying: Each road segment $r \in \mathbb{R}^{2 \times L}$ is partitioned into $N = \lceil L/P \rceil$ patches, $\mathrm{Patch}(r) \in \mathbb{R}^{N \times 2P}$, for a fixed patch size $P$.
  • Random Masking: A binary mask $M$ with ratio $r_o$ is applied, masking out patches during training for self-supervised reconstruction.
  • Transformer Encoder/Decoder: The encoder extracts $z_L \in \mathbb{R}^{N \times D}$ as the fine-grained topological embedding, while the decoder reconstructs the masked input points to minimize the loss $\mathcal{L}_{ssl} = \| (r - \tilde r) \odot M \|_2^2$.
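The patchify, mask, and loss steps above can be sketched as follows. The transformer itself is omitted. Padding the last patch by repeating the final GPS point, applying the mask per patch, and the function names are all assumptions made for illustration.

```python
import numpy as np


def patchify(r, patch_size):
    """Split a (2, L) road segment into N = ceil(L / P) patches of width 2P.

    Assumption: the tail is padded by repeating the last point.
    """
    length = r.shape[1]
    n = -(-length // patch_size)                 # ceil(L / P)
    pad = n * patch_size - length
    r = np.pad(r, ((0, 0), (0, pad)), mode="edge")
    # (2, N, P) -> (N, 2, P) -> (N, 2P): each row holds P lons then P lats
    return r.reshape(2, n, patch_size).transpose(1, 0, 2).reshape(n, 2 * patch_size)


def random_mask(n, ratio, rng):
    """Binary mask over patches; 1 marks a masked (to be reconstructed) patch."""
    mask = np.zeros(n)
    mask[rng.permutation(n)[: int(ratio * n)]] = 1.0
    return mask


def masked_reconstruction_loss(patches, recon, mask):
    """Squared L2 norm of the reconstruction error on masked patches only."""
    return float(np.sum(((patches - recon) * mask[:, None]) ** 2))


rng = np.random.default_rng(0)
segment = rng.standard_normal((2, 30))           # toy segment with L = 30 points
patches = patchify(segment, patch_size=4)        # N = 8 patches of width 8
mask = random_mask(len(patches), ratio=0.5, rng=rng)
loss = masked_reconstruction_loss(patches, np.zeros_like(patches), mask)
print(patches.shape, int(mask.sum()), loss)
```

Only masked patches contribute to the loss, so a decoder that copies the visible patches perfectly gains nothing; it has to infer the hidden geometry from context.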

The resulting $z_L$ (frozen at generation time) encapsulates road segment connectivity and geometry, enabling topology-aware trajectory synthesis in GeoUNet without requiring explicit adjacency matrices or Laplacian regularization.

3. Conditional Diffusion Process

GeoUNet is employed as the denoising network in a conditional diffusion model for trajectory generation. The process is formulated as follows:

  • Forward Process (Noising): For a real trajectory $x_0$,

$$q(x_{1:T} \mid x_0) = \prod_{t=1}^T q(x_t \mid x_{t-1}), \quad q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t; \sqrt{1-\beta_t}\, x_{t-1}, \beta_t I\big).$$

By reparameterization:

$$x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon, \qquad \bar\alpha_t = \prod_{i=1}^t (1 - \beta_i), \quad \epsilon \sim \mathcal{N}(0, I).$$
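The closed-form noising step can be sketched in NumPy. The schedule values follow the linear $\beta$ range quoted in the hyperparameters ($T = 500$, $10^{-4}$ to $5 \times 10^{-2}$); the function names and the toy trajectory shape are assumptions.

```python
import numpy as np


def linear_beta_schedule(T=500, beta_start=1e-4, beta_end=5e-2):
    """Linearly spaced noise variances beta_1 .. beta_T."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot for a 1-indexed step t."""
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t - 1]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return x_t, eps


betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)      # bar(alpha)_t = prod_{i<=t} (1 - beta_i)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2, 200))       # toy trajectory: 2 coordinates x 200 points
x_t, eps = q_sample(x0, t=250, alpha_bar=alpha_bar, rng=rng)
print(alpha_bar[0], alpha_bar[-1])
```

Since $\bar\alpha_t$ shrinks toward zero as $t$ grows, $x_T$ is close to pure Gaussian noise, which is why sampling can start from white noise.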

  • Reverse Process (Denoising, Conditioned on $c$): $p_\theta(x_{0:T-1} \mid x_T, c) = \prod_{t=1}^T p_\theta(x_{t-1} \mid x_t, c)$, where

$$p_\theta(x_{t-1} \mid x_t, c) = \mathcal{N}\big(x_{t-1}; \mu_\theta(x_t, t, c), \sigma_\theta(x_t, t, c)^2 I\big).$$

The mean is parameterized as

$$\mu_\theta(x_t, t, c) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1-\bar\alpha_t}}\, \epsilon_\theta(x_t, t, c)\right),$$

with $\alpha_t = 1 - \beta_t$ and $\sigma_\theta(x_t, t, c) = \sqrt{\tilde\beta_t}$. The noise $\epsilon_\theta(x_t, t, c)$ is estimated by GeoUNet.
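One reverse step can be sketched as below. The text does not define $\tilde\beta_t$; the code assumes the standard DDPM posterior variance $\tilde\beta_t = \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}\beta_t$, and `eps_hat` stands in for the GeoUNet prediction. Function and variable names are hypothetical.

```python
import numpy as np


def p_sample_step(x_t, t, eps_hat, betas, alpha_bar, rng):
    """One ancestral sampling step x_t -> x_{t-1} for a 1-indexed step t.

    eps_hat is the predicted noise (GeoUNet's output in the text).
    Assumption: sigma_t^2 is the standard DDPM posterior variance.
    """
    beta_t = betas[t - 1]
    alpha_t = 1.0 - beta_t
    abar_t = alpha_bar[t - 1]
    mean = (x_t - beta_t / np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(alpha_t)
    if t == 1:
        return mean                                   # no noise on the final step
    abar_prev = alpha_bar[t - 2]
    beta_tilde = (1.0 - abar_prev) / (1.0 - abar_t) * beta_t
    return mean + np.sqrt(beta_tilde) * rng.standard_normal(x_t.shape)


betas = np.linspace(1e-4, 5e-2, 500)
alpha_bar = np.cumprod(1.0 - betas)
rng = np.random.default_rng(0)

# Sanity check: with the true noise at t = 1, the mean recovers x_0 exactly.
x0 = rng.standard_normal((2, 10))
eps = rng.standard_normal(x0.shape)
x1 = np.sqrt(alpha_bar[0]) * x0 + np.sqrt(1.0 - alpha_bar[0]) * eps
x0_rec = p_sample_step(x1, 1, eps, betas, alpha_bar, rng)
print(np.abs(x0_rec - x0).max())
```

The check at $t = 1$ works because $\bar\alpha_1 = \alpha_1$, so the noise term in $x_1$ cancels against the correction in the mean when the prediction is exact.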

4. Training and Inference Procedures

The training of GeoUNet in the ControlTraj framework proceeds as:

  • RoadMAE Pretraining: The RoadMAE autoencoder is pretrained via $\mathcal{L}_{ssl}$, and its weights are then frozen for downstream trajectory synthesis.
  • Diffusion Model Training: For each sampled real trajectory $x_0$ and random time step $t$,

    1. Compute $x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\, \epsilon$.
    2. Use GeoUNet to predict $\hat\epsilon = \epsilon_\theta(x_t, t, c)$.
    3. Minimize the mean squared error:

    $$\mathcal{L}_{diff}(\theta) = \mathbb{E}_{x_0, t, \epsilon, c} \| \epsilon - \epsilon_\theta(x_t, t, c) \|_2^2.$$

  • Total Loss: When trained end-to-end, the total loss combines the diffusion and RoadMAE terms, $\mathcal{L}_{total} = \mathcal{L}_{diff} + \lambda \mathcal{L}_{ssl}$, though typically only $\mathcal{L}_{diff}$ is used during GeoUNet training because RoadMAE is frozen.
  • Hyperparameters: Typical values are learning rate $1 \times 10^{-4}$, batch size 1024, $T = 500$ diffusion steps (linear $\beta$ schedule from $1 \times 10^{-4}$ to $5 \times 10^{-2}$), skip steps = 5 (for DDIM acceleration), and embedding dimension $d = 128$.
  • Sampling: At inference, embeddings $z_{\rm attr}$ and $z_L$ are computed from trip attributes and road segments, concatenated as $c$, and provided to GeoUNet for denoising sampled white noise $x_T$. DDIM acceleration is supported by skipping steps.
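The training objective and accelerated sampling described above can be sketched together. Several things here are assumptions: "skip steps = 5" is read as visiting every fifth timestep (100 denoiser calls), the sampler is deterministic DDIM, and `toy_eps_model` is a hypothetical stand-in for GeoUNet that is exact only when all data sits at the origin.

```python
import numpy as np

betas = np.linspace(1e-4, 5e-2, 500)
alpha_bar = np.cumprod(1.0 - betas)


def diffusion_loss(eps_model, x0, c, alpha_bar, rng):
    """Single-sample estimate of L_diff: MSE between true and predicted noise."""
    t = int(rng.integers(1, len(alpha_bar) + 1))      # uniform step in 1..T
    eps = rng.standard_normal(x0.shape)
    a = alpha_bar[t - 1]
    x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps
    return float(np.mean((eps - eps_model(x_t, t, c)) ** 2))


def ddim_timesteps(T=500, skip=5):
    """Descending subsequence of timesteps (assumed meaning of 'skip steps')."""
    return list(range(T, 0, -skip))


def ddim_sample(eps_model, shape, c, alpha_bar, skip, rng):
    """Deterministic DDIM sampling from white noise x_T down to x_0."""
    x = rng.standard_normal(shape)
    steps = ddim_timesteps(len(alpha_bar), skip)
    for i, t in enumerate(steps):
        t_prev = steps[i + 1] if i + 1 < len(steps) else 0
        a_t = alpha_bar[t - 1]
        a_prev = alpha_bar[t_prev - 1] if t_prev > 0 else 1.0
        eps_hat = eps_model(x, t, c)
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps_hat) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat
    return x


def toy_eps_model(x_t, t, c):
    """Hypothetical denoiser: exact noise estimate if every x_0 is zero."""
    return x_t / np.sqrt(1.0 - alpha_bar[t - 1])


rng = np.random.default_rng(0)
loss = diffusion_loss(toy_eps_model, np.zeros((2, 10)), None, alpha_bar, rng)
sample = ddim_sample(toy_eps_model, (2, 10), None, alpha_bar, skip=5, rng=rng)
print(len(ddim_timesteps()), loss, np.abs(sample).max())
```

In a real setup `eps_model` would be the trained GeoUNet and `c` the concatenated attribute and RoadMAE embeddings; the toy model ignores `c` and exists only to make the loop checkable.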

5. Empirical Evaluation and Performance

Experiments were conducted on trajectory data from Chengdu (5.7M trips), Xi’an (3.0M), and Porto (1.7M). Data was preprocessed to standardize trajectory lengths (filtering, interpolation, truncation) and to extract trip attributes.

  • Evaluation Metrics: All metrics use Jensen–Shannon divergence (JSD) to compare generated to real data distributions:

    1. Density error: spatial coverage in gridded space.
    2. Trip error: distribution of start/end points.
    3. Length error: trajectory distance distribution.
  • Baselines: VAE, TrajGAN, DP-TrajGAN, DiffWave, DiffTraj, as well as ControlTraj variants without conditioning or with vanilla autoencoder rather than RoadMAE.

  • Results (Chengdu):
    • Density error: DiffTraj 0.0066 vs. ControlTraj 0.0039.
    • Trip error: DiffTraj 0.0143 vs. ControlTraj 0.0106.
    • Length error: DiffTraj 0.0174 vs. ControlTraj 0.0117.
  • Downstream Utility: Generated and real data yield less than 5% difference in RMSE/MAE/MAPE for ASTGCN, GWNet, MTGNN, DCRNN-based traffic-flow forecasting.
  • Controllability: When supplied a prescribed route (sequence of road segments), ControlTraj/GeoUNet strictly follows the intended topology, outperforming unconditional diffusion models.
  • Generalizability: GeoUNet exhibits strong zero-shot transfer: training on Chengdu and testing on Xi’an yields a density error of 0.0171 (ControlTraj), compared to 0.0806 (DiffTraj) and 0.0544 (ControlTraj-AE).
  • Qualitative Outputs: Visualization includes geo-plots of trajectories, rush-hour and trip-volume heatmaps, and assessments of RoadMAE masking impacts (0–75%).
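The JSD-based metrics listed above can be illustrated with a short sketch of the density error. The grid resolution, the natural-log base, and the function names are assumptions; the evaluation settings actually used are not given here.

```python
import numpy as np


def jsd(p, q):
    """Jensen-Shannon divergence between two histograms (natural log)."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    p = p / p.sum()
    q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        nz = a > 0                      # 0 * log(0) is treated as 0
        return np.sum(a[nz] * np.log(a[nz] / b[nz]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


def density_error(real_pts, gen_pts, extent, bins=16):
    """JSD between gridded point densities of real and generated data.

    real_pts, gen_pts: (n, 2) arrays of coordinates;
    extent: (xmin, xmax, ymin, ymax). The 16 x 16 grid is an assumption.
    """
    xmin, xmax, ymin, ymax = extent
    rng_2d = [[xmin, xmax], [ymin, ymax]]
    h_real, _, _ = np.histogram2d(real_pts[:, 0], real_pts[:, 1], bins=bins, range=rng_2d)
    h_gen, _, _ = np.histogram2d(gen_pts[:, 0], gen_pts[:, 1], bins=bins, range=rng_2d)
    return jsd(h_real, h_gen)


rng = np.random.default_rng(0)
real = rng.uniform(0.0, 1.0, size=(5000, 2))
generated = rng.uniform(0.0, 1.0, size=(5000, 2))
print(density_error(real, generated, extent=(0.0, 1.0, 0.0, 1.0)))
```

The trip and length errors follow the same pattern with different histograms: start and end cells for the former, binned travel distances for the latter. With the natural log, the divergence ranges from 0 (identical distributions) to $\ln 2$ (disjoint support).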

6. Significance, Limitations, and Context

GeoUNet advances trajectory data synthesis by melding deep convolutional denoisers with explicit topology- and attribute-based conditioning, enabled by architectural innovations in geo-attention and by leveraging a robust transformer-based autoencoder for fine-grained topology embedding. Its ability to tightly control generated outcomes with respect to specified routes, attributes, and road network context surpasses prior approaches that lack such integrated conditioning or suffer from degraded fidelity and transferability in novel geographic environments.

A plausible implication is that GeoUNet’s method of indirect topology injection (via cross-attention rather than explicit Laplacian regularization) offers superior scalability and generalization, though it is possible that explicit relational constraints might be preferred for certain graph-structured domains outside urban mobility.

GeoUNet currently relies on frozen RoadMAE embeddings, which may limit its adaptability to evolving road networks or dynamic topologies; end-to-end fine-tuning or joint pretraining strategies may be explored in future work. Finally, while the current attention mechanisms encode descriptive context, they may not enforce strict physical feasibility constraints (e.g., preventing illegal trajectory transitions). This remains an area for additional research if stricter guarantees are necessary (Zhu et al., 2024).

References (1)
