TrajMamba: Efficient Trajectory Learning
- TrajMamba is a dual-branch selective state-space model that fuses continuous GPS dynamics and discrete road data for efficient trajectory analysis.
- Its travel purpose-aware pre-training leverages textual embeddings and contrastive InfoNCE alignment to integrate semantic insights without increasing inference cost.
- The method employs learnable mask-based compression and knowledge distillation to eliminate redundant data, enhancing computational efficiency and prediction accuracy.
TrajMamba is an efficient, semantically rich framework for vehicle trajectory learning centered on the Traj-Mamba architecture: a dual-branch selective state-space model (SSM) combining GPS and road perspectives with specialized pre-training schemes for travel purpose integration and data reduction. TrajMamba is designed to extract movement patterns and embed travel semantics from vehicle GPS trajectories while optimizing for computational efficiency and robust generalization, making it well-suited for large-scale intelligent transportation applications (Liu et al., 20 Oct 2025).
1. Traj-Mamba Encoder Architecture
At the core of TrajMamba is the Traj-Mamba encoder, which jointly models both the continuous movement dynamics and the contextual semantics of vehicle trips:
- Dual-branch SSM design:
- The encoder accepts two types of input features for each trajectory:
- GPS perspective: raw spatial and temporal data, supplemented with high-order movement features such as instantaneous speed $v_t$, acceleration $a_t$, and turning angle $\theta_t$ computed at every timestamp.
- Road perspective: discrete identifiers of traversed road segments and encoded cyclic temporal features (e.g., hour of day, day of week).
- Architecture: Multiple Traj-Mamba blocks are stacked, each containing
- A GPS-SSM branch: input projection, causal convolution, and a selective SSM parameterized by movement features.
- A Road-SSM branch: analogous processing for road-related features.
- Input-dependent parameterization:
  - Each SSM branch computes its state-space parameters $\Delta$, $B$, and $C$, together with the gating signal, via learned projections from the high-order features, e.g.,
  $$\Delta = \phi(W_\Delta H), \qquad B = W_B H, \qquad C = W_C H,$$
  where $H$ denotes the high-order feature sequence and $\phi$ is Softplus or SiLU.
- Feature fusion:
The GPS and road latent embeddings are fused using an element-wise (dot-product) gating mechanism, e.g., $Z = Z_{\text{gps}} \odot Z_{\text{road}}$, where $Z_{\text{gps}}$ and $Z_{\text{road}}$ are the outputs from the respective SSM branches (see the sketch after this list).
- Output trajectory embedding:
The final representation is the concatenation and mean-pooling of the fused outputs. This embedding robustly encodes both movement patterns and spatial context while maintaining linear complexity in trajectory length.
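To make the block structure concrete, below is a minimal PyTorch sketch of one dual-branch block. It is not the authors' implementation: the diagonal SSM, the naive sequential scan in place of an optimized selective-scan kernel, and all class names and dimensions are illustrative assumptions.

```python
# Minimal, illustrative sketch of a dual-branch Traj-Mamba block (not the
# authors' code). Dimensions, names, and the simplified sequential scan in
# place of an optimized selective-scan kernel are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveSSMBranch(nn.Module):
    """One branch (GPS or road): causal conv + input-dependent diagonal SSM."""
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3,
                              padding=2, groups=d_model)  # depthwise, causal
        # Learned projections making Delta, B, C depend on the input
        # (high-order movement features in the GPS branch).
        self.delta_proj = nn.Linear(d_model, d_model)
        self.B_proj = nn.Linear(d_model, d_state)
        self.C_proj = nn.Linear(d_model, d_state)
        self.A_log = nn.Parameter(torch.zeros(d_model, d_state))  # state decay

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, length, d_model)
        bsz, L, D = x.shape
        x = self.in_proj(x)
        x = self.conv(x.transpose(1, 2))[..., :L].transpose(1, 2)  # causal trim
        delta = F.softplus(self.delta_proj(x))          # (B, L, D) step sizes
        Bmat = self.B_proj(x)                           # (B, L, N)
        Cmat = self.C_proj(x)                           # (B, L, N)
        A = -torch.exp(self.A_log)                      # (D, N) stable decay
        h = x.new_zeros(bsz, D, self.A_log.shape[1])    # hidden state
        ys = []
        for t in range(L):                              # naive sequential scan
            dA = torch.exp(delta[:, t].unsqueeze(-1) * A)           # (B, D, N)
            dBx = (delta[:, t].unsqueeze(-1) * Bmat[:, t].unsqueeze(1)
                   * x[:, t].unsqueeze(-1))                          # (B, D, N)
            h = dA * h + dBx
            ys.append((h * Cmat[:, t].unsqueeze(1)).sum(-1))         # (B, D)
        return torch.stack(ys, dim=1)                   # (B, L, D)

class TrajMambaBlock(nn.Module):
    """Dual-branch block: GPS and road SSMs fused by element-wise gating."""
    def __init__(self, d_model: int):
        super().__init__()
        self.gps_branch = SelectiveSSMBranch(d_model)
        self.road_branch = SelectiveSSMBranch(d_model)

    def forward(self, gps_feats, road_feats):
        z_gps = self.gps_branch(gps_feats)
        z_road = self.road_branch(road_feats)
        return z_gps * z_road                           # dot-product gating

# Usage sketch: one block on a batch of 2 trajectories of 120 points.
block = TrajMambaBlock(d_model=64)
emb = block(torch.randn(2, 120, 64), torch.randn(2, 120, 64)).mean(dim=1)
```

Stacking several such blocks and mean-pooling the fused outputs over time yields the trajectory embedding described above, at cost linear in trajectory length.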
2. Travel Purpose-aware Pre-training
To enrich embeddings with travel purpose semantics without affecting inference cost, TrajMamba introduces a two-stage pre-training strategy:
- Textual pre-training branches:
- For each trajectory, road segments and surrounding POIs are embedded using a shared pre-trained textual embedding model, applied to raw textual attributes (e.g., names, POI descriptions).
- Embeddings are contextually enriched through local aggregation and global context using learnable aggregation functions.
- Semantic view extraction:
Each road and POI view is summarized via dedicated Mamba blocks and mean pooling, yielding compressed views $v_{\text{road}}$ and $v_{\text{poi}}$ that encode the trip’s geographic and functional semantics.
- Contrastive InfoNCE-based alignment:
The main trajectory embedding is aligned with both textual views using an InfoNCE loss (with learnable temperature $\tau$). This ensures that the embedding implicitly encodes the trip’s underlying purpose (see the sketch at the end of this section).
Crucially, the text-based branches are employed only during pre-training. At inference, only the efficient Traj-Mamba encoder is used, incurring no extra cost for semantic integration.
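To make the alignment concrete, the following sketch implements a standard two-view InfoNCE objective with in-batch negatives and a learnable temperature; the names (`info_nce`, `purpose_alignment_loss`) and the simple sum over road and POI views are assumptions, not the paper's exact formulation.

```python
# Illustrative InfoNCE alignment of trajectory embeddings with textual views,
# using in-batch negatives and a learnable temperature (assumed details).
import torch
import torch.nn as nn
import torch.nn.functional as F

# Learnable temperature in log-space (register inside an nn.Module in practice).
log_tau = nn.Parameter(torch.zeros(()))

def info_nce(traj_emb: torch.Tensor, view_emb: torch.Tensor) -> torch.Tensor:
    # traj_emb, view_emb: (batch, dim); row i of each side is a positive pair.
    traj = F.normalize(traj_emb, dim=-1)
    view = F.normalize(view_emb, dim=-1)
    logits = traj @ view.t() / log_tau.exp()   # (batch, batch) similarities
    targets = torch.arange(traj.size(0), device=traj.device)
    return F.cross_entropy(logits, targets)

def purpose_alignment_loss(traj_emb, road_view, poi_view):
    # Align the main embedding with both the road and the POI textual views.
    return info_nce(traj_emb, road_view) + info_nce(traj_emb, poi_view)
```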
3. Knowledge Distillation and Trajectory Compression
TrajMamba incorporates a knowledge distillation scheme to both identify key points in trajectories and compress them for fast, high-quality embedding:
- Rule-based preprocessing:
Candidate redundant or non-informative trajectory points (e.g., during vehicle idleness or constant-velocity travel) are pruned from the raw input.
- Learnable mask generation:
For the remaining points, a soft mask vector $m \in [0,1]^L$ is learned via $m = \sigma(Wh + b + \epsilon)$, where $W$ and $b$ are learnable parameters acting on the point features $h$, $\sigma$ is the sigmoid function, and $\epsilon$ is Gaussian noise added during training to enforce robustness and sparsity.
- Compressed trajectory embedding:
The masked trajectory is passed through a student Traj-Mamba encoder (initialized from the travel purpose pre-trained teacher).
- Multi-view entropy coding (MEC) loss with mask penalty:
The distillation loss combines a MEC loss (aligning the compressed embedding with the teacher’s) and a penalty that encourages sparsity in the mask, e.g., $\mathcal{L} = \mathcal{L}_{\text{MEC}} + \lambda \lVert m \rVert_1$ with trade-off weight $\lambda$ (sketched below).
This method ensures that only the most semantically discriminative trajectory points are retained, resulting in smaller, more informative embeddings (Liu et al., 20 Oct 2025).
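A minimal sketch of the learnable masking step, under stated assumptions: the sigmoid parameterization, the noise level, and the mean-based sparsity penalty are illustrative choices, not the paper's exact formulation.

```python
# Illustrative learnable soft mask with Gaussian training noise and a
# sparsity penalty (assumed form); not the authors' exact formulation.
import torch
import torch.nn as nn

class PointMask(nn.Module):
    def __init__(self, d_model: int, noise_std: float = 0.1):
        super().__init__()
        self.score = nn.Linear(d_model, 1)  # learnable W, b
        self.noise_std = noise_std

    def forward(self, h: torch.Tensor):
        # h: (batch, length, d_model) point features after rule-based pruning.
        logits = self.score(h).squeeze(-1)              # (batch, length)
        if self.training:                               # Gaussian noise eps
            logits = logits + self.noise_std * torch.randn_like(logits)
        m = torch.sigmoid(logits)                       # soft mask in [0, 1]
        masked = h * m.unsqueeze(-1)                    # down-weight points
        sparsity_penalty = m.mean()                     # L1-style (m >= 0)
        return masked, sparsity_penalty

# Total distillation objective (MEC term assumed to compare student and
# teacher embeddings; lam balances compression vs. fidelity):
#   loss = mec_loss(student_emb, teacher_emb) + lam * sparsity_penalty
```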
4. Experimental Evaluation
Evaluation was conducted on two large-scale real-world taxi trajectory datasets (Chengdu and Xian) and three key downstream tasks:
- Destination Prediction (DP):
TrajMamba predicts both GPS coordinates and road segment endpoints from truncated trajectories. The method reduces GPS coordinate errors by up to 45% (Chengdu) and 26% (Xian) compared to the leading baseline JGRM. Road segment prediction accuracy gains are 9–10% over the same baseline.
- Arrival Time Estimation (ATE):
TrajMamba achieves the lowest mean absolute error (MAE) and mean absolute percentage error (MAPE) of all compared methods.
- Similar Trajectory Search (STS):
Using cosine similarity of embeddings, TrajMamba achieves the highest Acc@1/Acc@5 and the lowest mean rank, indicating more meaningful trajectory representations (a retrieval sketch follows below).
The encoder’s computational efficiency is underscored by substantial reductions in embedding time and model size relative to Transformer-based models, readily supporting real-time deployment.
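As an illustration of the STS protocol, the following sketch ranks gallery trajectories by cosine similarity of their embeddings and computes Acc@k; the function name and evaluation details are assumptions.

```python
# Illustrative similar-trajectory search by cosine similarity of embeddings;
# the Acc@k protocol details are assumptions.
import torch
import torch.nn.functional as F

def sts_accuracy_at_k(queries, gallery, true_idx, k: int = 5) -> float:
    # queries: (Q, dim), gallery: (G, dim), true_idx: (Q,) index of each
    # query's ground-truth match within the gallery.
    sims = F.normalize(queries, dim=-1) @ F.normalize(gallery, dim=-1).t()
    topk = sims.topk(k, dim=-1).indices                  # (Q, k)
    hits = (topk == true_idx.unsqueeze(-1)).any(dim=-1)  # (Q,)
    return hits.float().mean().item()
```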
| Task | Headline Result | Notes |
|---|---|---|
| Destination Prediction | Up to 45% lower GPS error (Chengdu); 9–10% higher road segment accuracy | vs. leading baseline JGRM |
| Arrival Time Estimation | Lowest MAE and MAPE | Outperforms all baselines |
| Similar Trajectory Search | Highest Acc@1/Acc@5; best mean rank | Outperforms all baselines |
5. Comparative Context and Distinctive Features
Within the landscape of trajectory representation learning and semantic trajectory analysis—spanning approaches such as RNNs, Transformers, trajectory2vec, and road-based contrastive learning—TrajMamba introduces several distinctive advances:
- SSM-based encoding:
The dual-branch Traj-Mamba SSM model supports linear time complexity, outperforming Transformer-based encoders on both accuracy and scalability.
- Travel purpose fusion:
The pre-training regimen achieves integration of textual travel purpose semantics without incurring inference-time cost, in contrast to models requiring heavy language modeling branches during prediction.
- Automated compression:
The learnable mask-based compression, guided by a knowledge distillation teacher, systematically reduces redundancy in dense, real-world GPS trajectories, which directly improves both computational efficiency and representation quality.
- Generalization and Transferability:
The resulting embeddings retain strong transferability across tasks, as shown by high performance in prediction, estimation, and search scenarios.
6. Applications and Implications
The design and empirical performance of TrajMamba bear directly on a wide range of intelligent transportation and urban mobility systems:
- Ride-hailing and mobility-on-demand:
Efficiently predicting destinations or travel times from partial trip data allows for optimized dispatch and dynamic pricing.
- Urban planning and analytics:
Rich travel purpose-aware embeddings support land-use inference, infrastructure design, and behavioral analysis.
- Real-time anomaly detection:
Compact, semantically meaningful trajectory representations facilitate large-scale streaming analysis for fraud, safety, or congestion detection.
- Flexible transfer to new tasks:
The approach is amenable to trajectory clustering, next-location recommendation, and new forms of multi-modal spatio-temporal querying.
A plausible implication is that the paradigm set by TrajMamba—dual perspective modeling, semantic-centric pre-training decoupled from inference, and end-to-end compression—may guide future architectures in trajectory intelligence, particularly as application scales and semantic complexity increase.
7. Summary Table: TrajMamba Workflow Components
| Component | Role | Computational Impact |
|---|---|---|
| Traj-Mamba Encoder | Dual GPS/road SSM feature aggregation | Linear in trajectory size |
| Travel Purpose Pre-training | Embeds semantics via contrastive InfoNCE | Training-only (no inference cost) |
| Mask-based Compression | Redundant point removal/feature distillation | Lowered embedding time |
| Downstream Task Inference | Predicts endpoints, times, matches | Accelerated, accurate |
In summary, TrajMamba exemplifies an efficient, scalable, and semantically enriched trajectory analysis framework, validated by empirical results and distinct architectural design enabling practical deployment in large-scale intelligent mobility contexts (Liu et al., 20 Oct 2025).