OMTAD: Open Maritime Traffic Analysis Dataset
- OMTAD is a public dataset offering detailed vessel trajectory analytics, density mapping, and anomaly detection benchmarks using AIS data.
- It covers multiple regions, such as the Baltic Sea and Western Australia, with rule-based data cleansing, interpolation, and segmentation pipelines.
- The dataset supports diverse applications including port detection and spatio-temporal graph anomaly analysis, enabling robust research in maritime monitoring.
The Open Maritime Traffic Analysis Dataset (OMTAD) is a public repository of maritime vessel trajectories and derived analytics sourced primarily from open Automatic Identification System (AIS) positioning data in coastal and offshore domains. OMTAD supports detailed quantitative assessments of vessel movements, traffic densities, and port activities, and it serves as a foundation for benchmarking anomaly detection algorithms in irregular, non-grid spatio-temporal graph settings (Kim et al., 23 Dec 2025, Hütten, 28 Nov 2025).
1. Geographic and Temporal Coverage
OMTAD is available for multiple regions and periods, notably:
- Baltic Sea (coastal): 91 days, July 29–October 27, 2024, covering longitude 9°–32° E and latitude 53°–66° N. Transit “gateways” are defined for boundary tracking and vessel flux estimation (Hütten, 28 Nov 2025).
- Western Australia (offshore): January 2018–December 2020, spanning longitude 105°–116° E and latitude 36°–15° S (Kim et al., 23 Dec 2025).
- Vessel Types and Trajectories (Western Australia):
| Vessel type | Number of trajectories |
|-------------|------------------------|
| Cargo       | 14 384                 |
| Tanker      | 4 020                  |
| Fishing     | 466                    |
| Passenger   | 254                    |
| Total       | 19 124                 |
Typical spatial resolution for density grids is approximately 400 m. Baltic region analyses rely on AIS-A feeds from terrestrial receiver networks; the Australian dataset comprises both raw and interpolated AIS tracks with full kinematics.
2. Data Processing and Feature Engineering
OMTAD cleansing and trajectory modeling are rule-based, following multi-stage pipelines:
- Initial Filtering: Removal of land-based messages and inactive vessels (those confined to a 400 m square over the entire analysis period).
- Movement Segmentation: Division of time series when speed < 0.5 kn (stationary threshold), time gaps > 48 h, spatial jumps > 750 km, or boundary crossings.
- Outlier Removal: Iterative exclusion where inferred speed > 50 kn, |acceleration| > 1 m/s², or near-duplicate “loiter” records (within 5 s at < 1 m, speed < 1 km/h).
- Trajectory Assembly and Interpolation (offshore): Tracks are grouped by vessel ID (MMSI) and ordered by time; records implying geodesic distance jumps > 5 km are filtered out, and the remaining points are linearly interpolated onto a regular time grid (e.g., every 10 min).
- Segmentation for Modeling: Sliding windows (e.g., of length 24 h) enable downstream temporal or graph-based anomaly tasks.
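The movement segmentation rule in the list above can be sketched as follows. This is a minimal illustration, not the dataset's actual pipeline: the `Fix` record, the `split_segments` helper, and the choice to start a new segment whenever the movement state changes are assumptions made for the example, and boundary crossings are not modeled.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Fix:
    t: float    # seconds since epoch (UTC)
    lat: float  # degrees
    lon: float  # degrees
    sog: float  # speed over ground in knots


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    p1, p2 = radians(lat1), radians(lat2)
    a = sin((p2 - p1) / 2) ** 2 + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def split_segments(fixes, stationary_kn=0.5, max_gap_h=48.0, max_jump_km=750.0):
    """Split a time-ordered track into (state, fixes) segments.

    A new segment starts on a long time gap, a large spatial jump,
    or a change between the moving and stationary states (assumed rule).
    """
    segments = []
    current, current_state, prev = [], None, None
    for fix in fixes:
        state = "stationary" if fix.sog < stationary_kn else "moving"
        if prev is not None:
            gap_h = (fix.t - prev.t) / 3600.0
            jump_km = haversine_km(prev.lat, prev.lon, fix.lat, fix.lon)
            if gap_h > max_gap_h or jump_km > max_jump_km or state != current_state:
                segments.append((current_state, current))
                current = []
        current.append(fix)
        current_state, prev = state, fix
    if current:
        segments.append((current_state, current))
    return segments


# Hypothetical track: three moving fixes, two stationary, then one after a 50 h gap.
track = [
    Fix(0, 55.00, 10.0, 10.0),
    Fix(600, 55.01, 10.0, 10.0),
    Fix(1200, 55.02, 10.0, 10.0),
    Fix(1800, 55.02, 10.0, 0.1),
    Fix(2400, 55.02, 10.0, 0.0),
    Fix(2400 + 50 * 3600, 55.50, 10.5, 12.0),
]
segments = split_segments(track)
```

Each returned tuple pairs a state label with the fixes in that segment, which is a convenient shape for the later route simplification and journey steps.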
Feature Structure (per record):
- MMSI (ID), timestamp (UTC), latitude, longitude, speed over ground (SOG, kn), course over ground (COG), movement state (moving/stationary).
- Derived kinematic quantities computed from consecutive records.
- Optional bins: environmental conditions (wind, wave, current, visibility).
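As a rough illustration of kinematics derived from the per-record fields, the sketch below infers speed and acceleration from consecutive positions and applies the 50 kn speed limit mentioned in the outlier removal step. The function names and the greedy "compare against the last kept fix" strategy are assumptions for this example; the source does not specify how its iterative exclusion is implemented.

```python
from math import asin, cos, radians, sin, sqrt

KNOT_MS = 0.514444  # metres per second in one knot


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres on a spherical Earth."""
    p1, p2 = radians(lat1), radians(lat2)
    a = sin((p2 - p1) / 2) ** 2 + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2
    return 2 * 6371000.0 * asin(sqrt(a))


def inferred_speed_kn(a, b):
    """Speed in knots implied by two (t_seconds, lat, lon) fixes, with b later than a."""
    return haversine_m(a[1], a[2], b[1], b[2]) / (b[0] - a[0]) / KNOT_MS


def drop_speed_outliers(track, max_kn=50.0):
    """Greedy filter (assumed strategy): keep a fix only if the speed
    implied from the last kept fix does not exceed max_kn."""
    if not track:
        return []
    kept = [track[0]]
    for fix in track[1:]:
        if fix[0] <= kept[-1][0]:
            continue  # skip duplicate or out-of-order timestamps
        if inferred_speed_kn(kept[-1], fix) <= max_kn:
            kept.append(fix)
    return kept


def accelerations_ms2(track):
    """Acceleration in m/s^2 between consecutive inferred speeds."""
    legs = [(b[0], inferred_speed_kn(a, b) * KNOT_MS) for a, b in zip(track, track[1:])]
    return [(v2 - v1) / (t2 - t1) for (t1, v1), (t2, v2) in zip(legs, legs[1:])]


# Hypothetical track with one position spike at t = 1200 s.
raw = [(0, 55.0, 10.00), (600, 55.0, 10.05), (1200, 56.0, 10.10), (1800, 55.0, 10.15)]
clean = drop_speed_outliers(raw)
```

The same acceleration values could then be checked against the 1 m/s² limit quoted in the outlier rule.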
3. Journey and Density Models
OMTAD supplies analytical products enabling maritime domain studies:
- Route Simplification: Ramer–Douglas–Peucker applied to movement segments, retaining “waypoints” selected by an angular-deviation criterion while constraining positional deviation to ≤ 100 m.
- Speed Control Modeling: Control points inserted where relative speed changes ≥ 5%, followed by linear temporal interpolation.
- Journey Construction: Each vessel’s sequence of movements (moving, stationary, at gateways) composes a “journey”; empirical time thresholds are used to distinguish true area exits.
- Density Estimation: Gridded vessel density maps at ~400 m resolution, where each cell value gives the mean number of vessels per km².
- Port Detection and Transit Fluxes: Watershed clustering on density maps (threshold ≥ 0.5 vessels/km²), automated port boundary and centroid assignment, entries/exits quantified per gateway.
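The density estimation step above can be approximated with a simple time-averaged count per grid cell. The sketch below is a hedged illustration only: it assumes positions are already projected to planar metres and that the track has been sampled into equally spaced snapshots, neither of which is stated in the source. The `density_grid` and `dense_cells` helpers are hypothetical names, and the watershed clustering itself is not shown.

```python
from collections import defaultdict


def density_grid(snapshots, cell_m=400.0):
    """Mean vessels per km^2 for each occupied grid cell.

    snapshots: list of time steps, each a list of (x_m, y_m) vessel positions.
    Returns {(col, row): density}, averaged over all snapshots.
    """
    if not snapshots:
        return {}
    area_km2 = (cell_m / 1000.0) ** 2
    counts = defaultdict(int)
    for positions in snapshots:
        for x, y in positions:
            counts[(int(x // cell_m), int(y // cell_m))] += 1
    return {cell: c / (len(snapshots) * area_km2) for cell, c in counts.items()}


def dense_cells(grid, threshold=0.5):
    """Cells at or above the density threshold (candidate seeds for port detection)."""
    return {cell for cell, rho in grid.items() if rho >= threshold}


# Two hypothetical snapshots: three vessels, then one vessel.
snapshots = [[(100, 100), (150, 300), (900, 100)], [(120, 110)]]
grid = density_grid(snapshots)
```

With a 400 m cell the area is 0.16 km², so a single vessel present in one of two snapshots already yields about 3.1 vessels/km² in its cell; thresholds such as 0.5 vessels/km² therefore only become selective over long observation periods.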
Uncertainty assessment accounts for incomplete coverage, false gateway assignments, missing AIS-B transmissions (assumed 30%), and “dark” vessels (~21%); these factors are used to bound the reported vessel counts.
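Returning to the route simplification step listed earlier in this section, a minimal Ramer–Douglas–Peucker sketch with a 100 m tolerance might look like the following. It assumes the points are already in planar metre coordinates and uses distance to the chord segment; the angular-deviation criterion mentioned in the text is not modeled, and the function names are illustrative only.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b, all as (x_m, y_m) tuples."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Projection parameter clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon_m=100.0):
    """Recursively drop points that lie within epsilon_m of the simplified line."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= epsilon_m:
        return [points[0], points[-1]]
    left = rdp(points[: idx + 1], epsilon_m)
    right = rdp(points[idx:], epsilon_m)
    return left[:-1] + right  # avoid duplicating the split point


# Hypothetical movement segment in metres.
route = [(0, 0), (1000, 20), (2000, -30), (3000, 500), (4000, 0)]
waypoints = rdp(route)
```

Here the point at (1000, 20) is dropped because it lies about 35 m from the line between its retained neighbours, while the other points deviate by more than 100 m and are kept as waypoints.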
4. Spatio-Temporal Graph Anomaly Benchmark
The OMTAD extension for anomaly detection utilizes dynamic graph construction for non-grid environments (Kim et al., 23 Dec 2025):
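- Graph Structure: For each window $W$, the graph $G_W = (V, E_s \cup E_t)$ comprises:
- Nodes: $v_i^t$ (vessel $i$ at time step $t$), each with a feature vector $x_i^t$.
- Spatial edges $E_s$: $(v_i^t, v_j^t)$ if the distance between vessels $i$ and $j$ at time $t$ is at most a threshold $\epsilon$; $\epsilon$ is often set to 500 m.
- Temporal edges $E_t$: $(v_i^t, v_i^{t+1})$, linking the same vessel across consecutive time steps.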
- Adjacency matrix in block structure for time windows.
- Optional application of spatial Gaussian kernel edge weights.
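The graph structure described above can be sketched with plain Python containers. This is an assumption-laden illustration rather than the benchmark's implementation: positions are taken to be planar metres, `build_window_graph` is a hypothetical helper, and the Gaussian weight form `exp(-d^2 / (2 * sigma^2))` is one common choice that the source does not confirm.

```python
from itertools import combinations
from math import exp, hypot


def build_window_graph(window, radius_m=500.0, sigma_m=None):
    """Build nodes, spatial edges, and temporal edges for one time window.

    window: list of time steps, each a dict {vessel_id: (x_m, y_m)}.
    Returns (nodes, spatial, temporal), where nodes are (vessel_id, t) pairs,
    spatial maps an edge to its weight, and temporal is a list of edges.
    """
    nodes = [(vid, t) for t, snap in enumerate(window) for vid in sorted(snap)]
    spatial, temporal = {}, []
    for t, snap in enumerate(window):
        # Spatial edges: vessels within radius_m of each other at the same step.
        for a, b in combinations(sorted(snap), 2):
            d = hypot(snap[a][0] - snap[b][0], snap[a][1] - snap[b][1])
            if d <= radius_m:
                weight = 1.0 if sigma_m is None else exp(-d * d / (2 * sigma_m ** 2))
                spatial[((a, t), (b, t))] = weight
        # Temporal edges: the same vessel at consecutive steps.
        if t + 1 < len(window):
            for vid in sorted(snap):
                if vid in window[t + 1]:
                    temporal.append(((vid, t), (vid, t + 1)))
    return nodes, spatial, temporal


# Hypothetical two-step window with three vessels.
window = [
    {"A": (0, 0), "B": (300, 300), "C": (5000, 0)},
    {"A": (100, 0), "B": (2000, 0)},
]
nodes, spatial, temporal = build_window_graph(window)
```

Vessel C disappears in the second step, so it contributes a node but no temporal edge, which mirrors how irregular AIS coverage leaves gaps in the block adjacency structure.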
Anomaly Types Supported:
- Node-level: Individual state anomalies, e.g., speed/heading excursions beyond the normal drift range, or spoofed AIS.
- Edge-level: Pairwise vessel interactions reflecting abnormal proximity (loitering, rendezvous).
- Graph-level: Windows flagged as abnormal if any constituent node/edge is perturbed.
Augmentation and Injection Pipelines:
- Trajectory Synthesizer: Collects real neighbors within a distance threshold and generates virtual ones to ensure minimum graph connectivity (by sampling lat/lon perturbations).
- Anomaly Injector: Prompt-driven node anomaly injection at a controlled ratio; edge-level injection via textual prompt engineering (e.g., for “loitering”); window-level labeling.
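To make the idea of a controlled injection ratio concrete, the sketch below perturbs a fixed fraction of node speeds and derives the window label from the node mask, following the graph-level rule stated above. It is a plain numeric stand-in, not the prompt-driven injector of the benchmark; the function name, the multiplicative `scale` perturbation, and the seeding are all assumptions.

```python
import random


def inject_node_anomalies(speeds, ratio=0.1, scale=3.0, seed=0):
    """Scale a fraction `ratio` of node speeds to simulate anomalies.

    Returns (perturbed_speeds, node_mask, window_label), where node_mask marks
    injected nodes with 1 and window_label is 1 if any node was perturbed.
    """
    rng = random.Random(seed)
    n = len(speeds)
    k = max(1, round(ratio * n)) if n and ratio > 0 else 0
    chosen = set(rng.sample(range(n), k))
    perturbed = [s * scale if i in chosen else s for i, s in enumerate(speeds)]
    mask = [1 if i in chosen else 0 for i in range(n)]
    return perturbed, mask, int(any(mask))


# Hypothetical window of 20 nodes cruising at 10 kn, with 10% injected anomalies.
speeds = [10.0] * 20
perturbed, mask, label = inject_node_anomalies(speeds, ratio=0.1)
```

Keeping the mask alongside the perturbed features gives ground truth for node-level evaluation, while the single window label serves the graph-level task.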
5. Evaluation Protocol, Metrics, and Baselines
The benchmark defines rigorous task-oriented protocols:
- Node anomaly detection: Predict a per-node anomaly mask.
- Edge anomaly detection: Label edges as normal or abnormal.
- Graph anomaly detection: Classify window as normal/anomalous.
Pipeline:
- Graph construction and preprocessing as detailed above.
- Model training on mixtures of normal/injected samples, stratified into train/validation/test splits.
- Evaluation on held-out windows with controlled node and trajectory anomaly rates.
Metrics:
- Precision
- Recall
- F1-score
- ROC-AUC
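For reference, the four metrics can be computed from binary labels and anomaly scores as sketched below, using their standard definitions. The helper names are illustrative, and the ROC-AUC is computed through the pairwise ranking formulation (the probability that a random anomalous item scores higher than a random normal one, with ties counted as one half).

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels, with 1 meaning anomalous."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC-AUC as the fraction of (anomalous, normal) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical node-level labels, thresholded predictions, and raw scores.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
scores = [0.1, 0.6, 0.8, 0.4, 0.2, 0.9]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

The pairwise form is quadratic in the number of samples, which is fine for illustration but would normally be replaced by a rank-based computation on large test sets.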
Baseline Methods:
- LSTM, Transformer (temporal only)
- LSTM+GNN, Transformer+GNN (spatio-temporal hybrid)
- Finding: Hybrid models consistently outperform purely temporal approaches, particularly under low global anomaly prevalence regimes.
6. Access Methods, Usage Scenarios, and Licensing
Available Data Structures:
- trajectories.csv: Vessel track details by segment and timestamp (~25 MB).
- vessels_summary.csv: MMSI, vessel type (IMO if available), attributes (~2 MB).
- counts_time_series.csv: Time-series for global and partitioned vessel counts (~1 MB).
- density_grid.tif: GeoTIFF map, all/moving/stationary bands.
- port_areas.shp/.dbf/.shx: GIS vector definition for port areas.
- gateway_fluxes.csv: Quantified entry/exit fluxes at boundaries.
Access and Licensing:
- Download from Figshare: https://doi.org/10.6084/m9.figshare.29062715
- Processing and Python/GIS examples: https://github.com/grid-inc/OMTAD
- Creative Commons Attribution (CC BY 4.0): full rights to copy, redistribute, adapt, with required citation (Hütten, 28 Nov 2025).
Usage Examples:
- Hourly statistical analyses of moving/stationary vessels via time series.
- GIS mapping of density grids to resolve shipping lanes, anchorages, and port locations.
- Validation of gateway flux against official statistics.
- Overlay with marine protected zones for environmental exposure analysis.
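The first usage example, hourly statistics of moving and stationary vessels, could be sketched as below. The column names `timestamp`, `mmsi`, and `state` are assumptions for illustration and may not match the actual layout of trajectories.csv; the inline sample stands in for the real file so the snippet is self-contained.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Hypothetical sample in an assumed column layout.
SAMPLE = """timestamp,mmsi,state
2024-07-29T00:05:00,111,moving
2024-07-29T00:35:00,111,moving
2024-07-29T00:40:00,222,stationary
2024-07-29T01:10:00,111,stationary
2024-07-29T01:20:00,333,moving
"""


def hourly_vessel_counts(rows):
    """Count distinct vessels per (hour, state) from dict-like records."""
    seen = defaultdict(set)
    for row in rows:
        hour = datetime.fromisoformat(row["timestamp"]).replace(
            minute=0, second=0, microsecond=0
        )
        seen[(hour, row["state"])].add(row["mmsi"])
    return {key: len(vessels) for key, vessels in sorted(seen.items())}


counts = hourly_vessel_counts(csv.DictReader(io.StringIO(SAMPLE)))
for (hour, state), n in counts.items():
    print(hour.isoformat(), state, n)
```

To run this against a downloaded file, the `io.StringIO(SAMPLE)` argument would be replaced by an open file handle, after adjusting the column names to the real header.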
7. Challenges, Best Practices, and Future Directions
OMTAD highlights key technical challenges in maritime traffic analysis and anomaly detection:
- Absence of fixed spatial anchors necessitates dynamic graph construction derived directly from vessel trajectories.
- Data are inherently sparse, irregularly sampled, and must be resampled or interpolated for analytic consistency.
- Anomalies manifest at multiple granularities—from kinematic outliers to coordinated group behavior—necessitating multifaceted benchmarks.
- This suggests comprehensive model evaluations at node, edge, and graph levels are essential for robust anomaly characterization.
Best Practices:
- Use clustering (e.g., OPTICS) or kernel-based neighbor thresholds for spatial edge derivation.
- Augment sparse temporal regions with synthetic context to ensure sufficient graph density.
- Control anomaly injection ratios for balanced evaluation and model ablation.
- Employ prompt-driven anomaly engines to extend analysis beyond basic kinematic deviations.
OMTAD, in both its original and extended (benchmark) forms, establishes a reproducible, empirically grounded framework for maritime trajectory analytics, density mapping, and graph-based anomaly detection, advancing rigorous research in open-sea monitoring and spatio-temporal modeling (Kim et al., 23 Dec 2025, Hütten, 28 Nov 2025).