
OMTAD: Open Maritime Traffic Analysis Dataset

Updated 30 December 2025
  • OMTAD is a public dataset offering detailed vessel trajectory analytics, density mapping, and anomaly detection benchmarks using AIS data.
  • It covers multiple regions like the Baltic Sea and Western Australia with rigorous rule-based data cleansing, interpolation, and segmentation pipelines.
  • The dataset supports diverse applications including port detection and spatio-temporal graph anomaly analysis, enabling robust research in maritime monitoring.

The Open Maritime Traffic Analysis Dataset (OMTAD) is a public repository of maritime vessel trajectories and derived analytics, sourced primarily from open Automatic Identification System (AIS) positioning data in global coastal and offshore domains. OMTAD supports detailed quantitative assessment of vessel movements, traffic densities, and port activities, and serves as a foundation for benchmarking anomaly detection algorithms in irregular, non-grid spatio-temporal graph settings (Kim et al., 23 Dec 2025, Hütten, 28 Nov 2025).

1. Geographic and Temporal Coverage

OMTAD is available for multiple regions and periods, notably:

  • Baltic Sea (coastal): 91 days, July 29–October 27, 2024, covering longitude 9°–32° E and latitude 53°–66° N. Transit “gateways” are defined for boundary tracking and vessel flux estimation (Hütten, 28 Nov 2025).
  • Western Australia (offshore): January 2018–December 2020, spanning longitude 105°–116° E and latitude 36°–15° S (Kim et al., 23 Dec 2025).
  • Vessel Types and Trajectories (Western Australia):

| Vessel type | Number of trajectories |
|-------------|------------------------|
| Cargo       | 14 384                 |
| Tanker      | 4 020                  |
| Fishing     | 466                    |
| Passenger   | 254                    |
| Total       | 19 124                 |

Typical spatial resolution for density grids is approximately 400 m. Baltic region analyses rely on AIS-A feeds from terrestrial receiver networks; the Australian dataset comprises both raw and interpolated AIS tracks with full kinematics.

2. Data Processing and Feature Engineering

OMTAD cleansing and trajectory modeling are rule-based, following multi-stage pipelines:

  • Initial Filtering: Removal of land-based messages and inactive vessels (those confined to a 400 m square over the entire analysis period).
  • Movement Segmentation: Division of time series when speed < 0.5 kn (stationary threshold), time gaps > 48 h, spatial jumps > 750 km, or boundary crossings.
  • Outlier Removal: Iterative exclusion where inferred speed > 50 kn, |acceleration| > 1 m/s², or near-duplicate “loiter” records (within 5 s at < 1 m, speed < 1 km/h).
  • Trajectory Assembly and Interpolation (offshore): Tracks are grouped by vessel ID (MMSI) and ordered by time; points implying geodesic distance jumps > 5 km are filtered out, and linear interpolation regularizes the sampling (e.g., every 10 min).
  • Segmentation for Modeling: Sliding windows (e.g., of length $w = 24$ h) enable downstream temporal or graph-based anomaly tasks.
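The movement-segmentation rules above can be sketched in a few lines. This is a minimal illustration, not the dataset's actual pipeline: the function names and record layout are hypothetical, the stationary rule is interpreted here as "start a new segment when a vessel switches between moving and stationary" (an assumption), and boundary crossings are omitted.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points (degrees)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def split_track(records, max_gap=timedelta(hours=48), max_jump_km=750.0, stationary_kn=0.5):
    """Split one vessel's time-ordered records into segments.

    records: list of (timestamp, lat, lon, sog_kn) tuples (hypothetical layout).
    A new segment starts on a long time gap, a large spatial jump, or a
    change between moving and stationary state (assumed interpretation).
    """
    segments, current = [], []
    for rec in records:
        if current:
            prev = current[-1]
            gap = rec[0] - prev[0]
            jump = haversine_km(prev[1], prev[2], rec[1], rec[2])
            state_change = (prev[3] < stationary_kn) != (rec[3] < stationary_kn)
            if gap > max_gap or jump > max_jump_km or state_change:
                segments.append(current)
                current = []
        current.append(rec)
    if current:
        segments.append(current)
    return segments
```

The thresholds default to the values quoted in the list (0.5 kn, 48 h, 750 km) and are exposed as parameters so that other regions or sampling rates can use different cut-offs.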

Feature Structure (per record):

  • MMSI (ID), timestamp (UTC), latitude, longitude, speed over ground (SOG, kn), course over ground (COG), movement state (moving/stationary).
  • Derived kinematics: $\Delta\mathrm{SOG}/\Delta t$, $\Delta\mathrm{COG}/\Delta t$.
  • Optional bins: environmental conditions (wind, wave, current, visibility).
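The derived kinematic rates can be computed from consecutive records with finite differences. The sketch below is illustrative only: `kinematic_rates` and `wrap_deg` are hypothetical helpers, and wrapping the course difference into [-180°, 180°) is an assumption about how a turn across north (e.g., 350° to 10°) should be handled.

```python
def wrap_deg(delta):
    """Wrap an angular difference into the interval [-180, 180) degrees."""
    return (delta + 180.0) % 360.0 - 180.0


def kinematic_rates(track):
    """Finite-difference rates for one vessel.

    track: time-ordered list of (t_seconds, sog_kn, cog_deg) tuples.
    Returns a list of (dSOG/dt in kn/s, dCOG/dt in deg/s), one entry per
    consecutive pair with a strictly positive time step.
    """
    rates = []
    for (t0, s0, c0), (t1, s1, c1) in zip(track, track[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        rates.append(((s1 - s0) / dt, wrap_deg(c1 - c0) / dt))
    return rates
```

Without the wrap, a 20° turn across north would register as a 340° turn in the opposite direction and could be mistaken for a kinematic outlier.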

3. Journey and Density Models

OMTAD supplies analytical products enabling maritime domain studies:

  • Route Simplification: Ramer–Douglas–Peucker applied to movement segments, retaining “waypoints” with angular deviation $\epsilon$ and constraining positional deviation to ≤ 100 m.
  • Speed Control Modeling: Control points inserted where relative speed changes ≥ 5%, followed by linear temporal interpolation.
  • Journey Construction: Each vessel’s sequential movements (moving, stationary, at gateways) compose a “journey,” employing empirical time thresholds to distinguish true area exits (e.g., $t_{\rm thr} = t_0\bigl(\max(v_{\rm exit}, v_{\rm entry})/10\,\mathrm{kn}\bigr)^{-4}$ with $t_0 = 6\,\mathrm{h}$).
  • Density Estimation: Gridded vessel density maps at ~400 m resolution:

$$\rho_i = \frac{1}{A_{\rm cell}\,T}\sum_v \int_0^T \mathbf{1}\bigl(x_v(t)\in \mathrm{cell}_i\bigr)\,dt$$

where $\rho_i$ is the mean number of vessels per km² in cell $i$.
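For regularly resampled tracks, the time integral in the density definition reduces to a sum of dwell times per cell. The following is a minimal sketch under stated assumptions: positions are already projected to kilometre coordinates, every sample stands for `dt_h` hours of presence, and `density_grid` with its parameters is an illustrative name rather than part of any OMTAD tooling.

```python
from collections import defaultdict


def density_grid(samples, dt_h, total_h, cell_km, origin=(0.0, 0.0)):
    """Approximate mean vessel density per grid cell.

    samples: iterable of (x_km, y_km) positions, one per vessel per time step.
    dt_h: duration represented by each sample, in hours.
    total_h: length of the analysis period T, in hours.
    cell_km: side length of a square grid cell, in kilometres.
    Returns {(col, row): mean vessels per km^2}.
    """
    dwell_h = defaultdict(float)
    for x, y in samples:
        col = int((x - origin[0]) // cell_km)
        row = int((y - origin[1]) // cell_km)
        dwell_h[(col, row)] += dt_h  # accumulated vessel-hours in this cell
    area_km2 = cell_km ** 2
    return {cell: hours / (area_km2 * total_h) for cell, hours in dwell_h.items()}
```

A single vessel that stays in one 0.4 km cell for the whole period yields 1 / 0.16 = 6.25 vessels per km² for that cell, which matches the definition with the indicator equal to one throughout.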

  • Port Detection and Transit Fluxes: Watershed clustering on density maps (threshold ≥ 0.5 vessels/km²), automated port boundary and centroid assignment, entries/exits quantified per gateway.
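The speed-dependent exit threshold quoted under journey construction falls off steeply with speed, which is easiest to see numerically. The helper below is a hypothetical sketch that only evaluates the stated formula; returning an infinite threshold for zero speed is an assumption made to avoid division issues, not something the source specifies.

```python
def exit_time_threshold_h(v_exit_kn, v_entry_kn, t0_h=6.0, v_ref_kn=10.0):
    """Evaluate t_thr = t0 * (max(v_exit, v_entry) / 10 kn) ** -4, in hours."""
    v = max(v_exit_kn, v_entry_kn)
    if v <= 0:
        return float("inf")  # assumed convention for a vessel with no recorded speed
    return t0_h * (v / v_ref_kn) ** -4


for speed_kn in (5.0, 10.0, 20.0):
    print(speed_kn, exit_time_threshold_h(speed_kn, speed_kn))
```

At 10 kn the threshold equals $t_0$ = 6 h; doubling the speed to 20 kn divides it by 16 (0.375 h), while halving the speed to 5 kn multiplies it by 16 (96 h).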

Uncertainty assessment accounts for incomplete coverage, false gateway assignments, missing AIS-B (assumed 30%), and “dark” vessels (~21%). The resulting bounds on vessel counts are:

$$(\Delta N^+)^2 = (N_{\rm df}-N_{\rm hi})^2 + \bigl(\tilde\delta_{\rm dark}^2 + \tilde\delta_{\rm aisB}^2\bigr)\,N_{\rm df}^2, \quad \Delta N^- = N_{\rm df}-N_{\rm low}$$
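The asymmetric bounds combine a count difference with two relative terms in quadrature on the upper side. A minimal sketch follows, with hypothetical function and argument names; the numbers in the usage line are invented purely to show the arithmetic and are not values from the dataset.

```python
import math


def vessel_count_bounds(n_df, n_hi, n_low, delta_dark, delta_aisb):
    """Return (upper, lower) uncertainties on a vessel count.

    upper: sqrt((N_df - N_hi)^2 + (delta_dark^2 + delta_aisB^2) * N_df^2)
    lower: N_df - N_low
    delta_dark and delta_aisb are relative uncertainties (fractions).
    """
    upper = math.sqrt((n_df - n_hi) ** 2 + (delta_dark ** 2 + delta_aisb ** 2) * n_df ** 2)
    lower = n_df - n_low
    return upper, lower


# Illustrative inputs only: 1000 counted vessels, bracketing counts 1040 and 950,
# and assumed relative terms of 3% and 4%.
up, down = vessel_count_bounds(1000, 1040, 950, 0.03, 0.04)
print(round(up, 1), down)
```

With these inputs the upper term is $\sqrt{40^2 + (0.03^2 + 0.04^2)\cdot 1000^2} = \sqrt{4100} \approx 64.0$, and the lower term is 50.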

4. Spatio-Temporal Graph Anomaly Benchmark

The OMTAD extension for anomaly detection utilizes dynamic graph construction for non-grid environments (Kim et al., 23 Dec 2025):

  • Graph Structure: For each window $[t_0, t_1]$, the graph $\mathcal{G} = (\mathcal{V}, \mathcal{E}^s, \mathcal{E}^t)$ comprises:
    • Nodes: $v_{i,t} \in \mathcal{V}$ (vessel $i$ at time $t$), each with feature vector $\mathbf{x}_{i,t}$.
    • Spatial edges $\mathcal{E}^s$: $A^s_{ij}(t) = 1$ if $\|\mathbf{p}_{i,t}-\mathbf{p}_{j,t}\| \le \varepsilon$; $\varepsilon$ is often set to 500 m.
    • Temporal edges $\mathcal{E}^t$: $A^t_i(t, t+1) = 1$.
    • Adjacency matrix $\mathbf{A}$ in block structure for time windows.
    • Optional application of spatial Gaussian kernel edge weights.
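The graph construction above can be sketched as follows. This is an illustration under assumptions, not the benchmark's implementation: positions are taken to be projected to metres, `build_window_graph` is a hypothetical name, and the Gaussian weight $\exp(-d^2 / 2\sigma^2)$ is one common form of kernel weighting that the source does not spell out.

```python
import math
from itertools import combinations


def build_window_graph(positions, eps_m=500.0, sigma_m=None):
    """Build spatial and temporal edges for one time window.

    positions: {t: {vessel_id: (x_m, y_m)}} with sortable time keys and vessel ids.
    Returns (spatial, temporal) where
      spatial  maps ((i, t), (j, t)) -> edge weight for vessels within eps_m,
      temporal lists ((i, t), (i, t_next)) for vessels present at both steps.
    """
    times = sorted(positions)
    spatial = {}
    for t in times:
        for i, j in combinations(sorted(positions[t]), 2):
            d = math.dist(positions[t][i], positions[t][j])
            if d <= eps_m:
                # Binary adjacency by default; optional Gaussian kernel weight (assumed form)
                w = 1.0 if sigma_m is None else math.exp(-d ** 2 / (2 * sigma_m ** 2))
                spatial[((i, t), (j, t))] = w
    temporal = [
        ((i, t), (i, t_next))
        for t, t_next in zip(times, times[1:])
        for i in positions[t]
        if i in positions[t_next]
    ]
    return spatial, temporal
```

The pairwise loop is quadratic in the number of vessels per time step, which is acceptable for a sketch; a spatial index would be the natural replacement for dense traffic.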

Anomaly Types Supported:

  • Node-level: Individual state anomalies, e.g., speed/heading excursions beyond $3\sigma$ of normal drift, spoofed AIS.
  • Edge-level: Pairwise vessel interactions reflecting abnormal proximity (loitering, rendezvous).
  • Graph-level: Windows flagged as abnormal if any constituent node/edge is perturbed.

Augmentation and Injection Pipelines:

  • Trajectory Synthesizer: Collects real neighbors within distance $\delta$ and generates virtual ones to ensure minimum graph connectivity (sampling $\mathrm{SOG}'$, $\mathrm{COG}'$, and lat/lon perturbations).
  • Anomaly Injector: Prompt-driven node anomaly injection with ratio $r_{\text{node}}$; edge-level and textual prompt engineering (e.g., for “loitering”); window-level labeling.
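A controlled injection ratio and the "any perturbed node flags the window" rule can be illustrated with a simple numeric injector. This sketch does not reproduce the prompt-driven engine described above; it substitutes a plain speed excursion of more than $3\sigma$, and all names (`inject_node_anomalies`, `k_sigma`, `seed`) are hypothetical.

```python
import random
import statistics


def inject_node_anomalies(sog_values, r_node, k_sigma=3.0, seed=0):
    """Perturb a fraction r_node of nodes in one window.

    sog_values: list of speeds (kn), one per node in the window.
    Returns (perturbed_values, node_mask, window_label), where node_mask[i] is 1
    for injected nodes and window_label is 1 if any node was perturbed.
    Note: if all speeds are identical (sigma == 0), no real excursion is produced.
    """
    rng = random.Random(seed)
    n = len(sog_values)
    n_anom = round(r_node * n)
    mean = statistics.fmean(sog_values)
    sigma = statistics.pstdev(sog_values)
    chosen = set(rng.sample(range(n), n_anom))
    perturbed, mask = [], []
    for idx, value in enumerate(sog_values):
        if idx in chosen:
            # Push the speed one sigma beyond the k_sigma band (positive side only)
            perturbed.append(mean + (k_sigma + 1.0) * sigma)
            mask.append(1)
        else:
            perturbed.append(value)
            mask.append(0)
    return perturbed, mask, int(any(mask))
```

Fixing the seed keeps injected windows reproducible across runs, which matters when the same anomaly ratio is reused to compare several models.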

5. Evaluation Protocol, Metrics, and Baselines

The benchmark defines rigorous task-oriented protocols:

  • Node anomaly detection: Predict anomaly mask $z_{i,t} \in \{0,1\}$.
  • Edge anomaly detection: Label edges $(i, j, t)$ as normal or abnormal.
  • Graph anomaly detection: Classify window as normal/anomalous.

Pipeline:

  • Graph construction and preprocessing as detailed above.
  • Model training on mixtures of normal/injected samples, stratified into train/validation/test splits.
  • Evaluation on held-out windows with controlled node and trajectory anomaly rates.

Metrics:

  • Precision: $P = \frac{TP}{TP + FP}$
  • Recall: $R = \frac{TP}{TP + FN}$
  • F1-score: $F_1 = 2\,\frac{P \times R}{P + R}$
  • ROC-AUC: $\mathrm{AUC} = \int_0^1 \mathrm{TPR}(f)\, d\mathrm{FPR}(f)$
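These metrics can be computed directly from binary labels and anomaly scores. The sketch below uses only the standard library; the function names are illustrative, and ROC-AUC is evaluated through its equivalent rank form (the probability that a random anomalous item scores higher than a random normal one, with ties counted as one half) rather than by numerical integration.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = anomalous)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Because anomalies are rare in this setting, the zero-division guards matter in practice: a window set with no predicted anomalies would otherwise fail when computing precision.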

Baseline Methods:

  • LSTM, Transformer (temporal only)
  • LSTM+GNN, Transformer+GNN (spatio-temporal hybrid)
  • Finding: Hybrid models consistently outperform purely temporal approaches, particularly under low global anomaly prevalence regimes.

6. Access Methods, Usage Scenarios, and Licensing

Available Data Structures:

  • trajectories.csv: Vessel track details by segment and timestamp (≈ 25 MB).
  • vessels_summary.csv: MMSI, vessel type (IMO if available), attributes (≈ 2 MB).
  • counts_time_series.csv: Time-series for global and partitioned vessel counts (≈ 1 MB).
  • density_grid.tif: GeoTIFF map, all/moving/stationary bands.
  • port_areas.shp/.dbf/.shx: GIS vector definition for port areas.
  • gateway_fluxes.csv: Quantified entry/exit fluxes at boundaries.

Access and Licensing:

Usage Examples:

  • Hourly statistical analyses of moving/stationary vessels via time series.
  • GIS mapping of density grids to resolve shipping lanes, anchorages, and port locations.
  • Validation of gateway flux against official statistics.
  • Overlay with marine protected zones for environmental exposure analysis.
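As an illustration of the first usage example, hourly counts of distinct moving and stationary vessels can be derived from a trajectory table. The column names `mmsi`, `timestamp`, and `state` are assumptions made for this sketch; the actual schema of trajectories.csv is not given here and should be checked before adapting the code.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def hourly_vessel_counts(csv_file):
    """Count distinct vessels per (hour, movement state).

    csv_file: open text file with hypothetical columns mmsi, timestamp
    (ISO 8601 without a timezone suffix), and state.
    Returns {(hour_start, state): number of distinct MMSIs}.
    """
    seen = defaultdict(set)
    for row in csv.DictReader(csv_file):
        hour = datetime.fromisoformat(row["timestamp"]).replace(
            minute=0, second=0, microsecond=0
        )
        seen[(hour, row["state"])].add(row["mmsi"])
    return {key: len(ids) for key, ids in seen.items()}


# Tiny in-memory sample standing in for a real file
sample = io.StringIO(
    "mmsi,timestamp,state\n"
    "111,2024-07-29T10:05:00,moving\n"
    "111,2024-07-29T10:45:00,moving\n"
    "222,2024-07-29T10:30:00,moving\n"
    "222,2024-07-29T11:10:00,stationary\n"
)
print(hourly_vessel_counts(sample))
```

Counting distinct MMSIs rather than rows keeps vessels with high reporting rates from dominating the hourly totals.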

7. Challenges, Best Practices, and Future Directions

OMTAD highlights key technical challenges in maritime traffic analysis and anomaly detection:

  • Absence of fixed spatial anchors necessitates dynamic graph construction derived directly from vessel trajectories.
  • Data are inherently sparse, irregularly sampled, and must be resampled or interpolated for analytic consistency.
  • Anomalies manifest at multiple granularities—from kinematic outliers to coordinated group behavior—necessitating multifaceted benchmarks.
  • This suggests comprehensive model evaluations at node, edge, and graph levels are essential for robust anomaly characterization.

Best Practices:

  • Use clustering (e.g., OPTICS) or kernel-based neighbor thresholds for spatial edge derivation.
  • Augment sparse temporal regions with synthetic context to ensure sufficient graph density.
  • Control anomaly injection ratios for balanced evaluation and model ablation.
  • Employ prompt-driven anomaly engines to extend analysis beyond basic kinematic deviations.

OMTAD, in both its original and extended (benchmark) forms, establishes a reproducible, empirically grounded framework for maritime trajectory analytics, density mapping, and graph-based anomaly detection, advancing rigorous research in open-sea monitoring and spatio-temporal modeling (Kim et al., 23 Dec 2025, Hütten, 28 Nov 2025).
