Auckland Harbour Bridge Anomaly Dataset
- The Auckland Harbour Bridge Dataset is a curated collection of 6,000 HD video frames comprising real 'Pin_OK' and synthetically generated 'Pin_OUT' instances for structural anomaly analysis.
- The accompanying SIFT-SNN pipeline combines SIFT-based feature extraction with a spiking neural network, achieving 92.3% test accuracy at low latency for real-time anomaly detection.
- The dataset integrates diverse environmental conditions and rigorous synthetic augmentation, offering actionable insights for traffic infrastructure safety and predictive maintenance.
The Auckland Harbour Bridge (AHB) Dataset is a curated, annotated collection of 6,000 high-definition video frames designed for real-time, low-latency detection of structural anomalies in traffic flow-control infrastructure. Developed to assess the safety integrity of the metal pins linking movable concrete barrier (MCB) segments on the Auckland Harbour Bridge, the dataset underpins neuromorphic and vision-based approaches to rapid anomaly classification under operationally diverse environmental conditions (Rathee et al., 26 Nov 2025).
1. Dataset Composition and Annotation
The AHB dataset comprises 6,000 frames, stratified into 4,500 real-world "Pin_OK" (safe; correctly seated pin) and 1,500 "Pin_OUT" (unsafe; dislodged pin) instances, maintaining a 3:1 class ratio to reflect the scarcity of failures in operational data. Frames were originally captured at up to 120 fps (1920 × 1080 px), then uniformly downsampled to 30 fps for computational efficiency. Each frame is cropped to a fixed Region of Interest (ROI) around a monitored pin, with ROI dimensions tailored to ensure inclusion of the pin and its immediate substrate.
Two label categories are provided:
- Pin_OK: Annotated by manual review of video from live Barrier Transfer Machine (BTM) operations.
- Pin_OUT: Generated via digital synthesis—manually editing real Pin_OK frames in Adobe Photoshop to simulate displaced pins, followed by augmentation.
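As an illustration of how this class balance can be checked programmatically, the sketch below assumes a hypothetical per-frame manifest file (`ahb_manifest.csv` with a `label` column); the released dataset may organize its annotations differently.

```python
import csv
from collections import Counter

# Hypothetical manifest layout: one row per ROI-cropped frame with a "label"
# column that is either "Pin_OK" or "Pin_OUT". The file name and columns are
# illustrative, not taken from the dataset release.
def class_counts(manifest_path="ahb_manifest.csv"):
    with open(manifest_path, newline="") as f:
        return Counter(row["label"] for row in csv.DictReader(f))

# Expected for the full dataset: Counter({"Pin_OK": 4500, "Pin_OUT": 1500}),
# i.e. the 3:1 safe-to-unsafe ratio described above.
# print(class_counts())
```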
2. Data Acquisition Protocol
A multi-sensor camera rig, comprising GoPro cameras, iPhone 13 Pro, Samsung A7, Apple iPad 6, and an external power source, was mounted on the BTM. This configuration enabled synchronized, multi-angle video capture concentrated on pin locations. Data collection encompassed a range of weather and lighting scenarios, including low-light (dawn/dusk), drizzle, and overcast conditions. Live traffic was always present to ensure authentic environmental confounders (e.g., shadows, glare, and specular reflections).
Raw video was captured at 120 fps and subsequently downsampled to 30 fps. For each monitored pin, a manually fixed ROI was extracted from every frame, standardizing input for further processing.
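The following sketch illustrates this post-processing step with OpenCV, assuming a 120 fps source and a placeholder crop box; the ROI coordinates and file name are illustrative rather than values from the dataset.

```python
import cv2

# Placeholder crop box around the monitored pin (not values from the paper).
ROI_X, ROI_Y, ROI_W, ROI_H = 800, 600, 256, 256

def extract_roi_frames(video_path, step=4):
    """Downsample 120 fps video to 30 fps (keep every 4th frame) and crop
    a manually fixed ROI from each retained frame."""
    cap = cv2.VideoCapture(video_path)
    rois, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:  # 120 fps -> 30 fps
            rois.append(frame[ROI_Y:ROI_Y + ROI_H, ROI_X:ROI_X + ROI_W])
        idx += 1
    cap.release()
    return rois

# rois = extract_roi_frames("btm_run_01.mp4")  # hypothetical file name
```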
3. Synthetic Augmentation Methodology
Unsafe ("Pin_OUT") instances are inherently scarce in operational settings. To address this, a base set of 120 real Pin_OK frames underwent digital editing for synthetic pin removal and overlay of displaced-pin templates. Augmentation strategies included:
- Geometric transforms: Rotation (±10°), perspective warping, positional jitter (±5 px);
- Photometric adjustments: Gamma shifts, illumination distribution matched to real scenarios;
- Occlusion imposition: Overlays simulating oil stains, water droplets, or transient occluders;
- Morphological changes: Scaling artifacts of ±5% to simulate subtle shape variance.
Expanding this base resulted in 1,500 diverse Pin_OUT samples, providing robust coverage for rare-event generalization.
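A minimal sketch of these transforms (rotation, jitter, scaling, gamma) using OpenCV and NumPy is shown below; the composition order and the gamma range are assumptions, and the occlusion overlays (oil stains, droplets) are omitted for brevity.

```python
import random
import numpy as np
import cv2

def augment_pin_out(roi):
    """Hypothetical re-implementation of the augmentation strategies above.
    Parameter ranges follow the text; the composition order is assumed."""
    h, w = roi.shape[:2]

    # Geometric: rotation within +/-10 degrees, scaling within +/-5%.
    angle = random.uniform(-10, 10)
    scale = random.uniform(0.95, 1.05)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)

    # Geometric: positional jitter within +/-5 px.
    M[0, 2] += random.uniform(-5, 5)
    M[1, 2] += random.uniform(-5, 5)
    out = cv2.warpAffine(roi, M, (w, h), borderMode=cv2.BORDER_REFLECT)

    # Photometric: gamma shift (range assumed, not stated in the paper).
    gamma = random.uniform(0.8, 1.2)
    return np.clip(((out / 255.0) ** gamma) * 255.0, 0, 255).astype(np.uint8)
```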
4. Preprocessing and Feature Encoding Pipeline
Processing begins with grayscale conversion, histogram equalization, and pixel normalization (zero-mean, unit-variance) of each ROI. A Scale-Invariant Feature Transform (SIFT) pipeline is then applied:
- Scale-space extrema detection: The Gaussian scale-space L(x, y, σ) = G(x, y, σ) * I(x, y) is computed, where G(x, y, σ) is the Gaussian kernel and I(x, y) the input image. The difference of Gaussians D(x, y, σ) = L(x, y, kσ) − L(x, y, σ) approximates the scale-normalized Laplacian σ²∇²G and is used for keypoint selection.
- Keypoint extraction/descriptors: Up to 100 keypoints per ROI (zero-padded if fewer are detected) are retained, each yielding a 128-dimensional descriptor. Concatenation produces a fixed-length 12,800-dimensional frame-level feature vector.
- Descriptor normalization: L₂ normalization, f̂ = f / ‖f‖₂, is applied to the concatenated feature for numerical stability.
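A compact sketch of this pipeline using OpenCV's SIFT implementation is shown below; the zero-mean/unit-variance pixel normalization is omitted here because OpenCV's SIFT operates on 8-bit images, and the helper name is illustrative.

```python
import cv2
import numpy as np

NUM_KEYPOINTS, DESC_DIM = 100, 128  # 100 x 128 = 12,800-dimensional feature

def sift_frame_feature(roi_bgr):
    """Grayscale conversion, histogram equalization, up to 100 strongest
    SIFT keypoints, zero-padding, concatenation, and L2 normalization."""
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)

    sift = cv2.SIFT_create(nfeatures=NUM_KEYPOINTS)
    _, desc = sift.detectAndCompute(gray, None)

    feat = np.zeros((NUM_KEYPOINTS, DESC_DIM), dtype=np.float32)
    if desc is not None:
        feat[:min(len(desc), NUM_KEYPOINTS)] = desc[:NUM_KEYPOINTS]
    feat = feat.reshape(-1)                   # 12,800-dimensional vector
    norm = np.linalg.norm(feat)
    return feat / norm if norm > 0 else feat  # L2 normalization
```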
5. Spike Encoding and Spiking Neural Network Input
Feature vectors are mapped to spiking neural activity using a latency-driven encoding. Each normalized descriptor value xᵢ is mapped to a spike time tᵢ within a 100 ms window, implementing time-to-first-spike coding whereby more salient features (higher xᵢ) trigger earlier spikes. This produces one spike per descriptor dimension across 12,800 channels, with an average spike activity density of 8.1%. Inputs are subsequently processed by Leaky Integrate-and-Fire (LIF) neuron models, where the membrane potential V(t) obeys

τₘ dV(t)/dt = −(V(t) − V_rest) + R·I(t),

and threshold crossing (V ≥ V_th) triggers spike emission and a reset of the potential.
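The sketch below shows one way to realize the latency encoding and LIF dynamics in NumPy; the linear mapping tᵢ = T·(1 − xᵢ) and all neuron parameters (τₘ, V_th, R) are illustrative assumptions rather than values reported for the SIFT-SNN model.

```python
import numpy as np

T_WINDOW = 100.0  # ms, encoding window from the text

def latency_encode(feat):
    """Time-to-first-spike coding: one spike per descriptor dimension, with
    larger (more salient) values firing earlier. The exact mapping used in
    the paper may differ; a linear mapping is assumed here."""
    x = np.clip(feat, 0.0, 1.0)
    return T_WINDOW * (1.0 - x)          # spike times in [0, 100] ms

def lif_trace(input_current, dt=1.0, tau_m=20.0, v_rest=0.0, v_th=1.0, r=1.0):
    """Explicit-Euler simulation of a single LIF neuron,
    tau_m dV/dt = -(V - V_rest) + R*I(t); parameters are illustrative."""
    v, spikes = v_rest, []
    for t, i_t in enumerate(input_current):
        v += dt / tau_m * (-(v - v_rest) + r * i_t)
        if v >= v_th:                    # threshold crossing: spike and reset
            spikes.append(t * dt)
            v = v_rest
    return spikes
```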
6. Dataset Splits, Evaluation Metrics, and Baseline Comparisons
The dataset is partitioned into training (4,200 frames; 70%), validation (900 frames; 15%), and testing (900 frames; 15%), stratified by class. Key evaluation metrics include:
- Accuracy: (TP + TN) / (TP + TN + FP + FN)
- Precision/Recall and F₁-score for Pin_OUT class
- Inference latency: Average processing time per frame on both GPU (RTX 4060) and CPU
- Spike activity density across the simulation window
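A sketch of how the stratified split and the Pin_OUT-focused metrics can be computed with scikit-learn is given below; the use of scikit-learn and the function names are illustrative and not prescribed by the paper.

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def stratified_split(X, y, seed=0):
    """Stratified 70/15/15 split as described above (sketch; any split
    indices shipped with the dataset should take precedence)."""
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=seed)
    return (X_train, y_train), (X_val, y_val), (X_test, y_test)

def report_metrics(y_true, y_pred, positive_label="Pin_OUT"):
    """Accuracy plus precision/recall/F1 for the unsafe (Pin_OUT) class."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, pos_label=positive_label, average="binary")
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}
```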
Reported results for the SIFT-SNN pipeline are summarized below.
| Model | Test Acc. (%) | F₁ (%) | GPU Latency (ms) | CPU Latency (ms) | Spike Activity (%) |
|---|---|---|---|---|---|
| SIFT-SNN | 92.3 ± 0.8 | 91.0 | 9.5 | ~26 | 8.1 |
| ResNet-50 | 95.1 | — | 85 | — | — |
| MobileNetV2 | 91.2 | — | 42 | — | — |
7. Generalization, Deployment, and Availability
Performance is sustained across adverse conditions (drizzle, low-light, overcast); synthetic augmentation demonstrably supports generalization to rare-event cases. However, validation under novel conditions such as heavy rain or pronounced night-time glare has yet to be performed.
The sub-10 ms per-frame inference and low spike activity are conducive to real-time deployment on embedded platforms, including neuromorphic hardware (e.g., Intel Loihi or FPGAs). The dataset and the SIFT-SNN codebase are available from the authors ([email protected]) for research use under Auckland University of Technology’s data-sharing policy (Rathee et al., 26 Nov 2025).