
When, Where, and What? A New Dataset for Anomaly Detection in Driving Videos (2004.03044v1)

Published 6 Apr 2020 in cs.CV

Abstract: Video anomaly detection (VAD) has been extensively studied. However, research on egocentric traffic videos with dynamic scenes lacks large-scale benchmark datasets as well as effective evaluation metrics. This paper proposes traffic anomaly detection with a when-where-what pipeline to detect, localize, and recognize anomalous events from egocentric videos. We introduce a new dataset called Detection of Traffic Anomaly (DoTA) containing 4,677 videos with temporal, spatial, and categorical annotations. A new spatial-temporal area under curve (STAUC) evaluation metric is proposed and used with DoTA. State-of-the-art methods are benchmarked for two VAD-related tasks. Experimental results show STAUC is an effective VAD metric. To our knowledge, DoTA is the largest traffic anomaly dataset to-date and is the first supporting traffic anomaly studies across when-where-what perspectives. Our code and dataset can be found in: https://github.com/MoonBlvd/Detection-of-Traffic-Anomaly

Citations (42)

Summary

  • The paper introduces the DoTA dataset with 4,677 annotated driving videos that use a novel When-Where-What pipeline for anomaly detection.
  • The paper proposes STAUC, a spatio-temporal metric that surpasses traditional AUC by evaluating both temporal accuracy and spatial localization.
  • The paper benchmarks unsupervised and supervised methods, revealing that integrating frame- and object-level strategies enhances advanced driver assistance systems.

Analysis and Insights on "When, Where, and What? A New Dataset for Anomaly Detection in Driving Videos"

The paper by Yao et al. addresses an understudied yet critical aspect of video anomaly detection, specifically within the context of egocentric traffic videos. This work introduces the Detection of Traffic Anomaly (DoTA) dataset as a significant resource, comprising 4,677 videos with comprehensive annotations to foster advancements in detecting, localizing, and classifying traffic anomalies. This contribution addresses the scarcity of large-scale benchmark datasets that could facilitate the development of more robust advanced driver assistance systems (ADAS).

Dataset and Methodological Contributions

The DoTA dataset stands out by providing not only the largest collection of egocentric traffic videos to date but also by integrating a When-Where-What pipeline for annotations:

  • When: Temporal annotations delineate anomaly start and end times.
  • Where: Spatial annotations utilize bounding box tracklets to highlight anomalous objects within video frames.
  • What: 18 categorical labels classify anomalies, covering both events involving the ego vehicle and anomalies among other road users.
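The three annotation perspectives above can be pictured with a small data structure. This is an illustrative sketch only, not the dataset's actual file format: the class names, field names, and the example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class BoxTracklet:
    """Per-frame bounding box for one anomalous object (Where)."""
    track_id: int
    frame: int
    bbox: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels

@dataclass
class DoTAAnnotation:
    """Hypothetical per-video annotation combining When / Where / What."""
    video_id: str
    anomaly_start: int   # When: first anomalous frame index
    anomaly_end: int     # When: last anomalous frame index
    category: str        # What: one of the 18 anomaly classes
    ego_involved: bool   # What: ego-involved vs. non-ego
    tracklets: List[BoxTracklet] = field(default_factory=list)  # Where

# Example with made-up values
ann = DoTAAnnotation("example_clip", 42, 87, "hypothetical_class", True)
print(ann.anomaly_end - ann.anomaly_start + 1)  # anomalous frame count
```

The temporal span defines which frames count as positives for evaluation, while the tracklets define where, inside those frames, a detector's spatial response should fall.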

Accompanying the dataset is a critique of existing evaluation metrics, specifically the Area Under Curve (AUC), which is traditionally used but may inadequately account for the spatial localization of anomalies. To resolve this, the authors propose a novel Spatio-temporal Area Under Curve (STAUC) metric, which integrates spatial overlap in performance assessment, thus providing a more comprehensive evaluation of anomaly detection algorithms.
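The idea behind STAUC can be sketched as a spatially weighted ROC computation: each true-positive frame contributes its spatial overlap ratio rather than a full count, so a detector that fires at the right time but attends to the wrong region is discounted. The function below is a minimal illustration of this weighting scheme, not the authors' reference implementation; the exact definition of the overlap ratio in the paper may differ.

```python
import numpy as np

def stauc_sketch(scores, labels, overlap):
    """Sketch of a spatio-temporal AUC.

    scores  : per-frame anomaly scores
    labels  : per-frame ground truth (1 = anomalous frame)
    overlap : per-frame spatial overlap ratio in [0, 1] -- the fraction of
              the model's spatial anomaly response falling inside the
              annotated boxes (unused for normal frames).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    overlap = np.asarray(overlap, dtype=float)
    n_pos = max(int((labels == 1).sum()), 1)
    n_neg = max(int((labels == 0).sum()), 1)
    tprs, fprs = [0.0], [0.0]
    for t in np.unique(scores)[::-1]:          # sweep thresholds high -> low
        pred = scores >= t
        tp_credit = overlap[(labels == 1) & pred].sum()  # weighted TPs
        fp = ((labels == 0) & pred).sum()
        tprs.append(tp_credit / n_pos)
        fprs.append(fp / n_neg)
    tprs, fprs = np.array(tprs), np.array(fprs)
    # trapezoidal integration of TPR over FPR
    return float(np.sum(np.diff(fprs) * (tprs[1:] + tprs[:-1]) / 2.0))
```

With perfect spatial overlap the sketch reduces to ordinary frame-level AUC; halving the overlap on every positive frame halves the score, which is exactly the discounting that plain AUC cannot express.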

Benchmarking and Experimental Insights

The authors benchmark a range of both unsupervised and supervised methods using the DoTA dataset, including state-of-the-art techniques in video anomaly detection (VAD) and video action recognition (VAR):

  • Unsupervised VAD: The paper examines methodologies such as ConvAE, ConvLSTMAE, AnoPred, and TAD+ML, highlighting an innovative ensemble approach that combines frame-level and object-centric strategies to enhance performance metrics like AUC and STAUC.
  • Supervised VAD/VAR: For supervised contexts, techniques such as Temporal Recurrent Networks (TRN) are evaluated, demonstrating higher sensitivity to the nuances of temporal data inherent in traffic scenarios.
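One way to picture the ensemble of frame-level and object-centric strategies is simple late fusion of the two score streams. The snippet below is a hypothetical sketch under that assumption; the mixing weight `alpha` and the normalization are illustrative choices, not the paper's method.

```python
import numpy as np

def fuse_scores(frame_scores, object_scores, alpha=0.5):
    """Hypothetical late fusion of a frame-level anomaly score (e.g., a
    prediction-error signal) and an object-centric score (e.g., a per-object
    deviation pooled over objects). alpha is an illustrative mixing weight."""
    f = np.asarray(frame_scores, dtype=float)
    o = np.asarray(object_scores, dtype=float)

    def norm(x):
        # min-max normalize so the two streams share a comparable scale
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

    return alpha * norm(f) + (1 - alpha) * norm(o)
```

A fused signal like this can fire when either stream is confident, which is one plausible reason combining the two views improves both AUC and STAUC in the benchmarks.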

Results underscore that while high AUC scores may misleadingly suggest method efficacy in temporal anomaly detection, STAUC provides a more nuanced analysis by incorporating spatial precision, thus encouraging model development that better localizes anomalies.

Implications and Future Directions

The introduction of DoTA and STAUC offers a structured pathway to refine egocentric traffic video analysis. By distinguishing where an anomaly occurs from when it occurs, DoTA supports a more unified understanding of traffic behaviors and irregularities, enabling the development of autonomous systems with improved situational awareness. Moreover, the paper accentuates the need for models that effectively integrate contextual, spatial, and motion-based features for more precise anomaly detection.

Future research could explore the refinement of ensemble learning strategies that optimize both temporal and spatial dimensions, potentially employing advanced architectures like transformers or leveraging multimodal data inputs (e.g., integrating audio-visual signals). Additionally, expanding dataset diversity with respect to environmental conditions and anomaly types will help further generalize the findings across different driving environments.

In summary, the paper presents a pivotal contribution to advancing anomaly detection in driving scenarios, laying a foundation that is poised to impact various applications in intelligent transportation systems and smart city initiatives.