GTA-Crime: Synthetic Violence & Surveillance

Updated 13 September 2025

GTA-Crime Dataset is a synthetic dataset generated using the GTA V engine to simulate both fatal and non-fatal crime events with diverse environmental conditions.
The methodology employs controlled multi-view CCTV simulations with fine-grained annotation to support video anomaly detection and transfer learning.
Snippet-level domain adaptation using Wasserstein adversarial training bridges the gap between synthetic and real surveillance, enhancing violence detection accuracy.

GTA-Crime Dataset refers to a family of synthetic and simulation-based datasets systematically generated using the Grand Theft Auto (GTA) video game environment, targeting the modeling, detection, and prediction of criminal and anomalous events. Such datasets support rigorous study in video anomaly detection, urban crime modeling, spatiotemporal crime forecasting, and surveillance benchmarking under ethically controlled and realistically variable conditions.

1. Dataset Definition and Purpose

GTA-Crime, as exemplified by the dataset and framework in "GTA-Crime: A Synthetic Dataset and Generation Framework for Fatal Violence Detection with Adversarial Snippet-Level Domain Adaptation" (Kim et al., 10 Sep 2025), is designed to provide rare but critical surveillance content, specifically fatal violence scenarios such as shootings and stabbings, for video anomaly detection (VAD) and rare event modeling. Earlier datasets in the broader GTA-Crime paradigm may also include normal everyday actions and multi-category criminal events, and may support spatiotemporal prediction frameworks and simulation-based modeling of agent behavior.

Purpose and characteristics:

Enables research in VAD, action recognition, structure-from-motion, and crime forecasting.
Addresses data collection ethics and class imbalance inherent in real violence datasets.
Supports domain adaptation and transfer learning to bridge synthetic–real gaps, validated on real-world benchmarks.

2. Dataset Generation Process and Frameworks

Synthetic Video Generation

The GTA-Crime dataset is constructed by instrumenting the GTA V engine (via the Rockstar Advanced Game Engine and Scripthook plugins) to simulate and programmatically capture fatal and non-fatal action sequences. Core characteristics (Kim et al., 10 Sep 2025):

Events: Simulated fatal events (shootings, stabbings) and normal actions (walking, idling).
Scenarios: Environmental heterogeneity is induced by varying weather, time of day, and location across ≈75 GTA map regions, both indoor and outdoor.
Multiview CCTV: Every sequence is captured from contrasting camera perspectives (15–20 feet high outdoors, 10–12 feet indoors) for real-world CCTV emulation, always at 30 FPS and 1920×1080 resolution.
Temporal semantics: Each video is about 13 seconds long (384 frames at 30 FPS, i.e., 12.8 s), with events occurring in controlled temporal windows (e.g., frames 192–288), which facilitates precise event boundary annotation.
Fine-grained annotation: The generation pipeline provides explicit frame/event type labels and multiview alignment.
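
The temporal layout described above can be turned into per-frame labels with a few lines of code. The following is a minimal sketch, not the authors' annotation tool: the function name `frame_labels` is hypothetical, and treating the example window 192–288 as half-open (start included, end excluded) is an assumption, since the source does not say whether the end frame is part of the event.

```python
FPS = 30           # capture rate stated for all sequences
NUM_FRAMES = 384   # frames per video


def frame_labels(event_start, event_end, num_frames=NUM_FRAMES, abnormal=True):
    """Return one 0/1 label per frame; 1 marks frames in [event_start, event_end).

    Normal videos (abnormal=False) get an all-zero label vector.
    The half-open window convention is an assumption of this sketch.
    """
    if not abnormal:
        return [0] * num_frames
    if not 0 <= event_start <= event_end <= num_frames:
        raise ValueError("event window must lie inside the video")
    return [1 if event_start <= i < event_end else 0 for i in range(num_frames)]


labels = frame_labels(192, 288)          # example window from the text
duration_s = NUM_FRAMES / FPS            # video length in seconds
event_start_s, event_end_s = 192 / FPS, 288 / FPS  # event window in seconds
```

With the example window, the event covers a quarter of the video (96 of 384 frames), which is the kind of frame-level ground truth that VAD evaluation typically needs.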

A schematic illustration of the process is as follows:

Step	Controlled Variables	Output
Scenario Gen	Weather, time, location, action	GTA-instrumented scene video
Capture	CCTV view 1, view 2	Multiview video pairs
Annotation	Event type, frame window	Label per frame
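
The scenario-generation step in the table can be sketched as random sampling over the controlled variables. This is an illustration only and does not call the game engine or any Scripthook API: the `Scenario` class, `sample_scenario`, the concrete weather and time-of-day values, and the uniform sampling are all assumptions made for the example. Only the action types, the region count (about 75), the two views, and the camera height ranges come from the description above.

```python
import random
from dataclasses import dataclass

WEATHERS = ("clear", "rain", "fog")       # hypothetical value set
TIMES_OF_DAY = ("day", "dusk", "night")   # hypothetical value set
ACTIONS = ("shooting", "stabbing", "walking", "idling")
FATAL_ACTIONS = {"shooting", "stabbing"}
NUM_REGIONS = 75                          # approximate region count from the text


@dataclass(frozen=True)
class Scenario:
    weather: str
    time_of_day: str
    region_id: int
    indoor: bool
    action: str

    @property
    def abnormal(self):
        """A scenario is abnormal when its action is a fatal event."""
        return self.action in FATAL_ACTIONS

    def camera_height_ft(self, rng):
        """Draw a CCTV height in feet: 10-12 indoors, 15-20 outdoors."""
        low, high = (10, 12) if self.indoor else (15, 20)
        return rng.uniform(low, high)


def sample_scenario(rng):
    """Sample one scenario configuration uniformly (an assumption)."""
    return Scenario(
        weather=rng.choice(WEATHERS),
        time_of_day=rng.choice(TIMES_OF_DAY),
        region_id=rng.randrange(NUM_REGIONS),
        indoor=rng.random() < 0.5,
        action=rng.choice(ACTIONS),
    )


rng = random.Random(0)                    # seeded for reproducibility
scenario = sample_scenario(rng)
view_heights = [scenario.camera_height_ft(rng) for _ in range(2)]  # two views
```

Seeding the generator keeps a sampled configuration reproducible, which matters when the same scene has to be re-rendered from a second camera view.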

Dataset Content

532 total videos: 270 labeled as abnormal (fatal, e.g., shootings/stabbings), 262 as normal.
Diversity is ensured through stochastic configuration sampling (for environmental and behavioral parameters).
The multi-condition dataset is explicitly tailored to rare fatal incidents but can be extended with non-fatal crime/abnormality classes.
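
As a quick consistency check on these counts, the arithmetic below derives the class balance and total footage. It is a sketch that assumes every video has exactly 384 frames at 30 FPS, as stated for the generation process; the variable names are illustrative.

```python
ABNORMAL_VIDEOS = 270
NORMAL_VIDEOS = 262
FRAMES_PER_VIDEO = 384   # assumed identical for every video
FPS = 30

total_videos = ABNORMAL_VIDEOS + NORMAL_VIDEOS         # 532
abnormal_fraction = ABNORMAL_VIDEOS / total_videos     # close to one half
total_frames = total_videos * FRAMES_PER_VIDEO         # 204288
total_minutes = total_frames / FPS / 60                # footage per camera view

print(f"{total_videos} videos, {abnormal_fraction:.1%} abnormal")
print(f"{total_frames} frames, about {total_minutes:.1f} minutes")
```

The near 50/50 split contrasts with real surveillance data, where fatal events are a tiny minority of the footage.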

3. Snippet-Level Domain Adaptation Methodology

To close the domain gap between synthetic (GTA-Crime) and real-world surveillance datasets (e.g., UCF-Crime), a snippet-level domain adaptation strategy based on Wasserstein adversarial training is employed (Kim et al., 10 Sep 2025):

Feature Adaptor ($G$): Maps synthetic features $X_s$ to the distribution of real data $X_t$.
Discriminator ($D$): Trained to distinguish $G(X_s)$ from $X_t$; loss functions use the WGAN-GP framework:

$L_D = \mathbb{E}[D(G(X_s))] - \mathbb{E}[D(X_t)] + \lambda_{GP} \mathbb{E}_{\hat{x}}[(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1)^2]$

$L_F = -\mathbb{E}[D(G(X_s))]$

$\hat{x}$ is a random interpolation between source and target features; the gradient penalty evaluated at $\hat{x}$ softly enforces the 1-Lipschitz constraint on $D$.
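
To make the two losses concrete, here is a toy sketch in plain Python. It is not the authors' implementation: the critic $D(x) = \tanh(w \cdot x)$ and the element-wise scaling adaptor are hypothetical stand-ins chosen so that the input gradient of $D$ has a closed form, whereas a real model would use neural networks and automatic differentiation. The sketch follows the standard WGAN-GP convention in which the critic minimizes $\mathbb{E}[D(G(X_s))] - \mathbb{E}[D(X_t)]$ plus the penalty, and $\lambda_{GP} = 10$ is an assumed default.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def critic(w, x):
    """Toy critic D(x) = tanh(w . x) (hypothetical stand-in for a network)."""
    return math.tanh(dot(w, x))


def critic_grad_norm(w, x):
    """L2 norm of dD/dx = (1 - tanh^2(w . x)) * w, derived by hand."""
    scale = 1.0 - math.tanh(dot(w, x)) ** 2
    return abs(scale) * math.sqrt(dot(w, w))


def adaptor(a, x):
    """Toy feature adaptor G: element-wise scaling of a synthetic feature."""
    return [ai * xi for ai, xi in zip(a, x)]


def wgan_gp_losses(w, a, synth, real, lambda_gp=10.0, rng=None):
    """Return (L_D, L_F) for one batch of synthetic and real snippet features."""
    rng = rng or random.Random(0)
    adapted = [adaptor(a, x) for x in synth]
    d_real = sum(critic(w, x) for x in real) / len(real)
    d_fake = sum(critic(w, x) for x in adapted) / len(adapted)

    # Gradient penalty at random interpolates of adapted and real features.
    pairs = list(zip(adapted, real))
    penalty = 0.0
    for fake, target in pairs:
        eps = rng.random()
        x_hat = [eps * t + (1.0 - eps) * f for f, t in zip(fake, target)]
        penalty += (critic_grad_norm(w, x_hat) - 1.0) ** 2
    penalty /= len(pairs)

    loss_d = d_fake - d_real + lambda_gp * penalty  # critic objective
    loss_f = -d_fake                                # feature adaptor objective
    return loss_d, loss_f


synth = [[1.0, 2.0], [0.5, 1.5]]
real = [[2.0, 1.0], [1.0, 0.5]]
loss_d, loss_f = wgan_gp_losses([0.3, -0.2], [1.0, 1.0], synth, real)
```

In training, the two losses are minimized in alternation: the critic parameters against `loss_d`, then the adaptor parameters against `loss_f`.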

Class-wise Mapping: Each synthetic event class (stabbing, shooting, normal) is individually aligned with its real-world counterpart to avoid feature collapse and maximize semantic cross-domain consistency.
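
Class-wise mapping implies that adversarial alignment is run on per-class pairs of feature sets rather than on the pooled data. A minimal sketch of that bookkeeping follows; the function names and the `(label, feature)` tuple layout are hypothetical, and how real-world snippets obtain their class labels is not specified here.

```python
from collections import defaultdict

CLASSES = ("stabbing", "shooting", "normal")


def group_by_class(snippets):
    """Group (class_label, feature) pairs into a dict of feature lists."""
    groups = defaultdict(list)
    for label, feature in snippets:
        groups[label].append(feature)
    return groups


def classwise_batches(synthetic, real, classes=CLASSES):
    """Yield (class, synthetic_features, real_features) for each class
    that has snippets in both domains; other classes are skipped."""
    syn, tgt = group_by_class(synthetic), group_by_class(real)
    for c in classes:
        if syn[c] and tgt[c]:
            yield c, syn[c], tgt[c]


synthetic = [("shooting", [1.0]), ("normal", [0.0])]
real = [("shooting", [2.0]), ("stabbing", [3.0])]
batches = list(classwise_batches(synthetic, real))
```

Each yielded pair would then be passed to its own adaptor and discriminator update, so that, for example, synthetic shooting features are only ever aligned with real shooting features.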

This method allows model architectures (e.g., feature extractors or VAD networks) trained on GTA-Crime (synthetic) to generalize and improve performance when adapted to real data.

4. Downstream Applications and Empirical Impact

Fatal Violence Detection: The rare event focus of GTA-Crime directly addresses data scarcity for shootings and stabbings, consistently boosting real-world detection accuracy post-adaptation.
Surveillance Benchmarks: The dataset provides multiview CCTV footage under diverse physical conditions for robust algorithm validation.
Domain Adaptation Research: GTA-Crime serves as a synthetic testbed for developing and evaluating advanced domain adaptation strategies such as WGAN-GP alignment.
General Crime Analytics: When extended, GTA-Crime-like datasets can be used in spatiotemporal crime forecasting (Xia et al., 2022, Li et al., 2022, Wu et al., 24 Sep 2024) and agent-based simulation (e.g., CrimeMind (Zeng et al., 6 Jun 2025)).

Experimental results reported by Kim et al. (10 Sep 2025) show consistent improvements in real-world VAD (e.g., on UCF-Crime) when a model is trained on GTA-Crime features adapted via the described adversarial approach.

5. Distinctions, Limitations, and Ethical Context

GTA-Crime is distinguished from prior GTA-based datasets by:

Focusing explicitly on fatal violence (shootings, stabbings).
Providing multiview CCTV-style annotation.
Including a domain adaptation protocol validated in real anomaly detection settings.

Limitations include:

Domain gap remains nonzero without adaptation, as GTA graphics and behaviors do not fully match real-world patterns.
Scenarios are scripted and, while variable, may not capture all naturalistic nuances of spontaneous real-world crime.
Deployment context matters: legal or ethical constraints may apply when models trained on synthetic violence are deployed on real surveillance footage.

A plausible implication is that advances in synthetic dataset realism and adaptation could further reduce dependence on ethically challenging or privacy-sensitive real surveillance video collection.

6. Access, Reproducibility, and Community Adoption

Dataset and code availability: The described tools and data (the video generation framework and adaptation training scripts) are public at https://github.com/ta-ho/GTA-Crime.
Licensing: While open, usage of GTA-based datasets is subject to Rockstar Games’ terms and modding guidelines.
Community relevance: GTA-Crime is positioned as an extensible platform for synthetic surveillance video research, and has been cited as a significant resource for urban crime VAD and cross-domain anomaly detection.

7. Relationship to Broader GTA-Based Datasets and Research

GTA-Crime builds upon the tradition of leveraging GTA V as a high-fidelity urban simulation platform for data-driven research (Doan et al., 2018, Lei et al., 8 Aug 2025). Related datasets address complementary areas:

G2D (Doan et al., 2018): Trajectory and camera pose capture for image-based tasks (e.g., structure-from-motion).
Attention and behavior annotation (Lei et al., 8 Aug 2025): Player-object interaction data.
Spatiotemporal crime modeling and prediction (Xia et al., 2022, Li et al., 2022, Wu et al., 24 Sep 2024): Analysis of crime event tensors, often requiring synthetic or simulated data for model validation.

In summary, GTA-Crime establishes a reproducible, ethically lower-risk, and technically rigorous benchmark for fatal violence analysis in video surveillance, while also facilitating research in broader domains of computer vision and urban analytics via simulation-driven approaches.