CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification (1903.09254v4)

Published 21 Mar 2019 in cs.CV

Abstract: Urban traffic optimization using traffic cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking. This work introduces CityFlow, a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. To the best of our knowledge, CityFlow is the largest-scale dataset in terms of spatial coverage and the number of cameras/videos in an urban environment. The dataset contains more than 200K annotated bounding boxes covering a wide range of scenes, viewing angles, vehicle models, and urban traffic flow conditions. Camera geometry and calibration information are provided to aid spatio-temporal analysis. In addition, a subset of the benchmark is made available for the task of image-based vehicle re-identification (ReID). We conducted an extensive experimental evaluation of baselines/state-of-the-art approaches in MTMC tracking, multi-target single-camera (MTSC) tracking, object detection, and image-based ReID on this dataset, analyzing the impact of different network architectures, loss functions, spatio-temporal models and their combinations on task effectiveness. An evaluation server is launched with the release of our benchmark at the 2019 AI City Challenge (https://www.aicitychallenge.org/) that allows researchers to compare the performance of their newest techniques. We expect this dataset to catalyze research in this field, propel the state-of-the-art forward, and lead to deployed traffic optimization(s) in the real world.

Authors (9)
  1. Zheng Tang (28 papers)
  2. Milind Naphade (9 papers)
  3. Ming-Yu Liu (87 papers)
  4. Xiaodong Yang (101 papers)
  5. Stan Birchfield (64 papers)
  6. Shuo Wang (382 papers)
  7. Ratnesh Kumar (18 papers)
  8. David Anastasiu (3 papers)
  9. Jenq-Neng Hwang (103 papers)
Citations (356)

Summary

  • The paper introduces a city-scale benchmark for multi-camera vehicle tracking with over 3 hours of synchronized HD footage and 200K annotated bounding boxes.
  • It rigorously evaluates state-of-the-art methods, showing that integrated spatio-temporal reasoning boosts multi-target multi-camera tracking accuracy.
  • The study highlights challenges in vehicle re-identification with mAP below 35%, providing a crucial open platform for advancing intelligent urban traffic systems.

CityFlow: A Comprehensive Benchmark for Urban-Scale Vehicle Tracking

The paper "CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification" introduces an expansive dataset for advancing research in the domain of urban traffic analysis and optimization. The dataset comprises over three hours of synchronized HD video footage from 40 cameras across 10 intersections, representing a diverse array of urban traffic conditions. This research addresses several critical areas in video analysis, including vehicle tracking across multiple cameras (MTMC tracking), identification of vehicles within single camera views (MTSC tracking), and vehicle re-identification (ReID).

Dataset Composition and Importance

CityFlow stands out as one of the largest datasets in the field, boasting more than 200,000 annotated bounding boxes and covering a broad spectrum of vehicle models, viewing angles, and traffic situations. The provision of camera geometry along with calibration information facilitates precise spatio-temporal analysis, adding a layer of depth rarely available in other datasets. Additionally, a subset dedicated to image-based vehicle re-identification is included, enabling a granular examination of re-identification techniques across varying urban landscapes.
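To illustrate why per-camera calibration matters for spatio-temporal analysis, the sketch below shows how a 3x3 image-to-ground homography might be used to project a detection's ground contact point into world coordinates. This is a minimal sketch, not the dataset's actual API: the function name, the placeholder matrix H, and the example bounding box are all assumptions introduced here for illustration.

```python
import numpy as np

def project_to_ground(H_img_to_world: np.ndarray, x: float, y: float):
    """Project an image point (e.g., the bottom-center of a vehicle's
    bounding box) onto the ground plane using a 3x3 homography.

    H_img_to_world is assumed to map homogeneous image coordinates to
    homogeneous world (e.g., map-like) coordinates.
    """
    p = H_img_to_world @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# Hypothetical usage with a placeholder homography and detection box.
H = np.array([[1.2e-5, 3.0e-6, 42.49],
              [2.0e-6, 1.1e-5, -90.68],
              [1.0e-8, 2.0e-8, 1.0]])
x1, y1, x2, y2 = 640, 300, 760, 400            # detection box (pixels)
foot_x, foot_y = (x1 + x2) / 2.0, y2            # ground contact point
wx, wy = project_to_ground(H, foot_x, foot_y)
print(f"estimated world position: ({wx:.5f}, {wy:.5f})")
```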

Evaluation of State-of-The-Art Approaches

The paper evaluates several state-of-the-art methods on the tasks outlined above, establishing competitive baselines for object detection, single-camera tracking, and image-based ReID. Well-known detectors such as YOLOv3, SSD512, and Faster R-CNN are used for object detection, coupled with tracking algorithms including DeepSORT, TC, and MOANA for MTSC tracking.
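To make the detection-plus-tracking structure concrete, the sketch below shows a heavily simplified tracking-by-detection loop that associates per-frame detections by IoU. It is not DeepSORT, TC, or MOANA (those additionally use appearance embeddings and motion models), and the function names and threshold are illustrative assumptions; only the overall pattern of detect, associate, and update tracks is the point.

```python
from itertools import count

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def track_by_detection(frames, iou_thresh=0.3):
    """frames: list of per-frame detection lists, each detection a box tuple.
    Returns a dict mapping track_id -> list of (frame_idx, box)."""
    next_id = count()
    tracks = {}            # track_id -> list of (frame_idx, box)
    active = {}            # track_id -> last seen box
    for t, detections in enumerate(frames):
        unmatched = dict(active)
        active = {}
        for box in detections:
            # Greedily match against the most-overlapping still-unmatched track.
            best_id, best_iou = None, iou_thresh
            for tid, prev_box in unmatched.items():
                score = iou(box, prev_box)
                if score > best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:
                best_id = next(next_id)        # start a new track
                tracks[best_id] = []
            else:
                del unmatched[best_id]
            tracks[best_id].append((t, box))
            active[best_id] = box
    return tracks
```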

For image-based ReID, the paper explores recent advances in metric learning across several neural network architectures, testing cross-entropy loss and hard triplet loss both individually and in combination. DenseNet121 emerges as a particularly effective backbone, underscoring the influence of architecture choice on ReID performance.
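As an illustration of this kind of combined objective, the sketch below pairs a cross-entropy classification head with a batch-hard triplet loss on the embedding, written in PyTorch. The margin, the loss weight, and the function names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet loss: for each anchor, use its hardest positive
    (farthest same-identity sample) and hardest negative (closest
    different-identity sample) within the mini-batch."""
    dist = torch.cdist(embeddings, embeddings, p=2)       # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)     # same-identity mask
    pos_dist = dist.masked_fill(~same, float('-inf')).max(dim=1).values
    neg_dist = dist.masked_fill(same, float('inf')).min(dim=1).values
    return F.relu(pos_dist - neg_dist + margin).mean()

def combined_reid_loss(logits, embeddings, labels, margin=0.3, w_triplet=1.0):
    """Cross-entropy on the classification logits plus batch-hard triplet
    loss on the embedding (e.g., from a DenseNet121 backbone)."""
    ce = F.cross_entropy(logits, labels)
    tri = batch_hard_triplet_loss(embeddings, labels, margin)
    return ce + w_triplet * tri
```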

Results and Comparative Analysis

Extensive analysis reveals that image-based ReID methods perform significantly better on person re-identification benchmarks than on vehicle re-identification, reflecting inherent challenges such as large intra-class variability across viewpoints and the limited distinguishing features among vehicles of similar make and color. Moreover, MTMC tracking accuracy improved notably when visual-spatio-temporal reasoning was integrated, rather than relying on visual features alone.
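One simple way to realize this kind of visual-spatio-temporal reasoning is to gate appearance similarity between cross-camera tracklets by whether the observed transit time is physically plausible given the inter-camera distance. The sketch below is a hedged illustration of that idea, not the paper's specific model; the function name and the assumed speed bounds are placeholders.

```python
import numpy as np

def association_score(feat_a, feat_b, t_exit_a, t_enter_b,
                      camera_distance_m, min_speed=2.0, max_speed=35.0):
    """Cosine appearance similarity between two single-camera tracklets,
    gated by a spatio-temporal plausibility check.

    feat_a, feat_b: mean ReID embeddings of each tracklet.
    t_exit_a, t_enter_b: timestamps (seconds) when the vehicle leaves
    camera A and appears in camera B.
    camera_distance_m: approximate distance between the two cameras.
    min_speed, max_speed: assumed plausible vehicle speeds in m/s.
    """
    dt = t_enter_b - t_exit_a
    if dt <= 0:
        return 0.0                          # B cannot precede A
    implied_speed = camera_distance_m / dt
    if not (min_speed <= implied_speed <= max_speed):
        return 0.0                          # transit time is implausible
    a = feat_a / np.linalg.norm(feat_a)
    b = feat_b / np.linalg.norm(feat_b)
    return float(a @ b)                     # cosine similarity
```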

Despite leveraging advanced architectures and loss functions, the best mAP achieved on CityFlow-ReID remains below 35%, underscoring the difficulty of this newly introduced benchmark.
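For reference, the mAP figure quoted here is the mean over queries of the average precision of the ranked gallery. The sketch below shows that computation under the simplifying assumption of plain identity matching, without the same-camera or junk-image filtering some benchmarks apply; the function names are illustrative.

```python
import numpy as np

def average_precision(ranked_gallery_ids, query_id):
    """Average precision for one query: gallery identities are given in
    ranked order (most similar first); a hit is a gallery item sharing
    the query's identity."""
    hits = np.asarray(ranked_gallery_ids) == query_id
    if hits.sum() == 0:
        return 0.0
    cum_hits = np.cumsum(hits)
    precision_at_k = cum_hits / (np.arange(len(hits)) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())

def mean_average_precision(ranked_lists, query_ids):
    """mAP over all queries."""
    return float(np.mean([average_precision(r, q)
                          for r, q in zip(ranked_lists, query_ids)]))
```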

Implications and Future Directions

The introduction of CityFlow fills a crucial gap in the research landscape concerning city-scale vehicle tracking and ReID. By providing a rich dataset with substantial spatial coverage and variety, it paves the way for innovations that could significantly improve urban traffic systems. The benchmark's support for comprehensive MTMC tracking research is particularly critical, offering an open platform via an evaluation server for continuous progress tracking.

The implications for traffic optimization are profound, potentially leading to more intelligent systems capable of managing urban flow and disruptions efficiently. The results highlight the necessity for developing robust algorithms that integrate spatio-temporal dynamics with visual data, and future research may well focus on enhancing these integrations to improve performance in real-world settings.

In conclusion, CityFlow offers a robust foundation for the exploration and advancement of tracking and re-identification technologies in large-scale urban environments, promising notable contributions to the field of intelligent transportation systems. This work not only challenges existing state-of-the-art methodologies but also provides the resources needed to propel research efforts toward more practical and high-impact applications.