Real-time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-Identification (1809.04427v1)

Published 12 Sep 2018 in cs.CV

Abstract: Online multi-object tracking is a fundamental problem in time-critical video analysis applications. A major challenge in the popular tracking-by-detection framework is how to associate unreliable detection results with existing tracks. In this paper, we propose to handle unreliable detection by collecting candidates from outputs of both detection and tracking. The intuition behind generating redundant candidates is that detection and tracks can complement each other in different scenarios. Detection results of high confidence prevent tracking drifts in the long term, and predictions of tracks can handle noisy detection caused by occlusion. In order to apply optimal selection from a considerable amount of candidates in real-time, we present a novel scoring function based on a fully convolutional neural network, that shares most computations on the entire image. Moreover, we adopt a deeply learned appearance representation, which is trained on large-scale person re-identification datasets, to improve the identification ability of our tracker. Extensive experiments show that our tracker achieves real-time and state-of-the-art performance on a widely used people tracking benchmark.

Authors (4)

Long Chen
Haizhou Ai
Zijie Zhuang
Chong Shang

Citations (359)

Summary

The paper introduces a framework that collects candidates from both detections and track predictions and selects among them with a unified scoring function.
It uses a hierarchical data association strategy built on deeply learned ReID features to make identity matching more accurate.
Evaluations on the MOT16 benchmark show that the tracker runs in real time and is more accurate than prior methods.

The paper addresses a central problem in online multi-object tracking: following multiple people through complex scenes in real time when the detections that drive the tracker are unreliable. Within the tracking-by-detection paradigm, the framework uses convolutional neural networks (CNNs) both to select candidates and to perform person re-identification (ReID).

Key Contributions

The paper combines three methods that together address the main difficulties of online multi-object tracking:

Candidate Selection from Both Detection and Tracking Outputs: In each frame, candidates are generated from both the detector and the existing tracks. The two sources complement each other under different conditions: high-confidence detections keep tracks from drifting over the long term, while track predictions compensate for noisy detections caused by occlusion.
Unified Scoring Function: A scoring function based on a fully convolutional network (FCN) selects from the large candidate pool. Because the FCN shares most of its computation across the entire image, scoring many candidates stays cheap enough for real-time processing. The score combines the classifier's confidence with a tracklet confidence measure.
Hierarchical Data Association with ReID Features: To reduce identity ambiguity caused by similar appearances or occlusion, data association is performed hierarchically using deeply learned appearance features. These ReID features are trained on large-scale person re-identification datasets and preserve identities better than hand-crafted features such as color histograms or HOG descriptors.
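The three steps above can be sketched in Python to make the control flow concrete. This is a minimal illustration under stated assumptions, not the authors' implementation: the unified score is assumed to be the classifier confidence multiplied by a tracklet confidence for track-derived candidates; selection is assumed to be greedy non-maximum suppression on that score; and association is assumed to first match detection-derived candidates to tracks by ReID cosine distance, then match the leftovers by IoU. Greedy matching stands in for an optimal assignment solver such as the Hungarian algorithm. All names and thresholds (Candidate, unified_score, select_candidates, associate, nms_iou, min_score, max_app_dist, min_iou) are hypothetical.

```python
import math
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Candidate:
    box: Box
    cls_score: float         # classifier confidence in [0, 1]; stand-in for the FCN output
    from_track: bool         # True if the box is a track prediction, False if a detection
    track_conf: float = 1.0  # hypothetical tracklet confidence, used only for track candidates


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def unified_score(c: Candidate) -> float:
    # Assumption: track predictions are down-weighted by their tracklet confidence.
    return c.cls_score * (c.track_conf if c.from_track else 1.0)


def select_candidates(cands, nms_iou=0.4, min_score=0.3):
    """Greedy NMS over the pooled candidates, ranked by the unified score."""
    kept = []
    for c in sorted(cands, key=unified_score, reverse=True):
        if unified_score(c) < min_score:
            break  # every remaining candidate scores even lower
        if all(iou(c.box, k.box) < nms_iou for k in kept):
            kept.append(c)
    return kept


def cosine_dist(u, v) -> float:
    """1 - cosine similarity between two ReID feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def greedy_match(costs, max_cost):
    """Accept (track, candidate) pairs in order of increasing cost.
    A simplification of optimal assignment (e.g. the Hungarian algorithm)."""
    matches, used_t, used_c = [], set(), set()
    for (t, c), cost in sorted(costs.items(), key=lambda kv: kv[1]):
        if cost > max_cost:
            break
        if t not in used_t and c not in used_c:
            matches.append((t, c))
            used_t.add(t)
            used_c.add(c)
    return matches


def associate(tracks, cands, feats, max_app_dist=0.4, min_iou=0.3):
    """tracks: list of (box, reid_feature); cands: selected candidates;
    feats: one ReID feature per candidate. Returns (track_idx, cand_idx) pairs."""
    # Stage 1 (assumed): appearance matching on detection-derived candidates only.
    app_costs = {
        (t, c): cosine_dist(tracks[t][1], feats[c])
        for t in range(len(tracks))
        for c in range(len(cands))
        if not cands[c].from_track
    }
    matches = greedy_match(app_costs, max_app_dist)
    used_t = {t for t, _ in matches}
    used_c = {c for _, c in matches}
    # Stage 2 (assumed): IoU matching between leftover tracks and leftover candidates.
    iou_costs = {
        (t, c): 1.0 - iou(tracks[t][0], cands[c].box)
        for t in range(len(tracks)) if t not in used_t
        for c in range(len(cands)) if c not in used_c
    }
    return matches + greedy_match(iou_costs, 1.0 - min_iou)
```

In a real tracker the classifier scores would come from the FCN and the features from the ReID network; here they are plain numbers and lists so that the ordering of selection, appearance matching, and IoU fallback is easy to follow.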

Experimental Evaluation

The method is evaluated extensively on the MOT16 benchmark, which is known for challenging conditions such as crowded scenes and frequent occlusion. The results show that it achieves higher multiple object tracking accuracy (MOTA), identity F1 score (IDF1), and identity recall (IDR) than existing techniques, particularly among online trackers. The framework also runs in real time, which is notable given the cost of applying deep networks to every video frame.
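MOTA, IDF1, and identity recall are standard tracking metrics rather than contributions of this paper. The sketch below computes them from aggregate error counts using their usual definitions: CLEAR MOT accuracy for MOTA, and identity-level precision and recall for IDP, IDR, and IDF1. The function names and the example counts are illustrative only and are not results reported in the paper.

```python
def mota(fn: int, fp: int, idsw: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with every count summed over all frames.
    The result can be negative when errors outnumber ground-truth boxes."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (fn + fp + idsw) / num_gt


def identity_scores(idtp: int, idfp: int, idfn: int) -> tuple[float, float, float]:
    """Return (IDP, IDR, IDF1) from identity-level true positives, false positives,
    and false negatives. Assumes the denominators are non-zero."""
    idp = idtp / (idtp + idfp)                  # identity precision
    idr = idtp / (idtp + idfn)                  # identity recall
    idf1 = 2 * idtp / (2 * idtp + idfp + idfn)  # harmonic mean of IDP and IDR
    return idp, idr, idf1


# Illustrative counts only, not MOT16 results.
print(mota(fn=200, fp=100, idsw=20, num_gt=1000))
print(identity_scores(idtp=600, idfp=200, idfn=300))
```

Because MOTA charges every false positive, false negative, and identity switch against the number of ground-truth boxes, it is dominated by detection quality, whereas IDF1 and IDR reward keeping the same identity on a target over time, which is where the ReID features are expected to help.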

Implications and Future Directions

The approach is relevant to real-time video surveillance, autonomous driving, and sports analytics, where multiple people must be tracked quickly and reliably. The ReID features make the tracker more robust to occlusion and help it keep identities over extended periods. Future research could further improve efficiency by sharing convolutional layers between the classification and ReID networks, which would ease deployment on resource-constrained platforms.

Overall, this work is a substantial contribution to online multi-object tracking, offering both a practical tracker and a framework that later research can build upon. It also shows how deep learning can strengthen a classical computer vision task such as multi-person tracking.