Tracking Objects as Points (2004.01177v2)

Published 2 Apr 2020 in cs.CV

Abstract: Tracking has traditionally been the art of following interest points through space and time. This changed with the rise of powerful deep networks. Nowadays, tracking is dominated by pipelines that perform object detection followed by temporal association, also known as tracking-by-detection. In this paper, we present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art. Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame. Given this minimal input, CenterTrack localizes objects and predicts their associations with the previous frame. That's it. CenterTrack is simple, online (no peeking into the future), and real-time. It achieves 67.3% MOTA on the MOT17 challenge at 22 FPS and 89.4% MOTA on the KITTI tracking benchmark at 15 FPS, setting a new state of the art on both datasets. CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes. Using monocular video input, it achieves 28.3% AMOTA@0.2 on the newly released nuScenes 3D tracking benchmark, substantially outperforming the monocular baseline on this benchmark while running at 28 FPS.

Authors (3)

Xingyi Zhou
Vladlen Koltun
Philipp Krähenbühl

Citations (959)

Summary

The paper introduces CenterTrack, a framework that unifies detection and tracking by representing objects as center points.
It leverages consecutive frames and heatmaps to predict 2D offsets, enabling a simple greedy algorithm for efficient frame-to-frame association.
The approach achieves state-of-the-art performance with 67.8% MOTA on MOT17 and 89.4% MOTA on KITTI, opening avenues for 3D tracking and multi-modal extensions.

Tracking Objects as Points

The paper "Tracking Objects as Points" introduces CenterTrack, an approach to multi-object tracking (MOT) that unifies detection and tracking in a single, streamlined pipeline. The method gains simplicity, speed, and accuracy from one representational choice: each object is tracked by the center point of its bounding box.

Overview

Traditional tracking methods have evolved significantly, transitioning from following interest points through space and time to the dominant pipeline of tracking-by-detection. This paradigm detects objects in individual frames and associates detections across frames. While effective, it often involves complex and computationally expensive association strategies. CenterTrack circumvents these complexities by introducing a simultaneous detection and tracking approach, where a detection model processes pairs of consecutive frames and a heatmap of prior detections to predict object locations and associations in an end-to-end differentiable manner.
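To make the input described above concrete, the following is a minimal sketch of how prior detections could be rendered into a heatmap and combined with the two frames. It is an illustration under stated assumptions, not the authors' implementation: the function names `render_prior_heatmap` and `build_network_input`, the fixed Gaussian spread `sigma`, and the choice to stack everything along the channel axis are all hypothetical.

```python
import numpy as np


def render_prior_heatmap(centers, height, width, sigma=2.0):
    """Render a single-channel heatmap with one Gaussian peak per prior center.

    centers: iterable of (x, y) pixel coordinates (hypothetical input format).
    sigma: fixed Gaussian spread, an assumption made for this sketch.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((height, width), dtype=np.float32)
    for cx, cy in centers:
        peak = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        # Where peaks overlap, keep the stronger response at each pixel.
        heatmap = np.maximum(heatmap, peak.astype(np.float32))
    return heatmap


def build_network_input(frame_t, frame_prev, prior_heatmap):
    """Stack current frame, previous frame, and prior heatmap along channels."""
    return np.concatenate(
        [frame_t, frame_prev, prior_heatmap[..., None]], axis=-1
    )


# Usage with dummy data: two 64x96 RGB frames and two prior object centers.
frame_t = np.zeros((64, 96, 3), dtype=np.float32)
frame_prev = np.zeros((64, 96, 3), dtype=np.float32)
prior = render_prior_heatmap([(10, 20), (50, 30)], height=64, width=96)
net_input = build_network_input(frame_t, frame_prev, prior)
```

With two RGB frames and one heatmap channel, the stacked array has seven channels, which a detector would consume in place of the usual three.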

Methodology

CenterTrack leverages the CenterNet object detector, conditioning it on two consecutive frames and a heatmap representing objects detected in previous frames. Each object is represented as a single point at the center of its bounding box, and the model predicts offsets for these points to establish frame-to-frame associations. The components of the tracking pipeline are significantly simplified:

Tracking-Conditioned Detection: By integrating previous frame detections into the input, the model reasons about both frame contents simultaneously, enhancing temporal coherence.
Offset Prediction: CenterTrack predicts a 2D displacement for each detected object center, allowing simple and efficient association through a greedy algorithm based on these offsets.
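The greedy association step can be sketched as follows. This is a simplified illustration, not the paper's code: the function name `greedy_associate`, the dictionary layout of detections, the fixed matching radius `max_dist`, and the sign convention (the offset points from the current center back to the previous-frame center) are assumptions made for the example.

```python
import math


def greedy_associate(detections, prior_tracks, max_dist=50.0):
    """Greedily link current detections to prior tracks.

    detections: list of dicts with 'center' (x, y), 'offset' (dx, dy), 'score'.
    prior_tracks: dict mapping track id -> (x, y) center in the previous frame.
    max_dist: hypothetical matching radius in pixels.
    Returns a list of track ids aligned with `detections`.
    """
    next_id = max(prior_tracks, default=-1) + 1
    unmatched = dict(prior_tracks)
    assigned = [None] * len(detections)
    # Visit detections from most to least confident.
    order = sorted(range(len(detections)), key=lambda i: -detections[i]["score"])
    for i in order:
        (x, y), (dx, dy) = detections[i]["center"], detections[i]["offset"]
        # Assumed convention: center + offset estimates the previous position.
        px, py = x + dx, y + dy
        best_id, best_dist = None, max_dist
        for track_id, (tx, ty) in unmatched.items():
            dist = math.hypot(px - tx, py - ty)
            if dist < best_dist:
                best_id, best_dist = track_id, dist
        if best_id is None:
            # No unmatched prior track within the radius: start a new track.
            best_id = next_id
            next_id += 1
        else:
            del unmatched[best_id]
        assigned[i] = best_id
    return assigned


prior_tracks = {0: (10.0, 10.0), 1: (100.0, 50.0)}
detections = [
    {"center": (14.0, 11.0), "offset": (-4.0, -1.0), "score": 0.9},
    {"center": (105.0, 50.0), "offset": (-5.0, 0.0), "score": 0.8},
    {"center": (300.0, 200.0), "offset": (0.0, 0.0), "score": 0.7},
]
track_ids = greedy_associate(detections, prior_tracks)
```

In this toy input, the first two detections inherit the identities of the prior tracks they displace back onto, and the third, which has no prior track within the radius, starts a new track. Because each detection claims at most one prior track and claimed tracks are removed from the pool, no global assignment solver is needed.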

This approach runs in real time and remains accurate without complicating the tracking pipeline. Specifically, CenterTrack reaches 67.8% MOTA at 22 FPS on the MOT17 challenge and 89.4% MOTA at 15 FPS on the KITTI tracking benchmark, establishing new state-of-the-art results on both datasets.

Numerical Results

The tracker's performance is noteworthy across multiple datasets:

MOT17: CenterTrack achieves 67.8% MOTA at 22 FPS, with significant improvements over prior methods such as Tracktor v2 (~11.3% relative improvement).
KITTI: The model records 89.4% MOTA, setting a new benchmark with superior accuracy (82.31% MT ratio) and efficiency (82 ms per frame).
nuScenes 3D Tracking: CenterTrack extends to monocular 3D tracking, achieving 28.3% AMOTA@0.2 with notable improvements over the monocular baseline.
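For readers unfamiliar with the headline metric, MOTA (Multiple Object Tracking Accuracy) in its standard CLEAR MOT form is one minus the ratio of total errors (false negatives, false positives, and identity switches) to the number of ground-truth objects, summed over all frames. The short sketch below computes it; the function name `mota` is hypothetical and the counts in the usage line are made up for illustration, not taken from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Compute MOTA from error counts accumulated over all frames.

    num_gt is the total number of ground-truth objects across frames.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Illustrative counts only: 320 errors against 1000 ground-truth objects.
example_score = mota(false_negatives=200, false_positives=100,
                     id_switches=20, num_gt=1000)
```

With these illustrative counts the score is about 0.68, that is, roughly 68% MOTA. Because the three error types are summed, MOTA can be negative when a tracker makes more errors than there are ground-truth objects.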

Implications and Future Work

CenterTrack offers both practical and theoretical advances for computer vision. Because it simplifies key components of the tracking process while maintaining high performance, the method is well suited to real-time applications such as autonomous driving and surveillance. Its ability to handle both 2D and 3D tracking tasks adds to its versatility.

Feeding prior detections to the model as a heatmap markedly reduces false negatives and identity switches, which supports the case for tracking-conditioned detection. The paper also highlights the advantage of learned offset predictions over traditional motion models, especially in scenarios with significant inter-frame motion, as the nuScenes results illustrate.

Looking forward, the principles demonstrated by CenterTrack open potential avenues for further exploration:

Enhancing Long-Range Tracking: Combining the simplicity of CenterTrack with more sophisticated, long-range matching and reidentification techniques could enhance tracking robustness over long temporal gaps.
Multi-Modal Extensions: Extending the framework to incorporate other sensory inputs, such as LiDAR or radar, could improve performance in diverse and challenging environments.
Optimization for Low-Power Devices: Investigating further optimizations to CenterTrack for deployment on edge devices can widen its applicability in resource-constrained settings.

Conclusion

The paper presents CenterTrack as a significant step towards efficient and effective object tracking by unifying detection and tracking. Its performance underscores the potential of point-based tracking frameworks to simplify and improve the tracking paradigm, marking a notable contribution to the field of multi-object tracking.
