TransTrack: Multiple Object Tracking with Transformer (2012.15460v2)

Published 31 Dec 2020 in cs.CV

Abstract: In this work, we propose TransTrack, a simple but efficient scheme to solve the multiple object tracking problems. TransTrack leverages the transformer architecture, which is an attention-based query-key mechanism. It applies object features from the previous frame as a query of the current frame and introduces a set of learned object queries to enable detecting new-coming objects. It builds up a novel joint-detection-and-tracking paradigm by accomplishing object detection and object association in a single shot, simplifying complicated multi-step settings in tracking-by-detection methods. On MOT17 and MOT20 benchmark, TransTrack achieves 74.5% and 64.5% MOTA, respectively, competitive to the state-of-the-art methods. We expect TransTrack to provide a novel perspective for multiple object tracking. The code is available at: https://github.com/PeizeSun/TransTrack.

Authors (8)

Peize Sun
Jinkun Cao
Yi Jiang
Rufeng Zhang
Enze Xie
Zehuan Yuan
Changhu Wang
Ping Luo

Citations (519)

Summary

The paper introduces a novel joint-detection-and-tracking framework that uses transformer queries for both tracking existing objects and detecting new ones.
By reusing object features from the previous frame as queries, TransTrack performs detection and association in a single shot, replacing the multi-step pipeline of tracking-by-detection methods.
On the MOT17 and MOT20 benchmarks, TransTrack reaches 74.5% and 64.5% MOTA respectively, competitive with state-of-the-art methods.

An Overview of TransTrack: Multiple Object Tracking with Transformers

The paper "TransTrack: Multiple Object Tracking with Transformer" applies the transformer architecture to Multiple Object Tracking (MOT). It proposes a joint-detection-and-tracking framework that uses the transformer's query-key mechanism to perform detection and association together.

Methodological Advancements

TransTrack uses object features from the previous frame as queries for the current frame, and adds a set of learned object queries to detect newly appearing objects. This differs from typical tracking-by-detection pipelines, where detection and re-identification run as separate steps and therefore cannot improve one another.
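The query-key mechanism described above can be illustrated with a minimal sketch. It assumes a single scaled dot-product cross-attention step in plain Python; the names `attend`, `track_queries`, `object_queries`, and `frame_features`, as well as all numeric values, are hypothetical, and the actual decoder layout of TransTrack is not described in this overview.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention: each query reads a weighted mix of values."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs

# Hypothetical toy data: 3 image locations in the current frame, feature dim 2.
frame_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Track queries: object features carried over from the previous frame (assumed values).
track_queries = [[4.0, 0.0]]
# Learned object queries: parameters meant to pick up new objects (assumed values).
object_queries = [[0.0, 4.0], [2.0, 2.0]]

# Both query types attend to the same current-frame features.
queries = track_queries + object_queries
decoded = attend(queries, frame_features, frame_features)
```

Each row of `decoded` is the feature a query gathers from the current frame; in a full model, a prediction head would turn it into a box, which this sketch leaves out.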

TransTrack exploits the transformer's set-prediction formulation to model object associations by passing object features across frames. Because boxes are predicted as a set, the approach needs no non-maximum suppression (NMS) stage, which yields a simpler, end-to-end trainable model.
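This overview does not spell out how boxes predicted from previous-frame queries are combined with boxes from learned object queries. One plausible final step, assumed here purely for illustration, is to pair the two sets by box overlap (IoU). The sketch below uses a simple greedy matcher; the function names, threshold, and box values are hypothetical, and it is not claimed to be the paper's procedure.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def associate(track_boxes, det_boxes, iou_threshold=0.5):
    """Greedily pair track boxes with detection boxes, highest IoU first.

    track_boxes: dict of track id -> box predicted from a previous-frame query.
    det_boxes: list of boxes predicted from learned object queries.
    Returns (matches, new_detections): matches maps track id -> detection index,
    new_detections lists the detection indices left over to start new tracks.
    """
    pairs = sorted(
        ((iou(tb, db), tid, j)
         for tid, tb in track_boxes.items()
         for j, db in enumerate(det_boxes)),
        reverse=True,
    )
    matches, used = {}, set()
    for score, tid, j in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap even less
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    new_detections = [j for j in range(len(det_boxes)) if j not in used]
    return matches, new_detections

# Hypothetical boxes for two existing tracks and three current detections.
track_boxes = {7: (10, 10, 50, 50), 9: (100, 100, 140, 160)}
det_boxes = [(102, 98, 141, 158), (12, 11, 52, 49), (200, 200, 230, 260)]
matches, new_detections = associate(track_boxes, det_boxes)
```

Matched detections inherit the existing track identity, while unmatched ones would open new tracks. An optimal assignment solver could replace the greedy loop when many boxes compete for the same match.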

Empirical Results

On the MOT17 and MOT20 benchmarks, TransTrack achieves 74.5% and 64.5% MOTA respectively. These scores are competitive with state-of-the-art methods and support the use of a transformer architecture for MOT. They also suggest that the combination of learned object queries and track queries handles both newly appearing and already tracked objects effectively.
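For readers unfamiliar with the metric, MOTA (Multiple Object Tracking Accuracy) is conventionally defined as one minus the ratio of total errors (missed objects, false positives, and identity switches) to the number of ground-truth objects, summed over all frames. A minimal sketch follows; the function name and the counts are hypothetical and are not taken from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt

# Hypothetical counts chosen for illustration only; not results from the paper.
score = mota(false_negatives=150, false_positives=40, id_switches=10, num_gt=1000)
print(f"MOTA = {score:.1%}")
```

Because all three error types share one denominator, MOTA can be negative when a tracker makes more errors than there are ground-truth objects.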

Implications for Object Tracking

TransTrack handles both newly appearing objects and the continuity of existing ones without a separate re-ID model. By folding detection and tracking into a single transformer-based framework, it simplifies the pipeline while remaining competitive in accuracy, and it offers a starting point for further work on unified trackers.

Future Directions

Transformer models are increasingly applied to vision tasks, and TransTrack illustrates their potential for MOT. Future research might improve computational efficiency, explore other attention mechanisms, and add richer temporal modeling. Extending the model to more diverse and dynamic real-world scenes could also make it useful for automated surveillance, autonomous vehicles, and related fields.

In conclusion, TransTrack shows the benefits of merging detection and tracking through transformers, and provides a fresh perspective and a baseline for future multi-object tracking work.

GitHub

GitHub - PeizeSun/TransTrack: Multiple Object Tracking with Transformer (656 stars)