- The paper presents Unicorn, a unified model that integrates SOT, MOT, VOS, and MOTS using a single network architecture.
- Its innovative design, featuring target prior and pixel-wise correspondence, achieves strong performance across 8 diverse tracking datasets.
- The unified approach improves computational efficiency and simplifies deployment, paving the way for more versatile and resource-efficient tracking systems.
Towards Grand Unification of Object Tracking
The paper "Towards Grand Unification of Object Tracking" presents Unicorn, an approach that solves four object tracking tasks, namely Single Object Tracking (SOT), Multiple Object Tracking (MOT), Video Object Segmentation (VOS), and Multi-Object Tracking and Segmentation (MOTS), with a single unified model using identical network parameters. This addresses a long-standing problem in computer vision: fragmented, task-specific trackers generalize poorly and duplicate parameters across closely related tasks.
Key Contributions
- Unified Model Design: Unicorn uses one network architecture with the same input format, backbone, embedding module, and detection head across all four tasks. This consistency is what allows a single set of weights to serve varied tracking scenarios without per-task modification.
- Core Design Innovations:
- Target Prior: An innovative input for the detection head that enables task-specific adaptations. For SOT and VOS, the target prior is the propagated reference target map. For MOT and MOTS, the target prior degenerates to zero, facilitating classic class-specific detection.
- Pixel-Wise Correspondence: A generalized embedding that relates every pixel in the current frame to pixels in the reference frame. The same correspondence serves both the target propagation needed by SOT and VOS and the instance association needed by MOT and MOTS, bridging the otherwise distinct requirements of these tasks.
- Performance Evaluation: Unicorn achieves performance comparable or superior to state-of-the-art task-specific methods on 8 tracking datasets, including LaSOT, TrackingNet, and BDD100K. On LaSOT, for instance, it sets new highs in Success and Precision, substantially outperforming prior global-detection-based trackers.
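The interplay between the two core designs can be illustrated with a simplified sketch. Everything here is an illustrative assumption, not the paper's implementation: the function names, the softmax-weighted propagation rule, and the use of raw NumPy in place of the learned network modules are all hypothetical stand-ins.

```python
import numpy as np

def pixel_correspondence(emb_ref, emb_cur):
    """Dense similarity between all pixel pairs of two frames.

    emb_ref, emb_cur: (H*W, C) arrays of L2-normalized pixel embeddings.
    Returns an (H*W, H*W) correspondence matrix; rows index current-frame
    pixels, columns index reference-frame pixels. (A toy stand-in for the
    paper's learned correspondence.)
    """
    return emb_cur @ emb_ref.T  # cosine similarity, since rows are unit-norm

def propagate_target_prior(corr, ref_target_map, task):
    """Build the target prior fed to the shared detection head.

    SOT/VOS: propagate the reference target map through the correspondence.
    MOT/MOTS: the prior degenerates to zeros, so the head performs
    ordinary class-based detection.
    """
    if task in ("sot", "vos"):
        # Softmax over reference pixels, then a weighted sum of the
        # reference target map (a hypothetical propagation rule).
        weights = np.exp(corr - corr.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ ref_target_map          # (H*W,) per-pixel prior
    return np.zeros(corr.shape[0], dtype=float)  # MOT/MOTS: zero prior

# Toy usage: 4 "pixels" with 2-D unit embeddings; pixel 0 is the target.
emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
corr = pixel_correspondence(emb, emb)
sot_prior = propagate_target_prior(corr, np.array([1.0, 0.0, 0.0, 0.0]), "sot")
mot_prior = propagate_target_prior(corr, np.array([1.0, 0.0, 0.0, 0.0]), "mot")
```

In this toy run, `sot_prior` peaks at pixels whose embeddings match the reference target pixel, while `mot_prior` is all zeros, which is the sense in which a single head can handle both task families.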
Implications and Future Directions
The unification approach proposed in Unicorn calls for a paradigm shift in object tracking from specialized models to versatile, multi-task capable architectures. This has far-reaching implications both in theory and practice:
- Parameter Efficiency: Unified models reduce the need for separate networks, minimizing computational redundancy and storage requirements.
- Generalization: By leveraging a shared learning framework, Unicorn likely improves the model's capacity to transfer knowledge across tasks, a step towards more general vision intelligence systems.
- Simplified Pipeline: The consolidation into a single network architecture may simplify the deployment of tracking systems in real-world applications, reducing the complexity of managing multiple specialized systems.
Future developments could extend such unified models to a broader spectrum of vision tasks, further narrowing the gap toward general vision intelligence. Ongoing research could focus on balancing performance across the different tracking tasks and on more nuanced use cases within the tracking domain. Integrating additional modalities or real-time adaptability could further broaden the model's applicability across diverse environments.
Unicorn marks a noteworthy attempt at unifying object tracking methodologies, one of academic interest that also offers pragmatic solutions capable of transitioning into industrial applications. By laying groundwork that could integrate with other AI disciplines, this work may spur developments across broader domains, advancing toward more cohesive and resource-efficient systems.