Deep OC-SORT: Multi-Pedestrian Tracking by Adaptive Re-Identification (2302.11813v1)

Published 23 Feb 2023 in cs.CV

Abstract: Motion-based association for Multi-Object Tracking (MOT) has recently re-achieved prominence with the rise of powerful object detectors. Despite this, little work has been done to incorporate appearance cues beyond simple heuristic models that lack robustness to feature degradation. In this paper, we propose a novel way to leverage objects' appearances to adaptively integrate appearance matching into existing high-performance motion-based methods. Building upon the pure motion-based method OC-SORT, we achieve 1st place on MOT20 and 2nd place on MOT17 with 63.9 and 64.9 HOTA, respectively. We also achieve 61.3 HOTA on the challenging DanceTrack benchmark as a new state-of-the-art even compared to more heavily-designed methods. The code and models are available at \url{https://github.com/GerardMaggiolino/Deep-OC-SORT}.

Citations (120)

Summary

  • The paper presents an adaptive re-identification method integrated with motion cues to improve tracking accuracy.
  • It introduces Camera Motion Compensation, Dynamic Appearance, and Adaptive Weighting modules to handle occlusions and visual ambiguities.
  • Experiments on the MOT17, MOT20, and DanceTrack benchmarks demonstrate state-of-the-art performance, including 63.9 HOTA on MOT20, 64.9 HOTA on MOT17, and 61.3 HOTA on DanceTrack.

Overview of Deep OC-SORT: Multi-Pedestrian Tracking by Adaptive Re-Identification

The paper "Deep OC-SORT: Multi-Pedestrian Tracking by Adaptive Re-Identification," authored by Gerard Maggiolino, Adnan Ahmad, Jinkun Cao, and Kris Kitani, presents an adaptive re-identification approach to multi-object tracking (MOT). It builds upon OC-SORT, a strong purely motion-based tracker, and integrates visual appearance cues to make tracking more robust, particularly in scenes complicated by occlusions, motion blur, or visually similar objects.

Methodology and Key Contributions

The authors introduce a method that adaptively integrates appearance matching into motion-based tracking algorithms. The main contributions of the paper are encapsulated in the introduction of three modules: Camera Motion Compensation (CMC), Dynamic Appearance (DA), and Adaptive Weighting (AW), refining the previously established OC-SORT model. These modules aim to improve the robustness and accuracy of tracking by combining visual and motion-based cues.

The CMC module contributes by adjusting the position of tracked objects to compensate for scene-wide camera motions. This adjustment is particularly beneficial in dynamic environments where the camera and objects move independently, thereby maintaining stable tracking accuracy.
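A minimal sketch of the compensation step, assuming the frame-to-frame camera motion has already been estimated as a 2x3 affine matrix (e.g. via sparse feature matching and RANSAC); `apply_cmc` is an illustrative helper name, not the paper's API:

```python
import numpy as np

def apply_cmc(box, affine):
    """Warp a [x1, y1, x2, y2] box by a 2x3 affine camera-motion matrix.

    The affine matrix would come from a global motion estimator run on
    consecutive frames; the estimator itself is an implementation choice
    and is not shown here.
    """
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1, 1.0],
                        [x2, y2, 1.0]])      # corners in homogeneous coords
    warped = corners @ np.asarray(affine, dtype=float).T  # (2, 2) warped corners
    return warped.reshape(-1)                # back to [x1, y1, x2, y2]
```

Applying the same warp to the tracker's predicted boxes keeps the motion model aligned with the current frame even when the camera itself moves.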

Dynamic Appearance (DA) aims to incorporate visual data selectively by adjusting the influence of the new visual appearance embedding into an existing tracklet model based on the confidence of the detection. This practice reduces the likelihood of integrating corrupted data when tracking is challenged by visual degradations.
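The idea can be sketched as a confidence-gated exponential moving average over the tracklet's embedding. The gating formula below is a plausible parameterization of the idea, not necessarily the paper's exact one, and the default values are illustrative:

```python
def update_embedding(track_emb, det_emb, conf, conf_thresh=0.6, alpha_fixed=0.95):
    """Confidence-gated EMA update of a track's appearance embedding.

    Low-confidence detections (likely blurred or occluded) barely change
    the stored embedding; high-confidence detections update it normally.
    """
    # Map confidence in [conf_thresh, 1] to a weight on the old embedding:
    #   conf == 1.0         -> alpha = alpha_fixed (standard EMA update)
    #   conf == conf_thresh -> alpha = 1.0 (keep old embedding unchanged)
    trust = (conf - conf_thresh) / (1.0 - conf_thresh)
    trust = max(0.0, min(1.0, trust))
    alpha = alpha_fixed + (1.0 - alpha_fixed) * (1.0 - trust)
    return [alpha * t + (1.0 - alpha) * d for t, d in zip(track_emb, det_emb)]
```

In effect, the tracklet's stored appearance is frozen whenever the detector is unsure, which keeps degraded crops from contaminating the re-identification model.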

The Adaptive Weighting (AW) module dynamically adjusts how much appearance embeddings contribute to the association cost: when an embedding matches one candidate far better than the alternatives, its weight is boosted, and when similarities are ambiguous, the tracker falls back toward motion cues. This selectivity is pivotal during associations between detections and tracked objects.
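A sketch of this weighting idea, using the gap between the best and second-best cosine similarity per detection as a proxy for how discriminative appearance is; the specific boosting form and `base_w` value are illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def fused_cost(motion_cost, app_sim, base_w=0.5):
    """Fuse a motion cost matrix with an appearance similarity matrix,
    boosting the appearance weight where embeddings are discriminative.

    motion_cost: (tracks, detections) cost matrix from the motion model.
    app_sim:     (tracks, detections) cosine similarities of embeddings.
    Returns a combined cost matrix (lower is a better match).
    """
    motion_cost = np.asarray(motion_cost, dtype=float)
    app_sim = np.asarray(app_sim, dtype=float)
    # Per-detection gap between the top-1 and top-2 track similarities:
    # a large gap means appearance clearly singles out one track.
    sorted_sim = np.sort(app_sim, axis=0)[::-1]
    gap = sorted_sim[0] - (sorted_sim[1] if app_sim.shape[0] > 1 else 0.0)
    weight = base_w + gap  # boost appearance where it discriminates
    return motion_cost - weight * app_sim
```

The resulting matrix would then be fed to a standard assignment solver (e.g. the Hungarian algorithm) to produce detection-to-track matches.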

Results and Benchmarks

Empirical evaluations were conducted across multiple well-regarded MOT benchmarks: MOT17, MOT20, and DanceTrack. The method demonstrated significant improvements over contemporary models, achieving 1st place on MOT20 with 63.9 HOTA, 2nd place on MOT17 with 64.9 HOTA, and a new state-of-the-art 61.3 HOTA on DanceTrack. These results exemplify the method's robustness, especially in scenarios with challenging occlusions and visual ambiguities.

Discussion and Future Implications

The integration of appearance cues into motion-based object tracking models presents meaningful advancements in the MOT field. The dynamic and adaptive integration approach facilitated by the Deep OC-SORT model shows potential for extending to more complex detection systems in the future. Practical implementations, particularly those involving moving cameras and densely populated scenes, may benefit substantially from these innovations in tracking methodologies.

Theoretical implications focus on the efficiency of combining appearance and motion data without overly complicating the fusion process. Future research directions could explore the incorporation of these frameworks into broader multi-sensor data fusion systems or their applicability in real-time environments within autonomous navigation and surveillance systems.

This paper provides a solid baseline for researchers interested in enhancing MOT systems, specifically regarding the balance between computational cost and tracking robustness. The availability of both the code and models on GitHub facilitates further exploration and potential improvements by the research community.
