Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking (2308.00783v2)

Published 1 Aug 2023 in cs.CV

Abstract: Multi-Object Tracking (MOT) aims to detect and associate all desired objects across frames. Most methods accomplish the task by explicitly or implicitly leveraging strong cues (i.e., spatial and appearance information), which exhibit powerful instance-level discrimination. However, when object occlusion and clustering occur, spatial and appearance information will become ambiguous simultaneously due to the high overlap among objects. In this paper, we demonstrate this long-standing challenge in MOT can be efficiently and effectively resolved by incorporating weak cues to compensate for strong cues. Along with velocity direction, we introduce the confidence and height state as potential weak cues. With superior performance, our method still maintains Simple, Online and Real-Time (SORT) characteristics. Also, our method shows strong generalization for diverse trackers and scenarios in a plug-and-play and training-free manner. Significant and consistent improvements are observed when applying our method to 5 different representative trackers. Further, with both strong and weak cues, our method Hybrid-SORT achieves superior performance on diverse benchmarks, including MOT17, MOT20, and especially DanceTrack where interaction and severe occlusion frequently happen with complex motions. The code and models are available at https://github.com/ymzis69/HybridSORT.

Summary

The paper introduces Hybrid-SORT, which combines underutilized weak cues with traditional signals to improve multi-object tracking.
It employs Tracklet Confidence Modeling with Kalman Filters and a Height Modulated IoU to enhance object association under occlusion.
Empirical results demonstrate a 7.6 HOTA improvement on DanceTrack and robust performance across various MOT benchmarks.

Overview of Hybrid-SORT: Advancements in Multi-Object Tracking

Multi-Object Tracking (MOT) involves detecting multiple objects and associating them across sequential frames. Most recent methods rely on strong cues, namely spatial and appearance information, which provide instance-level discrimination. Under occlusion and clustering, however, objects overlap heavily and both strong cues become ambiguous at the same time. The paper introduces Hybrid-SORT, which supplements the strong cues with weak cues to address this long-standing problem.

Concept and Methodology

Hybrid-SORT adds three weak cues (velocity direction, confidence state, and height state) to compensate for strong cues when those become ambiguous. Confidence and height, which earlier trackers largely left unused, are presented as the key additions for handling occlusion and clustering. They help separate heavily overlapped objects: confidence indicates occlusion relationships, and height serves as a depth cue.

The Hybrid-SORT framework introduces two main components:

Tracklet Confidence Modeling (TCM): Uses a Kalman Filter to estimate each tracklet's confidence over time, which matters most in occlusion-heavy scenes. An auxiliary Linear Prediction model supplements it by reacting quickly to changes in confidence, so that association stays accurate under occlusion.
Height Modulated IoU (HMIoU): A variant of Intersection over Union (IoU) that incorporates height information into the association metric. Fusing height with conventional IoU separates overlapping objects more reliably than IoU alone.
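
The two components above can be illustrated with a minimal Python sketch. It is not taken from the authors' code: it assumes that HMIoU is the product of a vertical (height-only) IoU and the standard box IoU, and that the confidence cue enters association as the absolute difference between a predicted tracklet confidence and the detection confidence. The Kalman Filter is omitted and only a two-point linear prediction is shown. All function names are hypothetical.

```python
def iou(a, b):
    """Standard IoU for boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def height_iou(a, b):
    """1-D IoU along the vertical axis only (assumed form of the height cue)."""
    overlap = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    span = max(a[3], b[3]) - min(a[1], b[1])
    return overlap / span if span > 0 else 0.0


def hm_iou(a, b):
    """Height-modulated IoU, assumed here to be height IoU times box IoU."""
    return height_iou(a, b) * iou(a, b)


def predict_confidence_linear(prev, prev2=None):
    """Extrapolate tracklet confidence from its last two values, clipped to [0, 1]."""
    if prev2 is None:
        return prev  # only one observation so far: reuse it
    return min(1.0, max(0.0, prev + (prev - prev2)))


def confidence_cost(track_conf, det_conf):
    """Assumed association cost: gap between predicted and detected confidence."""
    return abs(track_conf - det_conf)


# A full-height box versus one covering only the lower half of the same column.
track = (0, 0, 10, 20)
half = (0, 10, 10, 20)
print(iou(track, half), hm_iou(track, half))
```

In this example the half-height box has an IoU of 0.5 with the track but an HMIoU of 0.25, so a height mismatch is penalized more strongly than by IoU alone.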

Hybrid-SORT also refines the Observation-Centric Momentum (OCM) model, extending it to multiple temporal intervals and to object corners, which yields a more robust estimate of velocity direction.
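
One way to picture a velocity-direction cue over corners and multiple intervals is the sketch below. It is an illustration under assumptions, not the authors' formulation: for each interval it compares the direction a corner has moved along the track with the direction from the same past corner to the candidate detection, and averages the angles. The interval set `(1, 2, 3)` and the function names are hypothetical.

```python
import math


def corners(box):
    """Four corners of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]


def direction(p_from, p_to):
    """Unit vector from p_from to p_to, or None if the points coincide."""
    dx, dy = p_to[0] - p_from[0], p_to[1] - p_from[1]
    norm = math.hypot(dx, dy)
    return (dx / norm, dy / norm) if norm > 0 else None


def angle_between(u, v):
    """Angle in radians between two unit vectors."""
    dot = max(-1.0, min(1.0, u[0] * v[0] + u[1] * v[1]))
    return math.acos(dot)


def direction_cost(history, det_box, intervals=(1, 2, 3)):
    """Mean angle between track motion and track-to-detection motion.

    history holds past boxes of one tracklet, most recent last.
    Intervals longer than the available history are skipped.
    """
    angles = []
    last = history[-1]
    for dt in intervals:
        if len(history) <= dt:
            continue
        past = history[-1 - dt]
        for c_past, c_last, c_det in zip(corners(past), corners(last), corners(det_box)):
            track_dir = direction(c_past, c_last)
            cand_dir = direction(c_past, c_det)
            if track_dir is None or cand_dir is None:
                continue  # no motion, so no direction to compare
            angles.append(angle_between(track_dir, cand_dir))
    return sum(angles) / len(angles) if angles else 0.0


# A tracklet moving right, tested against two candidate detections.
history = [(0, 0, 10, 20), (5, 0, 15, 20), (10, 0, 20, 20)]
print(direction_cost(history, (15, 0, 25, 20)))   # continues to the right
print(direction_cost(history, (10, -10, 20, 10)))  # jumps upward
```

A detection that continues the track's motion gets a cost near zero, while one that deviates sharply gets a larger cost. Using several corners and intervals means a single noisy box center or frame has less influence on the estimate.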

Empirical Evaluation and Implications

Hybrid-SORT is evaluated on diverse benchmarks, including MOT17, MOT20, and the more challenging DanceTrack dataset. On DanceTrack it improves HOTA by 7.6 over its predecessors, which indicates that it handles severe occlusion and frequent interaction between objects.

Consistent gains across multiple trackers, including SORT, DeepSORT, and ByteTrack, support the generality of Hybrid-SORT's plug-and-play design. The improvements require no additional training, which makes the approach practical for a range of real-time applications.
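
A plug-and-play, training-free integration can be pictured as adding weak-cue cost terms to the cost matrix an existing tracker already builds. The sketch below assumes a simple weighted sum and uses an exhaustive search as a stand-in for the Hungarian algorithm that trackers typically use. The weights, matrices, and function names are hypothetical and chosen only for illustration.

```python
from itertools import permutations


def fuse_costs(base_cost, weak_costs, weights):
    """Add weighted weak-cue cost matrices to a tracker's base cost matrix."""
    fused = [row[:] for row in base_cost]
    for cost, weight in zip(weak_costs, weights):
        for i, row in enumerate(cost):
            for j, value in enumerate(row):
                fused[i][j] += weight * value
    return fused


def best_assignment(cost):
    """Exhaustive min-cost matching for a small square matrix.

    Entry i of the result is the detection index assigned to track i.
    Only suitable for tiny examples; real trackers use a Hungarian solver.
    """
    n = len(cost)
    return min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )


# Two overlapping objects: the strong-cue cost cannot tell them apart.
base = [[0.5, 0.5], [0.5, 0.5]]
# Hypothetical confidence cost that favors track 0 -> det 1, track 1 -> det 0.
conf = [[0.4, 0.0], [0.0, 0.4]]

fused = fuse_costs(base, [conf], weights=[0.5])
print(best_assignment(fused))
```

Because the base tracker's cost is left untouched and the weak cues are only added on top, no retraining is involved. The weak cue simply breaks ties that the strong cues leave unresolved.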

Future Prospects and Theoretical Implications

By showing that weak cues can improve association, Hybrid-SORT opens the way for research into other underexplored tracking cues and auxiliary signals. Its combination of strong and weak cues improves accuracy while keeping computational overhead low enough for real-time use, an important consideration for resource-limited settings such as autonomous vehicles or mobile devices.

In conclusion, Hybrid-SORT addresses persistent occlusion and clustering problems in MOT by integrating weak cues through TCM, HMIoU, and a more robust OCM. The results are relevant both to further research on tracking cues and to practical tracking systems.

GitHub

GitHub - ymzis69/HybridSORT: [AAAI2024]Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking (199 stars)