Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking (2407.10151v2)

Published 14 Jul 2024 in cs.CV

Abstract: Multi-object tracking (MOT) endeavors to precisely estimate the positions and identities of multiple objects over time. The prevailing approach, tracking-by-detection (TbD), first detects objects and then links detections, resulting in a simple yet effective method. However, contemporary detectors may occasionally miss some objects in certain frames, causing trackers to cease tracking prematurely. To tackle this issue, we propose BUSCA, meaning 'to search', a versatile framework compatible with any online TbD system, enhancing its ability to persistently track those objects missed by the detector, primarily due to occlusions. Remarkably, this is accomplished without modifying past tracking results or accessing future frames, i.e., in a fully online manner. BUSCA generates proposals based on neighboring tracks, motion, and learned tokens. Utilizing a decision Transformer that integrates multimodal visual and spatiotemporal information, it addresses the object-proposal association as a multi-choice question-answering task. BUSCA is trained independently of the underlying tracker, solely on synthetic data, without requiring fine-tuning. Through BUSCA, we showcase consistent performance enhancements across five different trackers and establish a new state-of-the-art baseline across three different benchmarks. Code available at: https://github.com/lorenzovaquero/BUSCA.

Summary

The paper proposes BUSCA, an online framework that keeps tracking objects the detector has missed in tracking-by-detection systems.
It generates candidate proposals from neighboring tracks, motion predictions, and learned tokens, and uses a decision Transformer with spatiotemporal encoding to select among them.
BUSCA consistently improves results on benchmarks such as MOT16, MOT17, and MOT20 by extending track continuity without using future frames.

Analyzing "Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking"

This paper addresses a central challenge in online multi-object tracking (MOT): detectors do not find every object in every frame, especially under occlusion. It proposes BUSCA, a framework that complements existing tracking-by-detection (TbD) systems by continuing to track objects that the detector has missed.

Context and Motivation

The prevailing paradigm in MOT is the tracking-by-detection (TbD) approach. This method involves detecting objects in individual frames and then linking these detections across frames to form object trajectories. Despite its effectiveness, the TbD method is limited by its dependency on the initial detection accuracy. Missed detections, often caused by occlusions, lead to premature termination of tracks, fragmenting the object's trajectory.
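To make the failure mode concrete, here is a minimal sketch of the linking step in a tracking-by-detection loop. It assumes a greedy IoU matcher, which is a common simple choice and not necessarily what any of the trackers in the paper use; the names `Track`, `iou`, and `associate`, and the 0.3 threshold, are illustrative.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: Box


def associate(tracks: list[Track], detections: list[Box], iou_threshold: float = 0.3):
    """Greedy IoU matching. Returns (matches, unmatched_track_indices)."""
    pairs = sorted(
        ((iou(t.box, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_threshold:
            break  # every remaining pair overlaps even less
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    unmatched = [ti for ti in range(len(tracks)) if ti not in used_tracks]
    return matches, unmatched


tracks = [Track(1, (0, 0, 10, 10)), Track(2, (50, 50, 60, 60))]
detections = [(1, 1, 11, 11)]  # the detector missed the object behind track 2
matches, unmatched = associate(tracks, detections)
```

In this toy frame, track 1 is matched and track 2 is left without a detection. A plain TbD tracker would mark track 2 as lost at this point, which is exactly the situation BUSCA targets.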

Proposed Framework: BUSCA

The authors introduce BUSCA (meaning 'to search'), which plugs into any existing online TbD system. BUSCA operates in a fully online manner: it processes each frame as it arrives, without altering past results or requiring future frames. For each track that the detector has missed, BUSCA generates candidate proposals from neighboring tracks, motion predictions, and learned tokens. A decision Transformer that combines visual and spatiotemporal information then picks one candidate, treating the object-proposal association as a multi-choice question-answering task.
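The multi-choice framing can be sketched as scoring a fixed set of candidates and picking the most probable one. This is only an illustration under stated assumptions: the scores below are made-up numbers standing in for the decision Transformer's outputs, and the `Candidate` class, the `kind` labels, and the meaning given to the token candidate are hypothetical, not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    kind: str   # "motion", "neighbor", or "token" (illustrative labels)
    label: str


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def answer_multi_choice(candidates, scores):
    """Pick the candidate with the highest probability."""
    probs = softmax(scores)
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best], probs


candidates = [
    Candidate("motion", "predicted position of the lost track"),
    Candidate("neighbor", "nearby track A"),
    Candidate("neighbor", "nearby track B"),
    Candidate("token", "no suitable proposal"),  # assumed meaning of a learned token
]
scores = [2.1, 0.3, -0.5, 0.9]  # stand-in for model outputs
choice, probs = answer_multi_choice(candidates, scores)

# In this sketch the track is extended only when the motion proposal wins.
keep_tracking = choice.kind == "motion"
```

Framing the decision as a choice among alternatives, instead of thresholding a single similarity score, lets the model weigh the motion proposal against competing explanations such as a neighboring object occupying the same region.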

Key Features of BUSCA:

Decision Transformer: Handles the association task by attending to candidates generated independently of the detector, combining appearance and spatiotemporal inputs.
Spatiotemporal Encoding: Encodes time, size, and distance features so the Transformer can reason about how a track and each candidate relate in space and time.
Proposal Generation: Generates candidate proposals from motion models, neighboring tracks, and learned tokens, improving the likelihood of maintaining a correct track over time.
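As a rough illustration of the spatiotemporal encoding idea, the sketch below turns three scalar features (time gap, relative size, and distance) into a fixed-length vector. The summary does not describe the paper's actual encoding, so this uses a generic sinusoidal encoding as a stand-in; the feature definitions, the function names, and the dimension of 8 are all assumptions.

```python
import math

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def sinusoidal(value: float, dim: int = 8, max_period: float = 10000.0) -> list[float]:
    """Encode one scalar as interleaved sin/cos values at dim // 2 frequencies."""
    half = dim // 2
    out = []
    for i in range(half):
        freq = 1.0 / (max_period ** (i / half))
        out.append(math.sin(value * freq))
        out.append(math.cos(value * freq))
    return out


def encode_relation(track_box: Box, cand_box: Box, frames_since_seen: int,
                    dim: int = 8) -> list[float]:
    """Concatenate encodings of time gap, log size ratio, and normalized distance."""
    tw, th = track_box[2] - track_box[0], track_box[3] - track_box[1]
    cw, ch = cand_box[2] - cand_box[0], cand_box[3] - cand_box[1]
    size_ratio = math.log((cw * ch) / (tw * th))
    # Center distance, normalized by track height so it is scale independent.
    dx = (cand_box[0] + cand_box[2]) / 2 - (track_box[0] + track_box[2]) / 2
    dy = (cand_box[1] + cand_box[3]) / 2 - (track_box[1] + track_box[3]) / 2
    distance = math.hypot(dx, dy) / th
    return (sinusoidal(float(frames_since_seen), dim)
            + sinusoidal(size_ratio, dim)
            + sinusoidal(distance, dim))


track = (10.0, 20.0, 50.0, 120.0)
candidate = (14.0, 22.0, 52.0, 118.0)
features = encode_relation(track, candidate, frames_since_seen=3)
```

A vector like `features` could be added to or concatenated with an appearance embedding before it reaches the Transformer; how BUSCA actually combines the two is not specified in this summary.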

Results and Implications

BUSCA demonstrates consistent improvements across five different trackers on standard benchmarks such as MOT16, MOT17, and MOT20, and the authors report a new state of the art on them. Gains are reported in metrics such as Multiple Object Tracking Accuracy (MOTA) and Higher Order Tracking Accuracy (HOTA).
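For readers unfamiliar with MOTA, the standard CLEAR MOT definition is one minus the ratio of total errors (false negatives, false positives, and identity switches) to the number of ground-truth objects. The sketch below computes it; the counts are invented for illustration and are not results from the paper.

```python
def mota(false_negatives: int, false_positives: int, id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


# Hypothetical counts: recovering missed objects lowers FN, possibly at the
# cost of a few extra false positives.
baseline = mota(false_negatives=120, false_positives=40, id_switches=15, num_gt=1000)
recovered = mota(false_negatives=90, false_positives=45, id_switches=12, num_gt=1000)
```

Because missed objects count as false negatives, a method that keeps tracks alive through detector failures can raise MOTA as long as it does not add more false positives than the misses it recovers.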

These findings suggest two major implications:

Enhanced Trajectory Continuity: By reducing premature track termination, BUSCA improves trajectory continuity without access to future frames, which is critical for real-time applications like autonomous driving and video surveillance.
Deployment Flexibility: BUSCA is trained independently of the underlying tracker, solely on synthetic data, so it can be integrated with various trackers without fine-tuning or retraining.

Future Directions

The paper opens avenues for further work on online tracking systems. One potential direction is integrating 3D and multimodal cues to make tracking more robust in dynamic, cluttered environments. Another is extending the framework beyond the strictly online setting, so that it can refine past predictions and correct erroneous associations retrospectively in applications that tolerate some latency.

Overall, BUSCA represents a significant step in addressing the limitations inherent in TbD systems, particularly under challenging conditions, and offers a promising tool for advancing MOT technologies.

GitHub

lorenzovaquero/BUSCA: Official code implementation for "Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking" (ECCV 2024)
