How To Train Your Deep Multi-Object Tracker (1906.06618v3)

Published 15 Jun 2019 in cs.CV

Abstract: The recent trend in vision-based multi-object tracking (MOT) is heading towards leveraging the representational power of deep learning to jointly learn to detect and track objects. However, existing methods train only certain sub-modules using loss functions that often do not correlate with established tracking evaluation measures such as Multi-Object Tracking Accuracy (MOTA) and Precision (MOTP). As these measures are not differentiable, the choice of appropriate loss functions for end-to-end training of multi-object tracking methods is still an open research problem. In this paper, we bridge this gap by proposing a differentiable proxy of MOTA and MOTP, which we combine in a loss function suitable for end-to-end training of deep multi-object trackers. As a key ingredient, we propose a Deep Hungarian Net (DHN) module that approximates the Hungarian matching algorithm. DHN allows estimating the correspondence between object tracks and ground truth objects to compute differentiable proxies of MOTA and MOTP, which are in turn used to optimize deep trackers directly. We experimentally demonstrate that the proposed differentiable framework improves the performance of existing multi-object trackers, and we establish a new state of the art on the MOTChallenge benchmark. Our code is publicly available from https://github.com/yihongXU/deepMOT.


Summary

The paper’s main contribution is the introduction of differentiable loss functions that directly link predicted assignments to standard metrics like MOTA and MOTP.
It employs the Deep Hungarian Net (DHN), a bi-directional recurrent network that provides a soft approximation of optimal object-to-track assignments.
The framework enables end-to-end training by integrating detection, association, and appearance modeling, leading to improved tracking accuracy and reduced ID switches.

How To Train Your Deep Multi-Object Tracker

The paper "How To Train Your Deep Multi-Object Tracker" proposes a differentiable framework for the end-to-end training of deep multi-object trackers in vision-based Multi-Object Tracking (MOT). Existing methods typically train only certain sub-modules, using loss functions that often do not correlate with tracking evaluation measures. The authors instead propose a loss function built from differentiable proxies of two established MOT evaluation metrics, Multi-Object Tracking Accuracy (MOTA) and Multi-Object Tracking Precision (MOTP).
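
For reference, the two metrics can be written as follows. This is the standard CLEAR-MOT formulation rather than notation taken from this summary: FN_t, FP_t, and IDS_t are the false negatives, false positives, and identity switches at frame t, GT_t is the number of ground-truth objects, d_{t,i} is the overlap (or distance) of the i-th matched pair, and c_t is the number of matches at frame t.

```latex
\mathrm{MOTA} = 1 - \frac{\sum_t \left( \mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDS}_t \right)}{\sum_t \mathrm{GT}_t},
\qquad
\mathrm{MOTP} = \frac{\sum_{t,i} d_{t,i}}{\sum_t c_t}
```

Both quantities depend on a discrete matching between predictions and ground truth followed by counting, which is why they cannot be used directly as training losses.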

Core Contributions

The paper introduces the Deep Hungarian Net (DHN), a network that approximates the Hungarian algorithm, the standard method for solving the assignment between object tracks and ground-truth objects. Instead of a binary matching, the DHN outputs a soft assignment matrix, which allows the CLEAR-MOT metrics to be expressed as differentiable functions. Because the resulting loss is differentiable, gradients can be back-propagated through the matching step into the tracker, so that training optimizes proxies of the tracking metrics directly.
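
To make the contrast between hard and soft assignments concrete, here is a minimal sketch in plain Python. It is not the DHN: `optimal_assignment` finds the minimum-cost matching by brute force over permutations (a stand-in for the Hungarian algorithm that is only practical for tiny matrices), and `soft_assignment` is a simple row-wise softmax relaxation chosen for illustration. The function names, the `temperature` parameter, and the distance values are all hypothetical.

```python
import math
from itertools import permutations


def optimal_assignment(dist):
    """Binary matrix of the min-cost one-to-one matching (square input only).

    Brute force over permutations; a stand-in for the Hungarian algorithm.
    """
    n = len(dist)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(dist[i][perm[i]] for i in range(n)),
    )
    return [[1 if best[i] == j else 0 for j in range(n)] for i in range(n)]


def soft_assignment(dist, temperature=0.1):
    """Row-wise softmax over negative distances (illustrative, not the DHN)."""
    soft = []
    for row in dist:
        logits = [-d / temperature for d in row]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(value - peak) for value in logits]
        total = sum(exps)
        soft.append([e / total for e in exps])
    return soft


# Hypothetical distances: rows are predicted tracks, columns are ground-truth objects.
dist = [
    [0.1, 0.8, 0.9],
    [0.7, 0.2, 0.6],
    [0.9, 0.5, 0.3],
]

hard = optimal_assignment(dist)
print(hard)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# A small change in one distance leaves the hard matching unchanged, so the
# binary output carries no useful gradient with respect to the distances.
nudged = [row[:] for row in dist]
nudged[0][0] = 0.15
print(optimal_assignment(nudged) == hard)

# The soft version changes smoothly with the distances and sums to 1 per row.
soft = soft_assignment(dist)
for row in soft:
    print([round(value, 3) for value in row])
```

The hard matching is piecewise constant in the distances, whereas the soft matrix varies smoothly with them. A learned module such as the DHN aims to keep that smoothness while staying close to the optimal global matching, which a per-row softmax does not guarantee.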

Differentiable Tracker Loss:
- The paper proposes loss functions derived from the CLEAR-MOT evaluation metrics. MOTA and MOTP are expressed as differentiable functions of the predicted assignment matrix and the distance matrix.
- The loss is composed of differentiable proxies of True Positives (TP), False Positives (FP), False Negatives (FN), and ID Switches (IDS), so that optimization aligns with the standard metrics.

Deep Hungarian Net (DHN):
- DHN uses a bi-directional recurrent neural network to compute a soft approximation of the optimal prediction-to-ground-truth assignment.
- Because the assignment is produced by a differentiable network rather than a non-differentiable matching algorithm, gradients can flow through the matching step.

End-to-End Training Framework:
- The authors extend Tracktor with a new ReID head for appearance modeling and train it end-to-end using the DHN-based loss, which improves ID consistency.
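
The idea behind the differentiable proxies listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: an "unmatched" slot with threshold `delta` is appended to each row and column of the soft assignment matrix before a softmax, the `sharpness` scaling is added here only to make the toy numbers clear-cut, the ID-switch proxy is passed in as a precomputed value rather than derived, and all function and parameter names are hypothetical.

```python
import math


def softmax(values):
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def soft_fp_fn(soft_assign, delta=0.5, sharpness=10.0):
    """Soft false-positive and false-negative counts.

    soft_assign[n][m] in [0, 1] is the soft assignment of track n to
    ground-truth object m. delta and sharpness are illustrative assumptions.
    """
    n_tracks, n_gt = len(soft_assign), len(soft_assign[0])
    # Each track competes with an extra "unmatched" column; the probability
    # mass that lands there counts towards soft false positives.
    fp = sum(
        softmax([sharpness * a for a in row] + [sharpness * delta])[-1]
        for row in soft_assign
    )
    # Each ground-truth object competes with an extra "unmatched" row; the
    # mass that lands there counts towards soft false negatives.
    fn = sum(
        softmax(
            [sharpness * soft_assign[n][m] for n in range(n_tracks)]
            + [sharpness * delta]
        )[-1]
        for m in range(n_gt)
    )
    return fp, fn


def soft_mota(fp, fn, n_gt, soft_ids=0.0, gamma=1.0):
    """MOTA-like score from soft counts; gamma weights the ID-switch term."""
    return 1.0 - (fp + fn + gamma * soft_ids) / n_gt


def soft_motp(soft_assign, dist):
    """One minus the assignment-weighted mean distance of matched pairs."""
    weight = sum(sum(row) for row in soft_assign)
    cost = sum(a * d for ra, rd in zip(soft_assign, dist) for a, d in zip(ra, rd))
    return 1.0 - cost / weight if weight > 0 else 0.0


# Two tracks, two ground-truth objects, confidently matched one-to-one.
matched = [[1.0, 0.0], [0.0, 1.0]]
dist = [[0.1, 0.8], [0.7, 0.2]]
fp, fn = soft_fp_fn(matched)
good_mota = soft_mota(fp, fn, n_gt=2)
good_motp = soft_motp(matched, dist)
print(round(good_mota, 3), round(good_motp, 3))

# No track is assigned to anything: both soft counts approach 2.
fp_bad, fn_bad = soft_fp_fn([[0.0, 0.0], [0.0, 0.0]])
print(round(fp_bad, 3), round(fn_bad, 3))
```

A training loss could then be formed as something like `(1 - soft_mota) + weight * (1 - soft_motp)`, where the weighting between the two terms is again an assumption of this sketch.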

Experimental Results

The paper's experimental section supports the efficacy of the proposed method:

- Evaluations on the MOTChallenge benchmarks (MOT15, MOT16, and MOT17) show improvements in MOTA and IDF1 scores when trackers are trained with the proposed framework.
- Comparisons with state-of-the-art trackers indicate that the framework matches or surpasses existing approaches while reducing IDS and improving bounding box precision.
- Combining the DHN with the proposed loss function improves over the baseline versions of the trackers.
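
As a reminder of how the reported scores relate to raw error counts, the sketch below computes MOTA and IDF1 from totals accumulated over a sequence. The formulas are the standard definitions; the counts are made-up numbers for illustration only and are not results from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multi-Object Tracking Accuracy from counts summed over all frames."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """ID F1 score from identity-level true/false positives and false negatives."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


# Hypothetical counts, not taken from the paper.
baseline = mota(false_negatives=200, false_positives=50, id_switches=10, num_gt=1000)
fewer_switches = mota(false_negatives=200, false_positives=50, id_switches=5, num_gt=1000)
identity_score = idf1(idtp=700, idfp=100, idfn=300)

print(round(baseline, 3))        # 0.74
print(round(fewer_switches, 3))  # 0.745
print(round(identity_score, 3))  # 0.778
```

Because ID switches are usually far fewer than false negatives, reducing them moves MOTA only slightly, which is one reason identity-focused measures such as IDF1 are reported alongside it.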

Implications and Future Directions

This work advances MOT by building the evaluation metrics into the training loss, so that the quantity being optimized is closely aligned with the quantity being reported. In the paper's experiments, this alignment translates into improvements in tracking accuracy and identity preservation.

Looking forward, the approach encourages further work on differentiable components for aspects such as appearance modeling and robustness in complex environments. One may speculate that similar techniques could be carried over to real-time applications, such as autonomous vehicles and surveillance systems, to improve both the robustness and the responsiveness of these systems.

The paper is a step towards simplifying and unifying training for MOT: it removes the traditional separation between detection and tracking and provides a clearer path to training the whole pipeline jointly. The same idea may also be useful in other vision tasks that require solving a global matching problem.

GitHub

GitHub - yihongXU/deepMOT: Official Implementation of How To Train Your Deep Multi-Object Tracker (CVPR2020) (493 stars)
