Learning a Neural Solver for Multiple Object Tracking (1912.07515v2)

Published 16 Dec 2019 in cs.CV

Abstract: Graphs offer a natural way to formulate Multiple Object Tracking (MOT) within the tracking-by-detection paradigm. However, they also introduce a major challenge for learning methods, as defining a model that can operate on such a structured domain is not trivial. As a consequence, most learning-based work has been devoted to learning better features for MOT, and then using these with well-established optimization frameworks. In this work, we exploit the classical network flow formulation of MOT to define a fully differentiable framework based on Message Passing Networks (MPNs). By operating directly on the graph domain, our method can reason globally over an entire set of detections and predict final solutions. Hence, we show that learning in MOT does not need to be restricted to feature extraction, but it can also be applied to the data association step. We show a significant improvement in both MOTA and IDF1 on three publicly available benchmarks. Our code is available at https://bit.ly/motsolv .

Authors (2)

Guillem Brasó
Laura Leal-Taixé

Summary

The paper "Learning a Neural Solver for Multiple Object Tracking" approaches the Multiple Object Tracking (MOT) problem by applying learning directly to the data association step rather than only to feature extraction. Building on the tracking-by-detection paradigm, the authors combine a Message Passing Network (MPN) with the classical network flow formulation of MOT, so that the model can be trained end to end to predict tracking associations.

Methodology Overview

The method builds on the natural graph representation of MOT, in which nodes correspond to object detections and edges denote potential associations between detections in different frames. On this structured domain, the paper defines a fully differentiable framework based on MPNs. Rather than focusing only on better feature extraction, which has historically been the main emphasis of learning-based MOT, the framework acts as a learned solver: it takes the detection graph as input and directly predicts its partition into trajectories.
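The graph representation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `Detection` class, the `build_tracking_graph` function, and the `max_frame_gap` pruning parameter are hypothetical names chosen for this example, and the bounding boxes are made-up values.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Detection:
    det_id: int
    frame: int
    box: tuple  # (x, y, width, height); layout assumed for illustration


def build_tracking_graph(detections, max_frame_gap=2):
    """Return directed edges (i, j) between detections in different frames.

    Edges always point forward in time. Pairs more than max_frame_gap
    frames apart are pruned to keep the graph sparse (an assumption of
    this sketch, not a detail taken from the summary).
    """
    ordered = sorted(detections, key=lambda d: (d.frame, d.det_id))
    edges = []
    for a, b in combinations(ordered, 2):
        gap = b.frame - a.frame
        if 0 < gap <= max_frame_gap:
            edges.append((a.det_id, b.det_id))
    return edges


# Toy input: two detections in frame 1, then one each in frames 2, 3 and 5.
detections = [
    Detection(0, frame=1, box=(10, 10, 5, 5)),
    Detection(1, frame=1, box=(50, 50, 5, 5)),
    Detection(2, frame=2, box=(11, 10, 5, 5)),
    Detection(3, frame=3, box=(12, 11, 5, 5)),
    Detection(4, frame=5, box=(14, 12, 5, 5)),
]
edges = build_tracking_graph(detections)
print(edges)
```

Detections in the same frame are never connected, since a single object cannot appear twice in one frame. Each remaining edge is a candidate association that the solver must accept or reject.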

The key innovation is a time-aware message passing update step. When a node's embedding is updated, interactions with neighboring detections are distinguished by temporal information instead of being pooled together, which preserves higher-order dependencies. This departs from conventional pairwise cost learning: the network reasons globally over the whole set of detections and can therefore capture interdependencies between them.
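One way to picture a time-aware update is to aggregate messages from earlier-frame and later-frame neighbors separately and then combine the two results. The sketch below does this in plain Python under strong simplifying assumptions: the raw edge feature stands in for a learned message, and concatenation stands in for the learned networks that would produce the new node embedding. The function name `time_aware_node_update` and all inputs are hypothetical.

```python
def vec_add(u, v):
    """Element-wise sum of two equal-length lists."""
    return [a + b for a, b in zip(u, v)]


def time_aware_node_update(node_id, frames, edges, edge_feats, dim):
    """Aggregate messages from past and future neighbours separately.

    frames:     dict node_id -> frame index
    edges:      list of (i, j) node pairs
    edge_feats: dict (i, j) -> list of `dim` floats, used here as the message
    Returns the concatenation [past_sum, future_sum]. A trained model
    would pass this through learned layers; that part is omitted.
    """
    past = [0.0] * dim
    future = [0.0] * dim
    for (i, j) in edges:
        if node_id not in (i, j):
            continue  # edge does not touch this node
        other = j if i == node_id else i
        if frames[other] < frames[node_id]:
            past = vec_add(past, edge_feats[(i, j)])
        else:
            future = vec_add(future, edge_feats[(i, j)])
    return past + future  # list concatenation, length 2 * dim


frames = {0: 1, 1: 2, 2: 3}
edges = [(0, 1), (1, 2), (0, 2)]
edge_feats = {(0, 1): [1.0, 0.0], (1, 2): [0.0, 2.0], (0, 2): [5.0, 5.0]}

middle = time_aware_node_update(1, frames, edges, edge_feats, dim=2)
first = time_aware_node_update(0, frames, edges, edge_feats, dim=2)
print(middle)  # past and future parts stay separate
print(first)   # node 0 has no past neighbours
```

A time-agnostic sum for node 1 would collapse both messages into `[1.0, 2.0]`, which loses the information about which association lies in the past and which in the future.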

Numerical Results

The framework is evaluated on three MOT benchmarks (2D MOT 2015, MOT16, and MOT17), where it improves on the prior state of the art in both Multiple Object Tracking Accuracy (MOTA) and ID F1 Score (IDF1). The IDF1 gains of 6 to 11 percentage points are the most notable, since IDF1 measures how well identities are preserved over time. The experiments also report faster processing, with the proposed method running up to one order of magnitude faster than traditional optimization-based methods.
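For reference, the two metrics are computed from error counts using their standard definitions. The sketch below shows both; the function names are chosen for this example and the counts are invented for illustration, not results from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """ID F1 score: 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


# Made-up counts, for illustration only.
example_mota = mota(false_negatives=100, false_positives=50,
                    id_switches=10, num_gt=1000)
example_idf1 = idf1(idtp=700, idfp=100, idfn=300)
print(f"MOTA = {example_mota:.3f}, IDF1 = {example_idf1:.3f}")
```

MOTA is dominated by detection errors (false negatives and false positives), whereas IDF1 depends on identity-level matches, which is why it is more sensitive to the quality of data association.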

Implications and Future Directions

The research underscores the potential of learning-based graph solvers for MOT by presenting a framework that goes beyond feature extraction and brings learning into the tracking solution itself. In doing so, it recasts data association as a classification problem that can be solved with deep learning techniques.
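To make the classification view concrete, the sketch below turns per-edge scores into trajectories. It assumes each edge has a raw logit from some classifier, keeps edges whose sigmoid probability passes a threshold, and applies a simple greedy rule so that every detection has at most one predecessor and one successor. The greedy rule, the function names, and the logit values are assumptions made for this example and are not claimed to match the paper's procedure.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def select_edges(edge_logits, threshold=0.5):
    """Greedily keep confident edges, at most one in and one out per node."""
    scored = sorted(((sigmoid(logit), edge) for edge, logit in edge_logits.items()),
                    reverse=True)
    has_successor, has_predecessor, kept = set(), set(), []
    for prob, (i, j) in scored:
        if prob < threshold:
            break  # all remaining edges are classified as inactive
        if i in has_successor or j in has_predecessor:
            continue  # would violate the one-in, one-out constraint
        has_successor.add(i)
        has_predecessor.add(j)
        kept.append((i, j))
    return kept


def extract_trajectories(nodes, kept):
    """Follow kept edges from every node that has no predecessor."""
    successor = dict(kept)
    with_predecessor = {j for _, j in kept}
    tracks = []
    for start in nodes:
        if start in with_predecessor:
            continue
        track, current = [start], start
        while current in successor:
            current = successor[current]
            track.append(current)
        tracks.append(track)
    return tracks


# Hypothetical logits for forward-in-time edges (i, j).
edge_logits = {(0, 2): 3.0, (1, 2): 1.0, (1, 3): 2.0, (2, 4): 2.5, (0, 3): -1.0}
kept = select_edges(edge_logits)
tracks = extract_trajectories([0, 1, 2, 3, 4], kept)
print(kept)
print(tracks)
```

Edge `(1, 2)` passes the threshold but is dropped because detection 1 is already linked to detection 3. This illustrates why per-edge classification still needs a step that makes the output a valid set of trajectories.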

Given the significant advancements shown in identity tracking and computational efficiency, adapting this neural solver approach could have practical applications in various domains such as autonomous driving, video surveillance, and sports analytics. Future work could explore further enhancements in graph message passing networks, potentially improving scalability and expanding adaptability to more diverse object tracking scenarios.

In conclusion, by offering a neural solver suited to the structured nature of graph-based MOT, this paper takes a foundational step toward more sophisticated, scalable solutions that go beyond localized feature improvements, broadening the scope and impact of deep learning in object tracking.
