Learning a Proposal Classifier for Multiple Object Tracking (2103.07889v3)

Published 14 Mar 2021 in cs.CV

Abstract: The recent trend in multiple object tracking (MOT) is heading towards leveraging deep learning to boost the tracking performance. However, it is not trivial to solve the data-association problem in an end-to-end fashion. In this paper, we propose a novel proposal-based learnable framework, which models MOT as a proposal generation, proposal scoring and trajectory inference paradigm on an affinity graph. This framework is similar to the two-stage object detector Faster RCNN, and can solve the MOT problem in a data-driven way. For proposal generation, we propose an iterative graph clustering method to reduce the computational cost while maintaining the quality of the generated proposals. For proposal scoring, we deploy a trainable graph-convolutional-network (GCN) to learn the structural patterns of the generated proposals and rank them according to the estimated quality scores. For trajectory inference, a simple deoverlapping strategy is adopted to generate tracking output while complying with the constraints that no detection can be assigned to more than one track. We experimentally demonstrate that the proposed method achieves a clear performance improvement in both MOTA and IDF1 with respect to previous state-of-the-art on two public benchmarks. Our code is available at https://github.com/daip13/LPC_MOT.git.

Summary

The paper introduces a proposal-based framework leveraging iterative graph clustering and a trainable GCN to enhance tracking performance.
It breaks down multiple object tracking into proposal generation, scoring, and trajectory inference to streamline data association.
Empirical results on the MOT17 and MOT20 benchmarks show improved MOTA, along with a 1.2% boost in IDF1 on MOT17.

Learning a Proposal Classifier for Multiple Object Tracking

The paper "Learning a Proposal Classifier for Multiple Object Tracking" proposes a framework for multiple object tracking (MOT) built on a proposal-based methodology akin to Faster RCNN, the well-known two-stage object detector. The method divides the MOT process into three stages: proposal generation, proposal scoring, and trajectory inference.

Methodology Summary

The key novelty of the approach lies in its graph-based representation of tracklets and detections, which lets the data association problem be modeled on a single graph. The approach constructs an affinity graph in which nodes represent detections or tracklets and edges model potential associations between them.
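
As a rough illustration of the affinity graph described above, the following minimal sketch links detections from nearby frames with a distance-based affinity. The `Detection` fields, the exponential affinity, and the parameters `max_gap`, `sigma`, and `min_affinity` are all assumptions made for this example; the paper's actual affinity measure is not specified in this summary and may combine other cues such as appearance.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection: frame index and box center (illustrative only)."""
    frame: int
    x: float
    y: float


def build_affinity_graph(detections, max_gap=2, sigma=20.0, min_affinity=0.1):
    """Return {(i, j): affinity} for pairs of detections that may belong together.

    Nodes are indices into `detections`. An edge is a potential association.
    The affinity exp(-distance / sigma) is an assumed stand-in, not the
    paper's measure.
    """
    edges = {}
    for i, a in enumerate(detections):
        for j in range(i + 1, len(detections)):
            b = detections[j]
            gap = abs(a.frame - b.frame)
            # Two detections in the same frame cannot be the same object,
            # and very distant frames are skipped to keep the graph sparse.
            if gap == 0 or gap > max_gap:
                continue
            dist = math.hypot(a.x - b.x, a.y - b.y)
            affinity = math.exp(-dist / sigma)
            if affinity >= min_affinity:
                edges[(i, j)] = affinity
    return edges
```

Calling `build_affinity_graph` on a list of `Detection` objects returns a sparse edge dictionary. Pruning weak edges with `min_affinity` is one simple way to keep later clustering cheap.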

Proposal Generation: The paper introduces an iterative graph clustering strategy for generating tracking proposals. The method repeatedly clusters the graph's nodes to form proposals, each of which is a hypothesized object trajectory. The iterative scheme is intended to reduce computational cost while maintaining the quality of the generated proposals.
Proposal Scoring: The proposals are evaluated using a trainable Graph Convolutional Network (GCN). The GCN learns to score proposals based on higher-order structural patterns rather than mere pairwise affinities, which enhances the network's ability to identify the most promising proposals.
Trajectory Inference: A simple de-overlapping strategy is adopted to convert high-scoring proposals into non-conflicting trajectories for final tracking, ensuring that each detection is associated with only one track.
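
The trajectory inference step can be sketched as a greedy pass over the ranked proposals. This is a minimal sketch of one plausible reading of the de-overlapping strategy, not the authors' implementation: proposals are visited in descending score order, and each one keeps only the detections not already claimed by a higher-scoring proposal. The function name `deoverlap` and the input layout (detection IDs as integers, scores as a parallel list, assumed to come from the GCN) are hypothetical.

```python
def deoverlap(proposals, scores):
    """Turn scored, possibly overlapping proposals into disjoint tracks.

    proposals: list of iterables of detection IDs (hypothesized trajectories).
    scores:    list of quality scores, one per proposal (assumed to be
               produced by the proposal-scoring model).
    Returns a list of tracks, each a sorted list of detection IDs, such that
    no detection appears in more than one track.
    """
    if len(proposals) != len(scores):
        raise ValueError("proposals and scores must have the same length")

    # Rank proposals from highest to lowest score.
    order = sorted(range(len(proposals)), key=lambda k: scores[k], reverse=True)

    assigned = set()  # detections already claimed by a better proposal
    tracks = []
    for k in order:
        remaining = [d for d in proposals[k] if d not in assigned]
        if remaining:  # drop proposals that were fully absorbed
            tracks.append(sorted(remaining))
            assigned.update(remaining)
    return tracks
```

For example, with proposals `[{1, 2, 3}, {3, 4}, {5}]` and scores `[0.9, 0.8, 0.4]`, detection 3 stays with the highest-scoring proposal and the second proposal is reduced to detection 4 alone.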

Empirical Results

The method is validated experimentally on two public benchmarks, MOT17 and MOT20. It reports improvements in two key tracking metrics: MOTA (Multiple Object Tracking Accuracy), which reflects how completely and accurately objects are covered, and IDF1, which reflects how well identities are preserved over time.
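
For reference, the two metrics can be computed from aggregate error counts using their standard definitions: MOTA = 1 - (FN + FP + IDSW) / GT and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The sketch below assumes the counts have already been obtained by matching tracker output to ground truth. The function and argument names are chosen for this example, and the numbers in the usage note are made up rather than taken from the paper.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy from counts summed over all frames.

    num_gt is the total number of ground-truth object instances.
    MOTA can be negative when the errors outnumber the ground-truth objects.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1: F1 score over identity-level true/false positives and negatives."""
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0
```

With illustrative counts, `mota(10, 5, 5, 100)` and `idf1(80, 20, 20)` both evaluate to 0.8. MOTA penalizes misses, false alarms, and identity switches together, while IDF1 isolates how consistently identities are kept.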

Quantitatively, the framework improves on previous state-of-the-art methods, including a 1.2% rise in the IDF1 score on the MOT17 benchmark. These gains come with high precision and recall and without extensive computational overhead, owing to the efficient proposal generation strategy and the message-passing capability of GCNs.

Implications and Future Work

The proposal-based learnable MOT framework advances the field by bringing scalable, data-driven techniques to data association in tracking. Two implications stand out:

Scalability: The method potentially scales to complex scene analysis tasks in surveillance or autonomous driving where occlusions and scene clutter present substantial challenges.
Extensibility: The graph-based, learning-centric nature of the framework suggests it could be adapted to handle dynamic scene elements or real-time constraints.

As future work, the paper points toward an end-to-end trainable framework, with an emphasis on proposal generation, which would integrate learning further into the MOT pipeline. This move toward end-to-end systems matches a broader trend in AI in which model components are trained jointly to reduce reliance on hand-designed heuristics.

In summary, the paper shows how graph-based learning can be applied to object tracking, handling data association through learned proposal evaluation followed by trajectory inference.

GitHub

GitHub - daip13/LPC_MOT: This is the code for the paper "Learning a Proposal Classifier for Multiple Target tracking" (75 stars)