# Insights into "Multi-Person Tracking by Multicut and Deep Matching"

- The paper introduces a novel local pairwise feature based on DeepMatching that significantly improves tracking robustness in dynamic scenes.
- Experiments demonstrate the superiority of DeepMatching over traditional spatio-temporal features, especially in sequences with rapid camera motion.
- An efficient multicut optimization eliminates intermediate tracklet representations and enables state-of-the-art performance on the MOT16 benchmark.
This paper advances multi-person tracking by extending the minimum cost subgraph multicut formulation previously introduced by the authors. The researchers propose modifications that enhance tracking robustness and efficiency, addressing the complexities of crowded scenes: partial occlusion, camera motion, and false positive detections.
## Key Contributions
- Local Pairwise Feature: A novel feature based on local appearance matching, leveraging DeepMatching, is introduced. Because it relies on dense correspondences between local image features rather than on spatio-temporal proximity, it is robust to camera motion and partial occlusion. This is a significant departure from the previous reliance on spatio-temporal relations, making the method applicable to dynamic, moving-camera scenarios.
- Comparison of Pairwise Potentials: Experiments demonstrate that the DeepMatching pairwise feature outperforms traditional spatio-temporal features, particularly over longer temporal windows and in scenarios with significant camera motion.
- Efficient Optimization: Moving from the subgraph multicut to a plain multicut formulation allows the use of a more efficient primal feasible optimization algorithm. This modification eliminates the need for intermediate tracklet representations, simplifying the tracking pipeline and making it applicable to longer videos with many detections.
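To illustrate the first contribution, the pairwise affinity between two detections can be scored by how many point correspondences both bounding boxes jointly support. The sketch below is a simplified illustration, not the paper's exact feature definition: it assumes `matches` is a precomputed list of `(x1, y1, x2, y2)` correspondences between the two frames (as DeepMatching would produce), and scores the pair by an intersection-over-union of supported matches.

```python
def inside(box, x, y):
    """Check whether point (x, y) lies inside box = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = box
    return xmin <= x <= xmax and ymin <= y <= ymax

def match_affinity(box_a, box_b, matches):
    """Hypothetical pairwise affinity between detection box_a in frame t1 and
    box_b in frame t2: the fraction of correspondences supported by BOTH boxes
    among those supported by EITHER box (an IoU over matched points)."""
    both = sum(1 for x1, y1, x2, y2 in matches
               if inside(box_a, x1, y1) and inside(box_b, x2, y2))
    either = sum(1 for x1, y1, x2, y2 in matches
                 if inside(box_a, x1, y1) or inside(box_b, x2, y2))
    return both / either if either else 0.0
```

Two boxes covering the same person accumulate many shared correspondences and score near 1, while boxes on different people share few and score near 0; crucially, the score depends only on appearance matches, not on box positions, which is why it survives camera motion.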
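The multicut problem itself is NP-hard, which is why a primal feasible heuristic matters for the third contribution. The following toy sketch is an assumed illustration of the greedy-joining idea behind such heuristics, not the paper's actual solver: start with every detection in its own cluster and repeatedly merge the pair of clusters whose summed inter-cluster edge cost is largest, as long as that cost is positive (positive cost meaning the pairwise feature favors joining).

```python
def greedy_joining(num_nodes, costs):
    """Greedy primal feasible clustering sketch.

    costs: dict mapping frozenset({u, v}) -> edge cost; higher means the
    pairwise evidence favors putting u and v on the same track.
    Returns a node -> cluster-id labeling.
    """
    comp = {i: {i} for i in range(num_nodes)}  # cluster id -> node set
    label = list(range(num_nodes))             # node -> cluster id

    def between(a, b):
        # Total cost of edges crossing clusters a and b.
        return sum(costs.get(frozenset({u, v}), 0.0)
                   for u in comp[a] for v in comp[b])

    while True:
        ids = list(comp)
        best, pair = 0.0, None
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                c = between(ids[i], ids[j])
                if c > best:
                    best, pair = c, (ids[i], ids[j])
        if pair is None:          # no merge would improve the objective
            return label
        a, b = pair
        comp[a] |= comp.pop(b)    # merge cluster b into cluster a
        for v in comp[a]:
            label[v] = a
```

Each final cluster corresponds to one person's track across frames; because the costs come directly from detection-level pairwise features, no intermediate tracklet stage is needed.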
## Experimental Findings
The paper presents extensive evaluations on the MOT16 benchmark, where the proposed method achieves state-of-the-art performance. Notably, the approach is demonstrated to be robust against detection noise, capable of handling various levels of input detection quality while maintaining competitive tracking accuracy. The use of DeepMatching features directly translates to improved tracking performance, especially in sequences with rapid camera motion and crowded scenes.
## Implications for Future Research
The work suggests that future research can extend the current multicut formulation with more sophisticated appearance models, further improving robustness in complex environments with variable lighting and moving backgrounds. Additionally, combining DeepMatching with real-time processing algorithms could enable real-world applications in autonomous systems and security surveillance.
## Conclusion
This paper presents a comprehensive methodology that integrates a graph-based multicut formulation with advanced image matching for multi-person tracking. Eliminating intermediate tracklet processing and introducing robust pairwise features are substantial contributions to tracking efficiency and accuracy. The results establish a strong baseline for further research on tracking in dynamic scenes and on real-time multi-object tracking systems.