Tracking by Prediction: A Deep Generative Model for Multi-Person Localisation and Tracking (1803.03347v1)

Published 9 Mar 2018 in cs.CV

Abstract: Current multi-person localisation and tracking systems have an over-reliance on the use of appearance models for target re-identification and almost no approaches employ a complete deep learning solution for both objectives. We present a novel, complete deep learning framework for multi-person localisation and tracking. In this context we first introduce a lightweight sequential Generative Adversarial Network architecture for person localisation, which overcomes issues related to occlusions and noisy detections, typically found in a multi-person environment. In the proposed tracking framework we build upon recent advances in pedestrian trajectory prediction approaches and propose a novel data association scheme based on predicted trajectories. This removes the need for computationally expensive person re-identification systems based on appearance features and generates human-like trajectories with minimal fragmentation. The proposed method is evaluated on multiple public benchmarks including both static and dynamic cameras and is capable of generating outstanding performance, especially among other recently proposed deep neural network based approaches.

Authors

Tharindu Fernando
Simon Denman
Sridha Sridharan
Clinton Fookes

Summary

The paper introduces a deep learning framework for multi-person localization and tracking that avoids reliance on appearance models by using a GAN and trajectory prediction via LSTM networks.
Key components include a sequential GAN for generating localization probability maps and an LSTM-based trajectory prediction system enabling data association without costly re-identification.
Evaluations demonstrate superior tracking performance on benchmarks, achieving better MOTA and fewer ID switches than appearance-based methods, with implications for applications like autonomous driving.

Tracking by Prediction: A Deep Generative Model for Multi-Person Localisation and Tracking

The paper introduces a complete deep learning framework for multi-person localization and tracking, addressing key limitations of existing systems, which rely predominantly on appearance models for target re-identification. These conventional approaches often struggle in crowded environments because of occlusions and noisy detections, and appearance-based re-identification is computationally expensive. The proposed system instead combines a Generative Adversarial Network (GAN) architecture for localization with trajectory prediction for data association, and the authors report advantages over existing algorithms in evaluations on standard benchmarks.

Methodology Overview

The proposed framework has several components. Foremost is a lightweight sequential GAN for person localization, designed to cope with challenging conditions such as occlusions and foreground noise. The model generates a probability map representing the likelihood of pedestrian locations. Because it works on sequences rather than treating each frame in isolation, it can distinguish actual pedestrians from other moving, non-human content in the scene.
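To make the idea of a localization probability map concrete, the sketch below turns such a map into discrete person locations by picking local maxima above a threshold. This is an illustrative assumption, not the paper's post-processing: the function name `find_peaks`, the 8-neighbour rule, and the threshold value are all hypothetical, and the map is a hand-written toy grid rather than GAN output.

```python
from typing import List, Tuple


def find_peaks(prob_map: List[List[float]],
               threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (row, col) cells that are at least `threshold` and strictly
    greater than every one of their (up to 8) neighbours.

    Hypothetical helper for illustration only.
    """
    rows, cols = len(prob_map), len(prob_map[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = prob_map[r][c]
            if value < threshold:
                continue  # too unlikely to be a person
            neighbours = [
                prob_map[nr][nc]
                for nr in range(max(0, r - 1), min(rows, r + 2))
                for nc in range(max(0, c - 1), min(cols, c + 2))
                if (nr, nc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Toy 4x4 map with two likely pedestrian locations.
toy_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.4, 0.6],
    [0.0, 0.1, 0.6, 0.8],
]
print(find_peaks(toy_map))  # [(1, 1), (3, 3)]
```

Requiring a strict maximum means a flat plateau yields no peak at all; a production system would more likely smooth the map or apply non-maximum suppression.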

Tracking builds on a trajectory prediction mechanism based on Long Short-Term Memory (LSTM) networks. It enables a novel data association scheme that matches targets using their predicted trajectories, removing the need for costly person re-identification. The strategy is two-fold: the system predicts both short-term and long-term trajectories, serving the dual purposes of immediate data association and extended trajectory refinement. This dual mechanism helps produce human-like trajectories even under substantial occlusions or detection artifacts.
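The sketch below shows the general shape of prediction-based data association: each track's next position is predicted, and detections are then matched to predictions rather than to appearance features. It is a minimal sketch under stated assumptions, not the paper's scheme. A constant-velocity extrapolation (`predict_next`) stands in for the learned LSTM predictor, the matching is a simple greedy nearest-neighbour pass, and `associate` and the `gate` distance are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def predict_next(history: List[Point]) -> Point:
    """Constant-velocity stand-in for a learned trajectory predictor."""
    if len(history) < 2:
        return history[-1]
    (x0, y0), (x1, y1) = history[-2], history[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def associate(tracks: Dict[int, List[Point]],
              detections: List[Point],
              gate: float = 2.0) -> Dict[int, int]:
    """Greedily match track ids to detection indices by the distance
    between each track's predicted position and each detection.
    Pairs farther apart than `gate` are never matched.
    """
    candidates = []
    for track_id, history in tracks.items():
        px, py = predict_next(history)
        for det_idx, (dx, dy) in enumerate(detections):
            dist = math.hypot(px - dx, py - dy)
            if dist <= gate:
                candidates.append((dist, track_id, det_idx))

    matches: Dict[int, int] = {}
    used_detections = set()
    for _, track_id, det_idx in sorted(candidates):  # closest pairs first
        if track_id in matches or det_idx in used_detections:
            continue
        matches[track_id] = det_idx
        used_detections.add(det_idx)
    return matches


# Two tracks moving in different directions, three detections
# (the last one is far from both predictions and stays unmatched).
tracks = {1: [(0.0, 0.0), (1.0, 0.0)], 2: [(5.0, 5.0), (5.0, 4.0)]}
detections = [(5.1, 3.0), (2.2, 0.1), (9.0, 9.0)]
print(associate(tracks, detections))
```

Greedy matching is chosen here only to keep the example dependency-free. An optimal assignment, for example with the Hungarian algorithm, is a common alternative when many targets compete for the same detections.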

Experimental Results

The evaluation covers several publicly available datasets, including PETS2009 and ETHMS. The system is reported to achieve higher Multiple Object Tracking Accuracy (MOTA) and substantially fewer ID switches than both probabilistic and deep learning-based methods, and it does so without relying on appearance features. The paper compares these results against contemporary methods to highlight its effectiveness in real-world scenarios, especially under adverse conditions.
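For readers unfamiliar with the metric, MOTA in its standard CLEAR MOT form combines false negatives, false positives, and ID switches, summed over all frames and normalized by the total number of ground-truth objects. The sketch below computes it; the function name `mota` is hypothetical and the counts in the example are made up for illustration, not taken from the paper.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over frames.

    The value can be negative when the tracker makes more errors than
    there are ground-truth objects.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Made-up counts: 120 misses, 60 false alarms, 20 ID switches
# over 1000 ground-truth object instances.
score = mota(120, 60, 20, 1000)
print(f"MOTA = {score:.3f}")  # MOTA = 0.800
```

Because ID switches appear in the same numerator as detection errors, a tracker with few switches but many missed detections can still score poorly, which is why ID switches are usually also reported separately.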

Implications and Future Directions

This work has important implications for real-world applications such as autonomous driving, robotics, and sports analytics, where accurate multi-person tracking is essential. By strategically leveraging GANs and LSTM networks, the authors present a lightweight yet effective model that significantly diminishes the computational burden commonly associated with extensive feature extraction methods. The proposed system highlights the capacity for trajectory-based data association to surpass traditional methods, suggesting further exploration into hybrid models that could integrate both appearance and predictive trajectory features for optimal multi-person tracking.

In conclusion, the authors make a substantial contribution to multi-person localization and tracking within computer vision. The paper sets a precedent for future research into trajectory-based tracking systems, encouraging the exploration of new generative models and improved prediction mechanisms to further improve tracking accuracy and computational efficiency.
