- The paper introduces the GFS-DCF framework that jointly learns correlation filters and performs group feature selection across spatial and channel dimensions to boost tracking performance.
- Empirical evaluations across benchmarks like OTB2013, OTB2015, VOT2017, VOT2018, and TrackingNet demonstrate significant accuracy improvements over state-of-the-art trackers.
- An adaptive temporal smoothing mechanism, based on a low-rank approximation of historical filters, keeps filter behavior consistent across frames, while the group selection reduces feature redundancy.
Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking
The paper addresses a central challenge in visual object tracking: effectively exploiting both spatial and channel-specific features to improve robustness. The authors propose GFS-DCF, a framework for Group Feature Selection in Discriminative Correlation Filter based tracking. Unlike traditional approaches, which typically apply spatial regularization or feature selection in isolation, GFS-DCF performs group feature selection jointly across the spatial and channel dimensions.
The core contribution of GFS-DCF is a formulation that integrates correlation filter learning with simultaneous multi-dimensional group feature selection. Selecting structurally relevant multi-channel features in this way strengthens the learned tracking filters. In addition, GFS-DCF incorporates historical information through a low-rank approximation that smooths filter behavior across frames. The spatial-channel configuration is adjusted dynamically during tracking, which substantially reduces information redundancy and mitigates the influence of less discriminative feature representations.
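To make the group-selection idea concrete, the sketch below applies the proximal operator of the l2,1 (group-lasso) norm to a toy multi-channel filter, once with spatial groups (all channels at a position) and once with channel groups (all positions in a channel). This is a minimal illustration of group-sparse selection in general, not the authors' actual solver; the array sizes, regularization weights, and function names are illustrative assumptions.

```python
import numpy as np

def prox_group_l21(W, lam, axis):
    """Proximal operator of the l2,1 (group-lasso) norm.

    Shrinks each group's l2 norm by `lam`, zeroing out entire groups
    whose norm falls below `lam` -- this is what drives group-level
    (rather than element-level) feature selection.
    """
    norms = np.sqrt((W ** 2).sum(axis=axis, keepdims=True))
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale

# Toy multi-channel filter: height x width x channels (sizes are illustrative).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8, 16))

# Spatial group selection: each spatial position groups all its channels,
# so the group norm is taken over the channel axis.
W_spatial = prox_group_l21(W, lam=4.0, axis=2)

# Channel group selection: each channel groups all its spatial positions.
W_channel = prox_group_l21(W, lam=8.0, axis=(0, 1))

# Whole spatial positions / channels are suppressed, not single weights.
active_positions = int((np.abs(W_spatial).sum(axis=2) > 0).sum())
active_channels = int((np.abs(W_channel).sum(axis=(0, 1)) > 0).sum())
print(active_positions, active_channels)
```

The key property is that sparsity appears at the level of whole groups: a position or channel is either kept (with its internal structure preserved up to a uniform shrinkage) or removed entirely.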
Empirical evaluation of GFS-DCF across several established benchmarks, including OTB2013, OTB2015, VOT2017, VOT2018, and TrackingNet, demonstrates that the proposed method outperforms state-of-the-art trackers. The performance gains are attributed to the joint spatial-channel feature selection mechanism and the temporal smoothness constraint: adaptive selection highlights the most relevant features, so the learned filters retain high discriminative power and interpretability throughout a video sequence.
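One simple way to picture a low-rank temporal smoothness constraint of the kind described above is to project a newly learned filter onto the dominant subspace spanned by filters from previous frames. The sketch below does exactly that via SVD; it is an illustrative stand-in under assumed toy dimensions, not the paper's actual update rule.

```python
import numpy as np

def low_rank_smooth(new_filter, history, rank=3):
    """Project a newly learned filter onto the top-`rank` subspace
    spanned by historical filters (rows of `history`).

    Components of the new filter outside the historical subspace are
    discarded, which suppresses abrupt frame-to-frame filter changes.
    """
    # Right singular vectors of the history span its dominant subspace.
    _, _, Vt = np.linalg.svd(history, full_matrices=False)
    basis = Vt[:rank]                      # rank x d, orthonormal rows
    return basis.T @ (basis @ new_filter)  # orthogonal projection

# Stack of vectorized filters from the last T frames (toy sizes).
rng = np.random.default_rng(2)
T, d = 10, 64
history = rng.normal(size=(T, d))

w_new = rng.normal(size=d)
w_smooth = low_rank_smooth(w_new, history, rank=3)
```

Because the projection is orthogonal, applying it twice changes nothing, and the smoothed filter can never have larger norm than the raw one.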
Furthermore, the experimental analyses shed light on the behavior of the group feature selection mechanism. Testing revealed that hand-crafted features benefit most from spatial selection, whereas CNN-derived deep features gain markedly from channel selection. Compressing the feature dimensions maintained, and in many cases improved, tracking accuracy, validating the method's ability to remove redundant and irrelevant information across both channels and spatial positions.
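The channel-compression effect reported for deep features can be sketched with a crude energy-based proxy: rank the channels of a feature map by their l2 energy and keep only the strongest fraction. This is an assumption-laden illustration (random stand-in features, an arbitrary keep ratio), not the learned selection used in the paper.

```python
import numpy as np

def select_channels(feat, keep_ratio=0.1):
    """Keep only the channels with the largest l2 energy.

    A crude proxy for learned channel selection: channels whose
    responses carry little energy are dropped wholesale.
    """
    energy = np.sqrt((feat ** 2).sum(axis=(0, 1)))
    k = max(1, int(keep_ratio * feat.shape[2]))
    keep = np.sort(np.argsort(energy)[-k:])  # indices of top-k channels
    return feat[:, :, keep], keep

# Toy stand-in for a CNN feature map: H x W x C, with per-channel
# scales so some channels carry far more energy than others.
rng = np.random.default_rng(1)
features = rng.normal(size=(16, 16, 512)) * rng.uniform(0.0, 1.0, size=512)

compressed, kept_idx = select_channels(features, keep_ratio=0.1)
print(compressed.shape)  # (16, 16, 51)
```

Even this naive criterion shrinks the representation to a tenth of its channels; the paper's point is that a selection learned jointly with the filter does this without sacrificing, and often while improving, accuracy.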
For future work, GFS-DCF invites exploration of similar multilevel selection strategies in other computer vision domains where deep feature redundancy hinders model performance. Moreover, the dynamic adaptation introduced by the temporal smoothing regularization opens opportunities for adaptive tracking models that further explore low-rank manifold learning and incorporate contextual scene dynamics.
In conclusion, GFS-DCF makes a significant contribution to visual object tracking by unifying correlation filter learning with comprehensive group feature selection. This integration not only enhances filter robustness and discrimination but also deepens our understanding of how multi-channel features align with the underlying tracking framework, opening avenues for further multi-dimensional optimization in visual tracking systems.