
Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking (1803.08679v1)

Published 23 Mar 2018 in cs.CV

Abstract: Discriminative Correlation Filters (DCF) are efficient in visual tracking but suffer from unwanted boundary effects. Spatially Regularized DCF (SRDCF) has been suggested to resolve this issue by enforcing a spatial penalty on DCF coefficients, which, inevitably, improves tracking performance at the price of increased complexity. To tackle online updating, SRDCF formulates its model on multiple training images, further adding to the difficulty of improving efficiency. In this work, motivated by the online Passive-Aggressive (PA) algorithm, we introduce temporal regularization to SRDCF with a single sample, resulting in our spatial-temporal regularized correlation filters (STRCF). The STRCF formulation not only serves as a reasonable approximation to SRDCF with multiple training samples, but also provides a more robust appearance model than SRDCF in the case of large appearance variations. Moreover, it can be efficiently solved via the alternating direction method of multipliers (ADMM). By incorporating both temporal and spatial regularization, our STRCF handles boundary effects without much loss in efficiency and achieves superior performance over SRDCF in terms of accuracy and speed. Experiments are conducted on three benchmark datasets: OTB-2015, Temple-Color, and VOT-2016. Compared with SRDCF, STRCF with hand-crafted features provides a 5-times speedup and achieves gains of 5.4% and 3.6% in AUC score on OTB-2015 and Temple-Color, respectively. Moreover, STRCF combined with CNN features also performs favorably against state-of-the-art CNN-based trackers and achieves an AUC score of 68.3% on OTB-2015.

Citations (678)

Summary

  • The paper introduces STRCF by integrating temporal regularization into the SRDCF framework, balancing aggressive model updates with prior knowledge.
  • It employs online Passive-Aggressive learning and ADMM to achieve efficient and globally optimal filter updates with lower computational cost.
  • STRCF demonstrates a 5-fold increase in speed and significant accuracy improvements on benchmarks, highlighting its practical utility for real-time tracking.

Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking

The paper "Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking" presents a model that enhances the efficiency and accuracy of visual tracking by combining spatial and temporal regularization into correlation filters. The work builds upon the Discriminative Correlation Filters (DCF) framework, specifically addressing the limitations of Spatially Regularized DCF (SRDCF).

Contributions and Methodology

The primary contribution of the paper is the development of Spatial-Temporal Regularized Correlation Filters (STRCF). This model introduces temporal regularization into SRDCF, leveraging online Passive-Aggressive (PA) learning to balance aggressive updates from new data against passive consistency with prior knowledge. By training on a single sample rather than a stack of historical samples, STRCF approximates the multi-sample SRDCF model while significantly reducing computational complexity.
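
Concretely, the objective described in the paper augments the single-sample SRDCF loss with a penalty on deviation from the previous filter. A sketch of the formulation, where $x_t^l$ denotes the $l$-th channel of the features at frame $t$, $y$ the desired Gaussian-shaped response, $w$ the spatial regularization weight, $f_{t-1}$ the filter learned at the previous frame, and $\mu$ the temporal regularization parameter:

$$
\min_{f}\;\frac{1}{2}\Big\|\sum_{l=1}^{D} x_t^{l} \star f^{l} - y\Big\|^{2}
+ \frac{1}{2}\sum_{l=1}^{D}\big\|w \odot f^{l}\big\|^{2}
+ \frac{\mu}{2}\,\big\|f - f_{t-1}\big\|^{2}
$$

Setting $\mu = 0$ recovers single-sample SRDCF; the third term is the PA-style "passive" component that discourages abrupt changes in the filter between frames.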

The STRCF model effectively mitigates boundary effects without incurring substantial efficiency losses. The authors solve the STRCF objective with the Alternating Direction Method of Multipliers (ADMM): each subproblem admits a closed-form solution, the convexity of the formulation guarantees a globally optimal solution, and empirical results show convergence within very few iterations.
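
To make the ADMM scheme concrete, below is a minimal single-channel NumPy sketch of an STRCF-style update. It is an illustrative reconstruction under simplifying assumptions (one feature channel, circular convolution via the FFT, a toy spatial weight), not the authors' implementation; the function name strcf_update and the parameter values mu, gamma, and n_iters are chosen here for exposition.

import numpy as np

def strcf_update(x, y, w, f_prev, mu=16.0, gamma=1.0, n_iters=3):
    """One STRCF-style filter update via ADMM (single-channel sketch).

    Minimizes 0.5*||x (*) f - y||^2 + 0.5*||w . f||^2 + 0.5*mu*||f - f_prev||^2
    by splitting f = g and alternating closed-form updates.
    """
    X, Y = np.fft.fft2(x), np.fft.fft2(y)
    F_prev = np.fft.fft2(f_prev)
    F = F_prev.copy()
    G = F_prev.copy()            # auxiliary variable g (Fourier domain)
    S = np.zeros_like(G)         # Lagrange multiplier (Fourier domain)
    w2 = w ** 2
    for _ in range(n_iters):
        # f-subproblem: quadratic per Fourier coefficient, closed form
        F = (np.conj(X) * Y + mu * F_prev + gamma * G - S) \
            / (np.conj(X) * X + mu + gamma)
        # g-subproblem: solved in the spatial domain, where w acts diagonally
        f = np.real(np.fft.ifft2(F))
        s = np.real(np.fft.ifft2(S))
        g = (gamma * f + s) / (w2 + gamma)
        G = np.fft.fft2(g)
        # dual update on the constraint f = g
        S = S + gamma * (F - G)
    return np.real(np.fft.ifft2(F))

# Toy usage: 64x64 patch, Gaussian label, quadratic spatial weight
# (both shifted so their origin sits at index (0, 0), matching the FFT).
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))
yy, xx = np.mgrid[-32:32, -32:32]
y = np.fft.ifftshift(np.exp(-(xx**2 + yy**2) / (2 * 3.0**2)))
w = np.fft.ifftshift(1.0 + 0.1 * (xx**2 + yy**2) / 32.0**2)
f = strcf_update(x, y, w, f_prev=np.zeros_like(x))

The key structural point the sketch preserves is that the filter subproblem is elementwise in the Fourier domain, while the auxiliary variable is updated in the spatial domain where the spatial weight w is diagonal, so each ADMM iteration costs only a few FFTs.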

Strong Numerical Results and Claims

The paper provides robust numerical evidence supporting the superior performance of STRCF over SRDCF. With hand-crafted features, STRCF runs roughly five times faster than SRDCF while improving the AUC score by 5.4% on OTB-2015 and by 3.6% on Temple-Color. With deep CNN features, STRCF reaches an AUC score of 68.3% on OTB-2015, underscoring its competitiveness with state-of-the-art tracking methods.

Theoretical and Practical Implications

The introduction of temporal regularization provides theoretical insights into the potential benefits of blending temporal dynamics into spatial feature modeling. Practically, this facilitates more robust tracking under challenging conditions such as occlusion and deformation, where appearance variations are significant.

The proposed STRCF demonstrates a critical advancement in maintaining tracking accuracy without sacrificing speed, a long-standing challenge in the development of real-time tracking systems. This has direct applications in environments where computational resources are limited or real-time processing is mandatory, such as in autonomous vehicles and surveillance systems.

Future Speculations in AI

The success of STRCF emphasizes the importance of integrating spatial and temporal features for enhancing learning models in dynamic and uncertain environments. Future research could explore further hybrid models that integrate additional dimensions of data, such as context or semantic insights. Additionally, advancements could be made in extending STRCF to support 3D or multi-dimensional data, with applications extending to robotics, augmented reality, and complex scene understanding.

Conclusion

Overall, the paper provides a significant enhancement to the SRDCF framework, combining high accuracy with improved computational performance. The incorporation of a spatial-temporal regularization paradigm not only advances current visual tracking techniques but also sets a foundation for future research in the field. Through meticulous empirical validation, STRCF demonstrates its capability to outperform current models, providing a robust solution for real-time tracking challenges.