Analysis of Context-aware Deep Feature Compression for High-speed Visual Tracking
The paper "Context-aware Deep Feature Compression for High-speed Visual Tracking" introduces a context-aware correlation filter-based tracking framework called TRACA, designed to achieve high computational speed while maintaining robust performance. The framework represents a significant advancement in visual tracking, emphasizing the need to balance tracking speed against accuracy in order to meet real-time application constraints.
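Since TRACA builds on correlation filter trackers, a minimal MOSSE-style sketch may help fix ideas: a filter is learned in the Fourier domain so that correlating it with a search patch yields a sharp Gaussian peak at the target's location. All names, shapes, and parameters below are illustrative, not the paper's implementation.

```python
import numpy as np

def correlation_response(template, search):
    # Learn a filter that maps the template to a centered Gaussian peak,
    # then correlate it with the search patch, all in the Fourier domain.
    h, w = template.shape
    ys, xs = np.mgrid[0:h, 0:w]
    g = np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2 * 2.0 ** 2))
    F = np.fft.fft2(template)
    G = np.fft.fft2(g)
    H_star = G * np.conj(F) / (F * np.conj(F) + 1e-4)  # regularized filter
    return np.real(np.fft.ifft2(np.fft.fft2(search) * H_star))

rng = np.random.default_rng(2)
patch = rng.standard_normal((32, 32))
resp = correlation_response(patch, patch)  # search for the template itself
peak = np.unravel_index(np.argmax(resp), resp.shape)
print(peak)  # (16, 16): the response peaks at the target's position
```

Because every operation is an element-wise product in the frequency domain, the per-frame cost is dominated by a few FFTs, which is what makes correlation filter trackers fast enough for the speeds TRACA targets.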
Methodological Contributions
The core contribution of the paper lies in its novel approach to deep feature compression. By employing multiple expert auto-encoders, each specialized for a different contextual scenario, it reduces feature dimensionality while preserving the information essential for tracking. Context, within this framework, refers to categorizing tracking targets by their appearance patterns. During pre-training, each auto-encoder (termed an expert auto-encoder) is fine-tuned on one such category; during tracking, the best-matching expert is selected according to the target's contextual information.
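The compression step can be pictured as an auto-encoder that squeezes the channel dimension of a deep feature map and reconstructs it. The sketch below uses a per-location linear encoder/decoder with hypothetical shapes; the paper's actual architecture and training procedure differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_autoencoder(in_ch, code_ch):
    # Hypothetical 1x1-convolution-style weights acting per spatial location:
    # channels are compressed from in_ch down to code_ch and back.
    W_enc = rng.standard_normal((code_ch, in_ch)) * 0.1
    W_dec = rng.standard_normal((in_ch, code_ch)) * 0.1
    return W_enc, W_dec

def compress(features, W_enc):
    # features: (H, W, in_ch) deep feature map -> (H, W, code_ch) code
    return np.maximum(features @ W_enc.T, 0.0)  # ReLU activation

def reconstruct(code, W_dec):
    # code: (H, W, code_ch) -> (H, W, in_ch) reconstruction
    return code @ W_dec.T

features = rng.standard_normal((14, 14, 512))  # e.g. a VGG conv feature map
W_enc, W_dec = init_autoencoder(512, 64)
code = compress(features, W_enc)
recon = reconstruct(code, W_dec)
print(code.shape, recon.shape)  # (14, 14, 64) (14, 14, 512)
```

Running the correlation filter on the 64-channel code instead of the raw 512-channel features is what buys the speed-up, provided the code retains the tracking-relevant structure.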
A pivotal aspect of this approach is the incorporation of orthogonality within the loss function, a strategy aimed at enhancing the quality of compressed feature maps through regularization. The orthogonality constraint facilitates decorrelation among feature channels, thus improving robustness in dynamic visual environments.
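As a rough illustration of such a regularizer (a sketch of one common formulation, not the paper's exact loss), one can penalize the off-diagonal entries of the Gram matrix of the compressed channels, pushing the channels toward mutual orthogonality:

```python
import numpy as np

def orthogonality_penalty(C):
    # C: (n_channels, d) compressed feature channels, flattened over space.
    # Penalize off-diagonal correlations between channels, i.e. the
    # squared Frobenius norm of C C^T with its diagonal removed.
    gram = C @ C.T
    off_diag = gram - np.diag(np.diag(gram))
    return np.sum(off_diag ** 2)

decorrelated = np.eye(4)      # mutually orthogonal channels
correlated = np.ones((4, 4))  # identical channels
print(orthogonality_penalty(decorrelated))  # 0.0
print(orthogonality_penalty(correlated))    # large positive value
```

Adding such a term to the reconstruction loss discourages redundant channels in the compressed representation, which is the decorrelation effect the paper attributes to its orthogonality constraint.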
Another innovative aspect of the methodology is the integration of a context-aware network, built on a pre-trained VGG-Net, that dynamically selects the most relevant expert auto-encoder for a given tracking scenario. This selection step is critical to ensuring that the compressed features remain as relevant and accurate as possible, enabling high-speed processing without sacrificing accuracy.
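The selection step itself reduces to an argmax over the context network's per-category scores. The names and scores below are hypothetical placeholders, not the paper's actual interface:

```python
import numpy as np

def select_expert(context_scores, experts):
    # context_scores: one confidence per contextual category, as produced
    # by a (hypothetical) context-aware network; pick the best match.
    idx = int(np.argmax(context_scores))
    return idx, experts[idx]

experts = ["expert_ae_0", "expert_ae_1", "expert_ae_2"]
scores = np.array([0.1, 0.7, 0.2])  # softmax-like confidences
idx, chosen = select_expert(scores, experts)
print(idx, chosen)  # 1 expert_ae_1
```

Because the choice is made once from the initial target appearance, the per-frame tracking loop only ever runs the single selected expert, keeping the runtime cost independent of the number of experts.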
Results and Implications
The paper validates the system through a series of experiments on the well-known OTB benchmarks (CVPR2013 and TPAMI2015). The results demonstrate that TRACA performs comparably with state-of-the-art trackers, maintaining high accuracy while running at speeds exceeding 100 fps. Such performance illustrates TRACA's applicability to real-time scenarios, especially where computational resources are constrained and latency is critical.
The implications of this work are multifaceted. Practically, it enables the deployment of visual tracking in environments where computational resources are limited or where real-time response is paramount. Theoretically, it offers insights into the benefits of context-aware feature compression and highlights the potential for extending these principles to broader machine learning tasks such as k-shot learning and domain adaptation.
Future Directions
The paper hints at future research avenues that involve the joint training of expert auto-encoders and the context-aware network. Doing so could potentially enhance performance further by exploiting the correlations between contextual clustering and feature compression. Another intriguing future direction is expanding the application scope beyond visual tracking to other domains in computer vision where context-aware compression could mitigate challenges related to dataset variability and overfitting.
Conclusion
In conclusion, the introduction of context-aware deep feature compression via expert auto-encoders marks a significant methodological advancement in visual tracking. The TRACA framework successfully demonstrates that real-time performance need not compromise tracking accuracy, a noteworthy achievement in the context of increasingly demanding AI applications. This paper lays the groundwork for subsequent research and technology development focused on efficient, scalable, and adaptable AI systems.