ECO: Efficient Convolution Operators for Tracking (1611.09224v2)

Published 28 Nov 2016 in cs.CV

Abstract: In recent years, Discriminative Correlation Filter (DCF) based methods have significantly advanced the state-of-the-art in tracking. However, in the pursuit of ever increasing tracking performance, their characteristic speed and real-time capability have gradually faded. Further, the increasingly complex models, with massive number of trainable parameters, have introduced the risk of severe over-fitting. In this work, we tackle the key causes behind the problems of computational complexity and over-fitting, with the aim of simultaneously improving both speed and performance. We revisit the core DCF formulation and introduce: (i) a factorized convolution operator, which drastically reduces the number of parameters in the model; (ii) a compact generative model of the training sample distribution, that significantly reduces memory and time complexity, while providing better diversity of samples; (iii) a conservative model update strategy with improved robustness and reduced complexity. We perform comprehensive experiments on four benchmarks: VOT2016, UAV123, OTB-2015, and TempleColor. When using expensive deep features, our tracker provides a 20-fold speedup and achieves a 13.0% relative gain in Expected Average Overlap compared to the top ranked method in the VOT2016 challenge. Moreover, our fast variant, using hand-crafted features, operates at 60 Hz on a single CPU, while obtaining 65.0% AUC on OTB-2015.

Summary

The paper introduces a factorized convolution operator that reduces parameters by around 80% while boosting tracking accuracy.
It presents a compact generative sample model that reduces the number of stored training samples by about 90% while preserving appearance diversity.
An efficient update strategy is proposed, cutting optimization iterations by 80% and mitigating over-fitting for robust, real-time tracking.

Efficient Convolution Operators (ECO) for Tracking

The paper "ECO: Efficient Convolution Operators for Tracking" by Martin Danelljan et al. addresses two problems in Discriminative Correlation Filter (DCF) based tracking: computational complexity and over-fitting. DCF-based methods have become central to visual tracking, but as their models have grown they have lost much of their real-time capability and become more prone to over-fitting. To tackle both problems, the authors propose three techniques: a factorized convolution operator, a compact generative model of the training sample space, and a conservative model update strategy.

Approach and Contributions

The authors revisit the core DCF formulation and introduce a multifaceted approach to address both computational and over-fitting issues:

Factorized Convolution Operator:
- The factorized convolution operator reduces the number of model parameters by learning a small set of basis filters and constructing the filter for each feature channel as a learned linear combination of them. With deep features this removes approximately 80% of the parameters, which lowers the computational cost and, by limiting over-fitting, also improves tracking performance.
Compact Generative Sample Model:
- The authors describe the training set with a probabilistic generative model, a Gaussian Mixture Model (GMM), instead of storing every sample individually. Each mixture component summarizes a group of similar samples, so redundant near-duplicates are collapsed while distinct appearances are kept. This reduces the number of training samples by 90%.
Efficient Model Update Strategy:
- A more conservative update strategy is adopted, which improves both robustness and computational efficiency. Because the model is updated less frequently instead of being re-optimized on every frame, it is less likely to over-fit to recent frames, and training requires less computation. This leads to an 80% reduction in optimization iterations.
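
The factorization idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it works on a single-resolution discrete grid with circular convolution in the Fourier domain, and the names `P`, `basis`, and the example sizes are assumptions chosen for illustration. It shows that combining C basis filters into D per-channel filters gives the same response as first projecting the D feature channels down to C channels, and it compares the parameter counts of the two parameterizations.

```python
import numpy as np

def full_filters(P, basis):
    """Expand C basis filters into D per-channel filters.

    P has shape (D, C) and basis has shape (C, H, W);
    the result has shape (D, H, W) with f_d = sum_c P[d, c] * basis[c].
    """
    return np.tensordot(P, basis, axes=([1], [0]))

def response_via_full_filters(x, P, basis):
    """Sum over D channels of circular convolution with the expanded filters."""
    filters = full_filters(P, basis)
    X = np.fft.fft2(x, axes=(-2, -1))
    F = np.fft.fft2(filters, axes=(-2, -1))
    return np.real(np.fft.ifft2(np.sum(X * F, axis=0)))

def response_via_projection(x, P, basis):
    """Project the D feature channels to C channels, then apply C basis filters."""
    z = np.tensordot(P.T, x, axes=([1], [0]))  # shape (C, H, W)
    Z = np.fft.fft2(z, axes=(-2, -1))
    B = np.fft.fft2(basis, axes=(-2, -1))
    return np.real(np.fft.ifft2(np.sum(Z * B, axis=0)))

def parameter_counts(D, C, H, W):
    """Return (dense, factorized) parameter counts for H x W filters."""
    dense = D * H * W               # one filter per feature channel
    factorized = C * H * W + D * C  # C basis filters plus the D x C matrix
    return dense, factorized

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D, C, H, W = 8, 2, 16, 16  # hypothetical sizes, for illustration only
    x = rng.standard_normal((D, H, W))
    P = rng.standard_normal((D, C))
    basis = rng.standard_normal((C, H, W))
    same = np.allclose(response_via_full_filters(x, P, basis),
                       response_via_projection(x, P, basis))
    print("responses agree:", same)
    print("parameter counts:", parameter_counts(96, 16, 50, 50))
```

The projected form is the cheaper one to evaluate, since only C filtered channels are computed instead of D. The saving in parameters depends on how much smaller C is than D and on the filter size.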

Experimental Validation

The paper presents extensive experiments across four benchmarks: VOT2016, UAV123, OTB-2015, and TempleColor, demonstrating the efficiency and effectiveness of the proposed methods.

VOT2016:
- With deep features, the ECO tracker delivers a 20-fold speedup and a 13.0% relative gain in Expected Average Overlap (EAO) compared to C-COT, the top-ranked method in the VOT2016 challenge.
OTB-2015:
- ECO-HC, the variant using hand-crafted features, runs at 60 Hz on a single CPU while achieving 65.0% Area Under Curve (AUC).
Numerical Results:
- The experiments evaluate each of the proposed contributions (factorized convolution, sample model, and update strategy) both independently and in combination, and the resulting tracker improves performance across the benchmarks.
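
The "relative gain" quoted for VOT2016 is a ratio to the baseline score, not a difference in percentage points. As a reminder of the definition (the symbols and the numbers in the worked line are illustrative assumptions, not values taken from the paper):

```latex
\[
  \text{relative gain}
  = \frac{\mathrm{EAO}_{\text{new}} - \mathrm{EAO}_{\text{baseline}}}
         {\mathrm{EAO}_{\text{baseline}}}
\]
% Hypothetical example: a baseline EAO of 0.300 and a new EAO of 0.339 give
\[
  \frac{0.339 - 0.300}{0.300} = 0.13 = 13\%.
\]
```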

Implications and Future Developments

ECO's improvements in both tracking performance and speed matter for real-time vision applications. In scenarios such as autonomous navigation or surveillance, where rapid response times are crucial, a tracker that is both faster and more robust can improve operational performance and reliability.

The compact generative sample model and factorized convolution operator provide a foundation for future work aiming to further reduce model size and complexity without sacrificing accuracy. The generative model's adaptability to high-dimensional feature spaces suggests potential in various other machine learning tasks beyond visual tracking.
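
To make the compact sample model concrete outside of tracking, here is a minimal sketch of a budgeted mixture that keeps at most a fixed number of component means and weights. It is a simplification under stated assumptions, not the authors' code: the class name `CompactSampleModel`, the weight rescaling by `1 - learning_rate`, and the choice to always merge the closest pair when the budget is exceeded are illustrative, and covariances are ignored entirely.

```python
import numpy as np

class CompactSampleModel:
    """Keeps at most `max_components` weighted means instead of all samples."""

    def __init__(self, max_components, learning_rate):
        self.max_components = max_components
        self.learning_rate = learning_rate
        self.means = []    # list of 1-D float arrays
        self.weights = []  # list of floats that sum to 1

    def add(self, sample):
        """Insert a sample as a new component, merging if over budget."""
        sample = np.asarray(sample, dtype=float).ravel()
        if not self.means:
            self.means.append(sample)
            self.weights.append(1.0)
            return
        g = self.learning_rate
        # Assumption: decay old weights so that the total stays equal to 1.
        self.weights = [w * (1.0 - g) for w in self.weights]
        self.means.append(sample)
        self.weights.append(g)
        if len(self.means) > self.max_components:
            self._merge_closest_pair()

    def _merge_closest_pair(self):
        """Replace the two nearest components by their weighted average."""
        M = np.stack(self.means)
        d2 = np.sum((M[:, None, :] - M[None, :, :]) ** 2, axis=-1)
        np.fill_diagonal(d2, np.inf)  # ignore zero self-distances
        k, l = np.unravel_index(np.argmin(d2), d2.shape)
        wk, wl = self.weights[k], self.weights[l]
        merged_weight = wk + wl
        merged_mean = (wk * self.means[k] + wl * self.means[l]) / merged_weight
        # Delete the higher index first so the lower index stays valid.
        for idx in sorted((int(k), int(l)), reverse=True):
            del self.means[idx]
            del self.weights[idx]
        self.means.append(merged_mean)
        self.weights.append(merged_weight)

if __name__ == "__main__":
    model = CompactSampleModel(max_components=3, learning_rate=0.1)
    for s in ([0.0, 0.0], [10.0, 10.0], [20.0, 20.0], [0.2, 0.2]):
        model.add(s)
    print(len(model.means), sum(model.weights))
```

In this toy run the fourth sample lies close to the first, so the two are merged into one component and the model stays within its budget of three while still covering the three distinct regions.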

Speculations on Future Developments in AI

The techniques introduced in ECO reflect a broader trend in AI research toward efficiency and robustness rather than sheer model complexity. Models that learn effectively with fewer parameters and adapt robustly to new data will remain important. ECO's approach of reducing over-fitting and computational cost together will likely inspire further work on real-time AI systems, and it highlights the value of addressing the two problems jointly.

In conclusion, the work by Danelljan et al. represents a significant step towards more efficient, robust, and real-time capable tracking systems within computer vision, with promising applications and future research directions in the field of AI.
