- The paper proposes DAPNet, a novel network using dense feature aggregation and pruning to enhance RGBT tracking performance by effectively integrating RGB and thermal data.
- The dense feature aggregation recursively combines features across network layers for robust representations, while feature pruning removes redundancy using weighted random selection for improved efficiency and robustness.
- Evaluations on GTOT and RGBT234 datasets demonstrate DAPNet significantly outperforms existing RGB and RGBT trackers across various challenging conditions, showing improved precision and success rates.
Dense Feature Aggregation and Pruning for RGBT Tracking
The paper "Dense Feature Aggregation and Pruning for RGBT Tracking" introduces a novel approach to visual tracking, specifically addressing how to integrate and extract information from both RGB and thermal infrared modalities for robust RGBT tracking. Its focus on modality fusion through feature aggregation and pruning is a significant step toward reliable tracking under challenging conditions such as appearance changes and poor illumination.
Core Contributions
The authors propose the Dense feature Aggregation and Pruning Network (DAPNet), which combines two complementary components: dense feature aggregation and feature pruning. The dense aggregation strategy recursively integrates features across all layers of the convolutional neural network, yielding representations that capture both the spatial detail of shallow layers and the semantic abstraction of deep layers. Embedding this dense connectivity in an end-to-end deep learning framework strengthens the target object representation, which is crucial for precise tracking.
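The recursive aggregation idea can be illustrated with a small sketch: feature maps from successive layers (whose spatial size halves at each stage) are fused by downsampling the running aggregate and combining it with the next layer's features. The pooling and element-wise sum here are stand-ins for the paper's learned fusion operations; function names and shapes are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def max_pool2x2(x):
    """Downsample a (C, H, W) feature map by taking 2x2 block maxima."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def dense_aggregate(features):
    """Recursively fuse feature maps from shallow to deep layers.

    Each stage's output is the element-wise sum of the current layer and
    the downsampled running aggregate, so information from every earlier
    layer is carried forward (illustrative stand-in for learned fusion).
    """
    agg = features[0]
    for feat in features[1:]:
        agg = max_pool2x2(agg) + feat  # align resolution, then fuse
    return agg

# Three mock layers with 4 channels each: 8x8, 4x4, 2x2.
rng = np.random.default_rng(0)
layers = [rng.random((4, 8 // (2 ** i), 8 // (2 ** i))) for i in range(3)]
fused = dense_aggregate(layers)
print(fused.shape)  # (4, 2, 2)
```

Summation keeps the channel count fixed at each stage; the paper's actual network learns the fusion, but the recursive shallow-to-deep flow is the same.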
A key contribution is the feature pruning module, which addresses the redundancy and noise introduced by dense aggregation. Channels are scored via global average pooling and then selected by weighted random selection, so redundant channels are discarded and only the features needed for accurate target localization are retained. This strategy reduces overfitting and keeps the tracker dynamically focused on the most informative feature channels.
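The scoring-and-selection step can be sketched as follows: global average pooling collapses each channel to a scalar score, the scores are normalized into a sampling distribution, and a subset of channels is drawn by weighted random selection so higher-activation channels are more likely to survive. The softmax normalization and function names are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def prune_channels(feat, keep, rng):
    """Keep a `keep`-sized subset of channels from a (C, H, W) feature map.

    Channels are scored by global average pooling, then sampled without
    replacement with probability proportional to their (softmax-normalized)
    scores -- a weighted random selection, not a hard top-k cut.
    """
    scores = feat.mean(axis=(1, 2))                # global average pooling -> (C,)
    probs = np.exp(scores) / np.exp(scores).sum()  # normalize to a distribution
    kept = np.sort(rng.choice(feat.shape[0], size=keep, replace=False, p=probs))
    return feat[kept], kept

rng = np.random.default_rng(0)
feat = rng.random((16, 4, 4))                      # 16 channels of 4x4 features
pruned, idx = prune_channels(feat, keep=8, rng=rng)
print(pruned.shape)  # (8, 4, 4)
```

The random element matters: unlike deterministic top-k pruning, weighted sampling occasionally retains lower-scoring channels, which the paper argues improves robustness.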
Performance and Evaluation
Experimental evaluations on two prominent RGBT benchmark datasets, GTOT and RGBT234, show that the proposed DAPNet significantly outperforms existing RGB and RGBT tracking methods. Notable gains appear in precision and success rates across challenging scenarios such as low illumination and heavy background clutter. By exploiting the complementarity of the two modalities, DAPNet also handles occlusion, deformation, and other visual distractions effectively.
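The precision and success rates cited here are the standard tracking metrics: precision counts frames whose predicted box center falls within a pixel threshold of the ground-truth center (GTOT conventionally uses 5 pixels because its targets are small; RGBT234 uses 20), and success counts frames whose bounding-box overlap (IoU) exceeds a threshold. A minimal sketch with mock boxes:

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    pc = pred[:2] + pred[2:] / 2
    gc = gt[:2] + gt[2:] / 2
    return float(np.linalg.norm(pc - gc))

def iou(pred, gt):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    x1, y1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    x2 = min(pred[0] + pred[2], gt[0] + gt[2])
    y2 = min(pred[1] + pred[3], gt[1] + gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    return inter / union

# Two mock frames: predicted vs. ground-truth boxes.
preds = np.array([[10, 10, 20, 20], [50, 52, 20, 20]], dtype=float)
gts   = np.array([[12, 10, 20, 20], [50, 50, 20, 20]], dtype=float)
pr = np.mean([center_error(p, g) <= 5 for p, g in zip(preds, gts)])
sr = np.mean([iou(p, g) >= 0.5 for p, g in zip(preds, gts)])
print(pr, sr)  # 1.0 1.0
```

Sweeping the thresholds instead of fixing them yields the precision and success plots typically reported on these benchmarks.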
Moreover, the paper provides a comprehensive comparison against state-of-the-art trackers adapted for RGBT tracking. The results show that DAPNet achieves state-of-the-art performance, with clear gains over both baseline and extended RGB trackers.
Theoretical and Practical Implications
This work provides valuable insights into efficient multi-modal fusion frameworks, with implications extending beyond object tracking into broader AI applications where multi-sensor integration is vital. The proposed feature pruning methodology could inspire modifications to existing neural network architectures that improve computational efficiency and tracking robustness.
The recursive feature aggregation approach embodies a potential shift toward exploiting dense layers for enhanced feature richness, offering avenues for further research in the design of hierarchical and parallel structures within neural networks. Additionally, DAPNet's efficacy in dealing with various tracking challenges could be extended to other domains requiring real-time processing under uncertainty, such as autonomous driving and surveillance systems.
Future Directions
The paper points to future work on adopting wider and deeper backbone networks, integrating techniques such as RoIAlign for better object representation, and achieving real-time tracking. Such extensions could further strengthen DAPNet's practical applicability, improving both real-world efficacy and computational feasibility.
In summary, the paper presents a sophisticated approach to RGBT tracking challenges, with significant implications across AI and computer vision domains. The integration of dense feature aggregation and innovative pruning strategies highlights a potential pathway for future advancements in multi-modal neural network design.