X Modality Assisting RGBT Object Tracking (2312.17273v2)

Published 27 Dec 2023 in cs.CV

Abstract: Developing robust multi-modal feature representations is crucial for enhancing object tracking performance. To this end, a novel X Modality Assisting Network (X-Net) is introduced, which explores the impact of the fusion paradigm by decoupling visual object tracking into three distinct levels, facilitating subsequent processing. First, to overcome the feature-learning challenges caused by significant discrepancies between the RGB and thermal modalities, a plug-and-play pixel-level generation module (PGM) based on knowledge distillation is proposed. This module generates the X modality, bridging the gap between the two modalities while minimizing noise interference. Second, to optimize sample feature representation and promote cross-modal interaction, a feature-level interaction module (FIM) is introduced, integrating a mixed feature interaction transformer and a spatial-dimensional feature translation strategy. Finally, to address random drift caused by missing instance features, a flexible online optimization strategy called the decision-level refinement module (DRM) is proposed, which incorporates optical flow and refinement mechanisms. The efficacy of X-Net is validated through experiments on three benchmarks, demonstrating its superiority over state-of-the-art trackers: X-Net achieves performance gains of 0.47% and 1.2% in average precision rate and success rate, respectively. The research content, data, and code will be made publicly available at https://github.com/DZSYUNNAN/XNet.
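
The abstract decomposes RGBT tracking into pixel, feature, and decision levels (PGM, FIM, DRM). Below is a minimal, hypothetical sketch of how such a three-level pipeline might be wired together in PyTorch; the module names follow the paper's terminology, but every layer, signature, and shape is an illustrative assumption, not the authors' released implementation (see the linked repository for the actual code).

    # Hypothetical sketch of the three-level X-Net pipeline described in the
    # abstract. PGM/FIM/DRM names follow the paper's terminology, but every
    # layer and signature here is an illustrative assumption, not the authors'
    # released code (https://github.com/DZSYUNNAN/XNet).
    import torch
    import torch.nn as nn

    class PixelGenerationModule(nn.Module):
        # PGM: synthesizes an intermediate "X modality" from the RGB input;
        # in the paper it is trained with knowledge distillation to bridge
        # the RGB-thermal gap.
        def __init__(self, channels=3):
            super().__init__()
            self.gen = nn.Sequential(
                nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, channels, 3, padding=1),
            )
        def forward(self, rgb):
            return self.gen(rgb)  # same spatial size as the input

    class FeatureInteractionModule(nn.Module):
        # FIM: cross-modal interaction; the paper's mixed feature interaction
        # transformer is approximated here by self-attention over the
        # concatenated modality tokens.
        def __init__(self, dim=64, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        def forward(self, tokens):  # tokens: (B, N, dim)
            fused, _ = self.attn(tokens, tokens, tokens)
            return fused

    class XNetSketch(nn.Module):
        def __init__(self, dim=64):
            super().__init__()
            self.pgm = PixelGenerationModule()
            self.embed = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # patchify
            self.fim = FeatureInteractionModule(dim)
            self.head = nn.Linear(dim, 4)  # (x, y, w, h) box regression

        def forward(self, rgb, thermal):
            x_mod = self.pgm(rgb)                                  # pixel level
            tokens = [self.embed(m).flatten(2).transpose(1, 2)
                      for m in (rgb, thermal, x_mod)]
            fused = self.fim(torch.cat(tokens, dim=1))             # feature level
            # Decision level: the paper's DRM refines predictions online with
            # optical flow; a mean-pool + regression head stands in here.
            return self.head(fused.mean(dim=1))

    rgb = torch.rand(1, 3, 256, 256)
    thermal = torch.rand(1, 3, 256, 256)  # thermal replicated to 3 channels
    print(XNetSketch()(rgb, thermal).shape)  # torch.Size([1, 4])

Note that the decision-level stage is reduced to a pooling-and-regression stand-in; the paper's DRM additionally uses optical flow and a refinement mechanism to correct drift online, which a self-contained sketch cannot reproduce faithfully.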
