
Temporal Adaptive RGBT Tracking with Modality Prompt (2401.01244v1)

Published 2 Jan 2024 in cs.CV

Abstract: RGBT tracking is widely used in fields such as robotics, surveillance, and autonomous driving. Existing RGBT trackers fully explore the spatial information between the template and the search region and locate the target from appearance-matching results. However, they make very limited use of temporal information, either ignoring it or exploiting it through online sampling and training. The former struggles to cope with changes in object state, while the latter neglects the correlation between spatial and temporal information. To alleviate these limitations, we propose a novel Temporal Adaptive RGBT Tracking framework, named TATrack. TATrack has a spatio-temporal two-stream structure and captures temporal information through an online-updated template; the two streams perform multi-modal feature extraction and cross-modal interaction for the initial template and the online-updated template, respectively. TATrack thus comprehensively exploits spatio-temporal and multi-modal information for target localization. In addition, we design a spatio-temporal interaction (STI) mechanism that bridges the two branches and enables cross-modal interaction to span longer time scales. Extensive experiments on three popular RGBT tracking benchmarks show that our method achieves state-of-the-art performance while running at real-time speed.
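The two-stream structure described above can be caricatured in a few lines of code. Everything below is a hypothetical sketch under stated assumptions, not the authors' implementation: `extract_and_fuse` stands in for each branch's multi-modal feature extraction and cross-modal interaction (here a simple per-element mean of RGB and thermal features), and `sti` stands in for the spatio-temporal interaction mechanism (here a linear blend of the two branches, whereas the paper's STI is a learned interaction).

```python
import numpy as np

def extract_and_fuse(rgb, tir):
    """Stand-in for one branch: multi-modal feature extraction plus
    cross-modal interaction, approximated as a per-element mean."""
    return (rgb + tir) / 2.0

def sti(initial_feat, online_feat, alpha=0.5):
    """Hypothetical spatio-temporal interaction: blend the initial-template
    branch with the online-template branch so information spans both time
    scales. The actual STI in the paper is a learned mechanism."""
    return alpha * initial_feat + (1.0 - alpha) * online_feat

# Toy 4x4 single-channel "features" per modality and branch.
rng = np.random.default_rng(0)
rgb0, tir0 = rng.random((4, 4)), rng.random((4, 4))    # initial template
rgb_t, tir_t = rng.random((4, 4)), rng.random((4, 4))  # online-updated template

branch_init = extract_and_fuse(rgb0, tir0)
branch_online = extract_and_fuse(rgb_t, tir_t)
joint = sti(branch_init, branch_online)
print(joint.shape)  # (4, 4)
```

The point of the sketch is only the data flow: each template gets its own fused multi-modal representation, and the STI step is what couples the two branches before localization.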

Authors (6)
  1. Hongyu Wang (104 papers)
  2. Xiaotao Liu (7 papers)
  3. Yifan Li (106 papers)
  4. Meng Sun (83 papers)
  5. Dian Yuan (1 paper)
  6. Jing Liu (526 papers)
Citations (18)