
Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking (2403.18193v2)

Published 27 Mar 2024 in cs.CV

Abstract: RGB-T tracking, a vital downstream task of object tracking, has made remarkable progress in recent years. Yet it remains hindered by two major challenges: 1) the trade-off between performance and efficiency, and 2) the scarcity of training data. To address the latter, some recent methods use prompts to fine-tune pre-trained RGB tracking models, leveraging upstream knowledge in a parameter-efficient manner. However, these methods inadequately explore modality-independent patterns and disregard the dynamic reliability of different modalities in open scenarios. We propose M3PT, a novel RGB-T prompt tracking method that leverages middle fusion together with multi-modal, multi-stage visual prompts to overcome these challenges. We pioneer an adjustable middle-fusion meta-framework for RGB-T tracking, which lets the tracker balance performance against efficiency to meet varied application demands. Building on this meta-framework, we employ multiple flexible prompt strategies that adapt the pre-trained model to comprehensive exploration of uni-modal patterns and improved modeling of fusion-modal features in diverse modality-priority scenarios, harnessing the potential of prompt learning for RGB-T tracking. Evaluated on six challenging benchmarks, our method surpasses previous state-of-the-art prompt fine-tuning methods while remaining highly competitive with strong full-parameter fine-tuning methods, using only 0.34M fine-tuned parameters.
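The "adjustable middle fusion" idea can be illustrated with a toy sketch: run the two modality streams independently through the first few frozen backbone blocks, merge them at an adjustable depth via a small learned prompt, and process the fused features jointly afterward. This is not the paper's implementation; the names (`MiddleFusionTracker`, `fusion_depth`, `fusion_prompt`) and the additive-bias fusion are our own simplifying assumptions, chosen only to show why moving the fusion point trades uni-modal modeling capacity against compute, and why the trainable footprint stays tiny.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, DEPTH = 8, 4  # toy feature width and backbone depth

# Frozen pre-trained "backbone": one weight matrix per block (never trained here).
backbone = [rng.standard_normal((DIM, DIM)) * 0.1 for _ in range(DEPTH)]

def block(x, w):
    """One frozen backbone block: linear map followed by ReLU."""
    return np.maximum(x @ w, 0.0)

class MiddleFusionTracker:
    """Illustrative middle-fusion sketch (hypothetical, not the paper's code)."""

    def __init__(self, fusion_depth):
        assert 0 < fusion_depth < DEPTH
        # Earlier fusion -> cheaper (more shared computation);
        # later fusion -> richer modality-independent modeling.
        self.fusion_depth = fusion_depth
        # The only trainable parameters: a small fusion prompt (additive bias).
        self.fusion_prompt = np.zeros(DIM)

    def forward(self, rgb, tir):
        # Stage 1: modality-independent encoding with shared frozen blocks.
        for w in backbone[: self.fusion_depth]:
            rgb, tir = block(rgb, w), block(tir, w)
        # Stage 2: fuse the two streams, steered by the learned prompt.
        fused = 0.5 * (rgb + tir) + self.fusion_prompt
        # Stage 3: joint encoding of the fused representation.
        for w in backbone[self.fusion_depth :]:
            fused = block(fused, w)
        return fused

tracker = MiddleFusionTracker(fusion_depth=2)
out = tracker.forward(rng.standard_normal(DIM), rng.standard_normal(DIM))
print(out.shape)                    # (8,)
print(tracker.fusion_prompt.size)   # 8 trainable values vs. 4*8*8 frozen ones
```

Sliding `fusion_depth` is the "adjustable" knob of the meta-framework: the frozen backbone is shared in both stages, so only the prompt parameters need gradients, mirroring the parameter-efficient fine-tuning regime described in the abstract.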

Authors (3)
  1. Qiming Wang
  2. Yongqiang Bai
  3. Hongxing Song
Citations (4)

