HIPTrack: Visual Tracking with Historical Prompts (2311.02072v2)

Published 3 Nov 2023 in cs.CV

Abstract: Trackers that follow the Siamese paradigm track a target by matching template features against search-region features. Many methods have tried to improve tracking performance by incorporating tracking history, so that trackers better handle target appearance changes such as deformation and occlusion. However, existing methods use historical information insufficiently and incompletely, and they typically require repeated training and introduce substantial computation. In this paper, we show that a Siamese-paradigm tracker can achieve a significant performance improvement, with its parameters left completely unchanged, simply by being given precise and up-to-date historical information. Based on this observation, we propose a historical prompt network that uses refined historical foreground masks and historical visual features of the target to provide comprehensive and precise prompts for the tracker. Building on this network, we construct a novel tracker, HIPTrack, which achieves considerable performance improvements without retraining the entire model. We conduct experiments on seven datasets, and the results show that our method surpasses current state-of-the-art trackers on LaSOT, LaSOText, GOT-10k and NfS. Furthermore, the historical prompt network can be integrated seamlessly into existing trackers as a plug-and-play module to improve their performance. The source code is available at https://github.com/WenRuiCai/HIPTrack.
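The similarity matching that the abstract attributes to Siamese trackers can be illustrated with a toy sketch. It assumes small single-channel 2D feature maps stored as plain Python lists (real trackers use learned, multi-channel backbone features) and computes a SiamFC-style cross-correlation response map whose peak marks the location that best matches the template. The helper `masked_history_template` is hypothetical: a naive mask-weighted average of past target features, included only to show the simplest possible way of combining historical masks with historical features. It is not HIPTrack's historical prompt network, whose internals the abstract does not specify. All function names here are illustrative.

```python
from typing import List, Tuple

Grid = List[List[float]]


def cross_correlate(template: Grid, search: Grid) -> Grid:
    """Slide `template` over `search` (valid positions only) and return the
    response map of dot-product similarities (SiamFC-style matching)."""
    th, tw = len(template), len(template[0])
    sh, sw = len(search), len(search[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            # Similarity between the template and the search window at (i, j).
            score = sum(
                template[a][b] * search[i + a][j + b]
                for a in range(th)
                for b in range(tw)
            )
            row.append(score)
        response.append(row)
    return response


def argmax_2d(response: Grid) -> Tuple[int, int]:
    """Return the (row, col) of the highest similarity score."""
    cells = ((i, j) for i in range(len(response)) for j in range(len(response[0])))
    return max(cells, key=lambda ij: response[ij[0]][ij[1]])


def masked_history_template(history: List[Grid], masks: List[Grid]) -> Grid:
    """HYPOTHETICAL helper (not HIPTrack's method): average past target
    feature crops cell by cell, counting only cells that the assumed
    binary foreground masks mark as target. Cells never marked stay 0.0."""
    h, w = len(history[0]), len(history[0][0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            weight = sum(m[i][j] for m in masks)
            if weight > 0:
                total = sum(f[i][j] * m[i][j] for f, m in zip(history, masks))
                out[i][j] = total / weight
    return out


if __name__ == "__main__":
    search = [
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 2, 0],
        [0, 0, 3, 4, 0],
        [0, 0, 0, 0, 0],
    ]
    template = [[1, 2], [3, 4]]
    print(argmax_2d(cross_correlate(template, search)))  # (2, 2)

    history = [[[1, 1], [1, 1]], [[3, 3], [3, 3]]]
    masks = [[[1, 0], [1, 1]], [[1, 1], [0, 1]]]
    print(masked_history_template(history, masks))  # [[2.0, 3.0], [1.0, 2.0]]
```

Raw dot products favor high-magnitude regions regardless of shape; in practice, learned features and normalization are what make this kind of matching discriminative.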
