
LoReTrack: Efficient and Accurate Low-Resolution Transformer Tracking (2405.17660v1)

Published 27 May 2024 in cs.CV

Abstract: High-performance Transformer trackers achieve excellent results, yet they often bear a heavy computational load. Since a smaller input immediately reduces computation without changing the model, an easy route to efficient Transformer tracking is to adopt a low-resolution input. Albeit faster, this substantially hurts tracking accuracy due to the information loss inherent in low-resolution tracking. In this paper, we aim to mitigate this information loss and boost the performance of low-resolution Transformer tracking via dual knowledge distillation from a frozen high-resolution (but not larger) Transformer tracker. The core lies in two simple yet effective distillation modules: query-key-value knowledge distillation (QKV-KD) and discrimination knowledge distillation (Disc-KD), applied across resolutions. The former, from a global view, allows the low-resolution tracker to inherit the features and interactions of the high-resolution tracker, while the latter, from a target-aware view, enhances the target-background distinguishing capacity by imitating discriminative regions of the high-resolution counterpart. With dual knowledge distillation, our Low-Resolution Transformer Tracker (LoReTrack) enjoys not only high efficiency owing to reduced computation but also enhanced accuracy distilled from the high-resolution tracker. In extensive experiments, LoReTrack at a 256x256 resolution consistently improves the baseline at the same resolution, and shows competitive or even better results than a 384x384 high-resolution Transformer tracker, while running 52% faster and saving 56% of MACs. Moreover, LoReTrack is resolution-scalable: at a 128x128 resolution, it runs at 25 fps on a CPU with 64.9%/46.4% SUC scores on LaSOT/LaSOText, surpassing all other CPU real-time trackers. Code will be released.
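The abstract describes a training objective that augments the usual tracking loss with two distillation terms matching the low-resolution student against a frozen high-resolution teacher. The following is a minimal sketch of that combined objective; the MSE form of each term, the loss weights, and all function names here are assumptions for illustration, not the paper's exact formulation.

```python
def mse(student, teacher):
    """Mean squared error between two equal-length feature vectors."""
    assert len(student) == len(teacher)
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)

def dual_distillation_loss(track_loss, qkv_student, qkv_teacher,
                           disc_student, disc_teacher,
                           w_qkv=1.0, w_disc=1.0):
    """Hypothetical combined objective: the base tracking loss plus a
    QKV-KD term (global feature/interaction matching) and a Disc-KD term
    (target-aware imitation of discriminative regions). The teacher
    features come from the frozen high-resolution tracker; student
    features are assumed already resized to the teacher's shape."""
    l_qkv = mse(qkv_student, qkv_teacher)     # QKV knowledge distillation
    l_disc = mse(disc_student, disc_teacher)  # discrimination distillation
    return track_loss + w_qkv * l_qkv + w_disc * l_disc
```

In practice these terms would be computed per Transformer layer over query, key, and value feature maps; the sketch collapses that to flat vectors to show only how the three losses combine.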

