
Beyond SOT: Tracking Multiple Generic Objects at Once (2212.11920v3)

Published 22 Dec 2022 in cs.CV

Abstract: Generic Object Tracking (GOT) is the problem of tracking target objects specified by bounding boxes in the first frame of a video. While the task has received much attention in recent decades, researchers have focused almost exclusively on the single-object setting. Multi-object GOT has wider applicability, making it more attractive for real-world applications. We attribute the lack of research interest in this problem to the absence of suitable benchmarks. In this work, we introduce a new large-scale GOT benchmark, LaGOT, containing multiple annotated target objects per sequence. Our benchmark allows users to tackle key remaining challenges in GOT, aiming to increase robustness and reduce computation by tracking multiple objects jointly. In addition, we propose a transformer-based GOT tracker baseline capable of jointly processing multiple objects through shared computation. With 10 concurrent objects, our approach runs 4x faster than tracking each object independently, and it outperforms existing single-object trackers on our new benchmark. Our approach also achieves highly competitive results on single-object GOT datasets, setting a new state of the art on TrackingNet with a success rate AUC of 84.4%. Our benchmark, code, and trained models will be made publicly available.
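The abstract reports a success rate AUC without defining it. As a rough illustration only, the sketch below computes the metric the way it is commonly defined for single-object tracking benchmarks: the fraction of frames whose predicted box overlaps the ground truth by more than an IoU threshold, averaged over evenly spaced thresholds in [0, 1]. The function names, the (x, y, w, h) box format, and the choice of 21 thresholds are assumptions made for this example and are not taken from the paper or from any benchmark toolkit.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h).

    The box format is an assumption for this sketch.
    """
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(ious, num_thresholds=21):
    """Mean success rate over evenly spaced IoU thresholds in [0, 1].

    `ious` holds one IoU value per frame. A frame counts as a success at
    threshold t when its IoU is strictly greater than t (assumed convention).
    """
    if not ious:
        raise ValueError("ious must contain at least one value")
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(iou > t for iou in ious) / len(ious) for t in thresholds]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    predicted = [(0, 0, 2, 2), (1, 0, 2, 2), (5, 5, 1, 1)]
    ground_truth = [(0, 0, 2, 2), (0, 0, 2, 2), (0, 0, 2, 2)]
    per_frame = [box_iou(p, g) for p, g in zip(predicted, ground_truth)]
    print(f"success AUC: {success_auc(per_frame):.3f}")
```

For a multi-object sequence, one plausible extension is to compute this score per target and average the results, but the aggregation actually used by the benchmark should be checked against the paper.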
