Adaptive Siamese Tracking with a Compact Latent Network (2302.00930v2)

Published 2 Feb 2023 in cs.CV

Abstract: In this paper, we provide an intuitive view that simplifies Siamese-based trackers by casting the tracking task as classification. Under this view, we conduct an in-depth analysis of these trackers through visual simulations and real tracking examples, and find that their failure cases in some challenging situations can be attributed to missing decisive samples in offline training. Since the samples in the initial (first) frame contain rich sequence-specific information, we regard them as the decisive samples that represent the whole sequence. To quickly adapt the base model to new scenes, we present a compact latent network that fully exploits these decisive samples. Specifically, we introduce a statistics-based compact latent feature for fast adjustment, which efficiently extracts the sequence-specific information. Furthermore, a new diverse sample mining strategy is designed for training to further improve the discrimination ability of the proposed compact latent network. Finally, a conditional updating strategy is proposed to efficiently update the base models to handle scene variations during tracking. To evaluate the generalization ability and effectiveness of our method, we apply it to adjust three classical Siamese-based trackers, namely SiamRPN++, SiamFC, and SiamBAN. Extensive experiments on six recent datasets demonstrate that all three adjusted trackers achieve superior accuracy while running at high speed.
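A minimal sketch may help make the core idea concrete. The code below is not the authors' implementation: the module names, feature shapes, toy classification head, and adaptation loop are all assumptions for illustration. It shows, in PyTorch, how a small latent bottleneck on top of a frozen Siamese backbone could be tuned on first-frame samples only, treating tracking as dense classification; the paper's statistics-based latent feature, diverse sample mining, and conditional updating strategies are not reproduced here.

```python
# Sketch only (not the paper's code): per-sequence adaptation of a compact
# latent module using first-frame "decisive" samples. Shapes are illustrative.
import torch
import torch.nn as nn

class CompactLatentModule(nn.Module):
    """Tiny bottleneck that adjusts frozen backbone features per sequence."""
    def __init__(self, channels: int, latent_dim: int = 16):
        super().__init__()
        self.down = nn.Conv2d(channels, latent_dim, kernel_size=1)
        self.up = nn.Conv2d(latent_dim, channels, kernel_size=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Residual form keeps the base model's response as a prior.
        return feat + self.up(torch.relu(self.down(feat)))

def adapt_to_first_frame(module, base_feat, labels, steps=10, lr=1e-2):
    """Fast adaptation on first-frame samples only.

    base_feat: frozen-backbone features of the first-frame search region.
    labels:    dense foreground/background map from the initial bounding box.
    """
    head = nn.Conv2d(base_feat.shape[1], 1, kernel_size=1)  # toy cls head
    params = list(module.parameters()) + list(head.parameters())
    opt = torch.optim.SGD(params, lr=lr)
    bce = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss = bce(head(module(base_feat)), labels)
        loss.backward()
        opt.step()
    return module, head

if __name__ == "__main__":
    # Fake first-frame features and labels, just to show the loop runs.
    feat = torch.randn(1, 256, 25, 25)
    lbl = (torch.rand(1, 1, 25, 25) > 0.9).float()
    adapt_to_first_frame(CompactLatentModule(256), feat, lbl)
```

Because only the small module and head are updated while the backbone stays frozen, per-sequence adaptation stays cheap, which is consistent with the abstract's claim of high running speed after adjustment.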

References (93)
  1. X. Dong, J. Shen, L. Shao, and F. Porikli, “CLNet: A Compact Latent Network for Fast Adjusting Siamese Trackers,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 378–395.
  2. A. Yilmaz, O. Javed, and M. Shah, “Object tracking: A survey,” ACM Comput. Surv., vol. 38, no. 4, p. 13, 2006.
  3. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proc. Neural Inf. Process. Syst., 2012.
  4. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” Proc. Neural Inf. Process. Syst., vol. 28, pp. 91–99, 2015.
  5. K. He, G. Gkioxari, P. Dollár, and R. B. Girshick, “Mask R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis., 2017.
  6. L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. S. Torr, “Fully-Convolutional Siamese Networks for Object Tracking,” in Proc. Eur. Conf. Comput. Vis. Workshops, 2016, pp. 850–865.
  7. B. Li, J. Yan, W. Wu, Z. Zhu, and X. Hu, “High Performance Visual Tracking With Siamese Region Proposal Network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 8971–8980.
  8. Z. Zhu, Q. Wang, B. Li, W. Wu, J. Yan, and W. Hu, “Distractor-aware siamese networks for visual object tracking,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 101–117.
  9. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, and M. Bernstein, “Imagenet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
  10. J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi, and P. H. Torr, “End-to-end representation learning for correlation filter based tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 2805–2813.
  11. H. Fan and H. Ling, “Siamese cascaded region proposal networks for real-time visual tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 7952–7961.
  12. B. Li, W. Wu, Q. Wang, F. Zhang, J. Xing, and J. Yan, “SiamRPN++: Evolution of Siamese Visual Tracking With Very Deep Networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 4282–4291.
  13. Z. Zhang and H. Peng, “Deeper and wider siamese networks for real-time visual tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 4591–4600.
  14. H. Nam and B. Han, “Learning multi-domain convolutional neural networks for visual tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 4293–4302.
  15. M. Danelljan, A. Robinson, F. S. Khan, and M. Felsberg, “Beyond correlation filters: Learning continuous convolution operators for visual tracking,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 472–488.
  16. S. Hong, T. You, S. Kwak, and B. Han, “Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network,” in Proc. Int. Conf. Mach. Learn., 2015, pp. 597–606.
  17. S. Khan, M. Hayat, S. W. Zamir, J. Shen, and L. Shao, “Striking the Right Balance With Uncertainty,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 103–112.
  18. Z. Chen, B. Zhong, G. Li, S. Zhang, and R. Ji, “Siamese Box Adaptive Network for Visual Tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2020, pp. 6668–6677.
  19. M. Kristan, A. Leonardis, J. Matas, M. Felsberg, R. Pflugfelder, J.-K. Kamarainen, L. Čehovin Zajc, M. Danelljan, A. Lukezic, O. Drbohlav, L. He, Y. Zhang, S. Yan, J. Yang, G. Fernandez, et al., “The eighth visual object tracking VOT2020 challenge results,” 2020.
  20. L. Huang, X. Zhao, and K. Huang, “GOT-10k: A Large High-Diversity Benchmark for Generic Object Tracking in the Wild,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 5, pp. 1562–1577, 2021.
  21. J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, “High-Speed Tracking with Kernelized Correlation Filters,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 3, pp. 583–596, 2015.
  22. M. Danelljan, G. Bhat, F. Shahbaz Khan, and M. Felsberg, “Eco: Efficient convolution operators for tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 6638–6646.
  23. X. Dong, J. Shen, D. Yu, W. Wang, J. Liu, and H. Huang, “Occlusion-Aware Real-Time Object Tracking,” IEEE Trans. Multimedia, vol. 19, no. 4, pp. 763–771, Apr. 2017.
  24. D. Held, S. Thrun, and S. Savarese, “Learning to track at 100 fps with deep regression networks,” in Proc. Eur. Conf. Comput. Vis., 2016, pp. 749–765.
  25. X. Lu, C. Ma, B. Ni, X. Yang, I. Reid, and M.-H. Yang, “Deep regression tracking with shrinkage loss,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 353–369.
  26. B. Ma, J. Shen, Y. Liu, H. Hu, L. Shao, and X. Li, “Visual Tracking Using Strong Classifier and Structural Local Sparse Descriptors,” IEEE Trans. Multimedia, vol. 17, no. 10, pp. 1818–1828, 2015.
  27. B. Ma, H. Hu, J. Shen, Y. Zhang, and F. Porikli, “Linearization to nonlinear learning for visual tracking,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4400–4407.
  28. G. E. Hinton and D. C. Plaut, “Using Fast Weights to Deblur Old Memories,” in Proc. Annual Conf. Cog. Sci. Society, 1987, pp. 177–186.
  29. J. Schmidhuber, “Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-… hook,” PhD Thesis, Technische Universität München, 1987.
  30. L. Wang, W. Ouyang, X. Wang, and H. Lu, “Visual Tracking with Fully Convolutional Networks,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 3119–3127.
  31. C. Ma, J.-B. Huang, X. Yang, and M.-H. Yang, “Hierarchical convolutional features for visual tracking,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 3074–3082.
  32. R. Tao, E. Gavves, and A. W. Smeulders, “Siamese instance search for tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1420–1429.
  33. J. Shen, X. Tang, X. Dong, and L. Shao, “Visual object tracking by hierarchical attention siamese network,” IEEE Trans. Cybernetics, vol. 50, no. 7, pp. 3068–3080, 2019.
  34. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 740–755.
  35. E. Real, J. Shlens, S. Mazzocchi, X. Pan, and V. Vanhoucke, “Youtube-boundingboxes: A large high-precision human-annotated data set for object detection in video,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 5296–5305.
  36. H. Fan, L. Lin, F. Yang, P. Chu, G. Deng, S. Yu, H. Bai, Y. Xu, C. Liao, and H. Ling, “Lasot: A high-quality benchmark for large-scale single object tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 5374–5383.
  37. L. Zhang, A. Gonzalez-Garcia, J. Van De Weijer, M. Danelljan, and F. S. Khan, “Synthetic data generation for end-to-end thermal infrared tracking,” IEEE Trans. Image Process., vol. 28, no. 4, pp. 1837–1850, 2018.
  38. A. He, C. Luo, X. Tian, and W. Zeng, “A twofold siamese network for real-time object tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 4834–4843.
  39. Q. Wang, Z. Teng, J. Xing, J. Gao, W. Hu, and S. Maybank, “Learning attentions: residual attentional siamese network for high performance online visual tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 4854–4863.
  40. Y. Zhang, L. Wang, J. Qi, D. Wang, M. Feng, and H. Lu, “Structured siamese network for real-time visual tracking,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 351–366.
  41. X. Dong, J. Shen, D. Wu, K. Guo, X. Jin, and F. Porikli, “Quadruplet Network With One-Shot Learning for Fast Visual Object Tracking,” IEEE Trans. Image Process., vol. 28, no. 7, pp. 3516–3527, Jul. 2019.
  42. X. Dong and J. Shen, “Triplet Loss in Siamese Network for Object Tracking,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 459–474.
  43. J. Shen, Y. Liu, X. Dong, X. Lu, F. S. Khan, and S. C. Hoi, “Distilled siamese networks for visual tracking,” IEEE Trans. Pattern Anal. Mach. Intell., 2021.
  44. X. Wang, C. Li, B. Luo, and J. Tang, “Sint++: Robust visual tracking via adversarial positive instance generation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 4864–4873.
  45. T. Yang and A. B. Chan, “Learning dynamic memory networks for object tracking,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 152–167.
  46. Q. Guo, W. Feng, C. Zhou, R. Huang, L. Wan, and S. Wang, “Learning dynamic siamese network for visual object tracking,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 1763–1771.
  47. C. Huang, S. Lucey, and D. Ramanan, “Learning policies for adaptive tracking with deep feature cascades,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 105–114.
  48. X. Dong, J. Shen, W. Wang, Y. Liu, L. Shao, and F. Porikli, “Hyperparameter Optimization for Tracking With Continuous Deep Q-Learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 518–527.
  49. X. Dong, J. Shen, W. Wang, L. Shao, H. Ling, and F. Porikli, “Dynamical Hyperparameter Optimization via Deep Reinforcement Learning in Tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 5, pp. 1515–1529, 2021.
  50. M. Kristan, J. Matas, A. Leonardis, T. Vojíř, R. Pflugfelder, G. Fernandez, G. Nebehay, F. Porikli, and L. Čehovin, “A novel performance evaluation methodology for single-target trackers,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 11, pp. 2137–2155, 2016.
  51. J. Ba, G. E. Hinton, V. Mnih, J. Z. Leibo, and C. Ionescu, “Using Fast Weights to Attend to the Recent Past,” in Proc. Neural Inf. Process. Syst., D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett, Eds., vol. 29, 2016, pp. 4331–4339.
  52. S. Thrun and L. Pratt, “Learning to learn: Introduction and overview,” in Learning to learn, 1998, pp. 3–17.
  53. S. Hochreiter, A. S. Younger, and P. R. Conwell, “Learning to learn using gradient descent,” in Proc. Int. Conf. Art. Neural Net., 2001, pp. 87–94.
  54. M. Andrychowicz, M. Denil, S. Gómez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, and N. de Freitas, “Learning to learn by gradient descent by gradient descent,” Proc. Neural Inf. Process. Syst., vol. 29, pp. 3981–3989, 2016.
  55. G. Koch, R. Zemel, and R. Salakhutdinov, “Siamese neural networks for one-shot image recognition,” in Proc. Int. Conf. Mach. Learn. Deep Learning Workshop, 2015.
  56. O. Vinyals, C. Blundell, T. Lillicrap, K. Kavukcuoglu, and D. Wierstra, “Matching Networks for One Shot Learning,” Proc. Neural Inf. Process. Syst., vol. 29, pp. 3630–3638, 2016.
  57. J. Snell, K. Swersky, and R. Zemel, “Prototypical Networks for Few-shot Learning,” Proc. Neural Inf. Process. Syst., vol. 30, pp. 4077–4087, 2017.
  58. A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap, “Meta-learning with memory-augmented neural networks,” in Proc. Int. Conf. Mach. Learn., 2016, pp. 1842–1850.
  59. S. Ravi and H. Larochelle, “Optimization as a model for few-shot learning,” Proc. Int. Conf. Learn. Representations, 2017.
  60. C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” in Proc. Int. Conf. Mach. Learn., 2017, pp. 1126–1135.
  61. C. Finn, K. Xu, and S. Levine, “Probabilistic model-agnostic meta-learning,” in Proc. Neural Inf. Process. Syst., 2018, pp. 9537–9548.
  62. A. A. Rusu, D. Rao, J. Sygnowski, O. Vinyals, R. Pascanu, S. Osindero, and R. Hadsell, “Meta-learning with latent embedding optimization,” Proc. Int. Conf. Learn. Representations, 2019.
  63. H. Li, W. Dong, X. Mei, C. Ma, F. Huang, and B.-G. Hu, “Lgm-net: Learning to generate matching networks for few-shot learning,” in Proc. Int. Conf. Mach. Learn., 2019, pp. 3825–3834.
  64. E. Park and A. C. Berg, “Meta-tracker: Fast and robust online adaptation for visual object trackers,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 569–585.
  65. J. Choi, J. Kwon, and K. M. Lee, “Deep meta learning for real-time target-aware visual tracking,” in Proc. IEEE Int. Conf. Comput. Vis., 2019, pp. 911–920.
  66. Y. Song, C. Ma, X. Wu, L. Gong, L. Bao, W. Zuo, C. Shen, R. W. Lau, and M.-H. Yang, “Vital: Visual tracking via adversarial learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 8990–8999.
  67. P. Li, B. Chen, W. Ouyang, D. Wang, X. Yang, and H. Lu, “Gradnet: Gradient-guided network for visual object tracking,” in Proc. IEEE Int. Conf. Comput. Vis., 2019, pp. 6162–6171.
  68. H. Kiani Galoogahi, A. Fagg, C. Huang, D. Ramanan, and S. Lucey, “Need for speed: A benchmark for higher frame rate object tracking,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 1125–1134.
  69. S. Li and D.-Y. Yeung, “Visual object tracking for unmanned aerial vehicles: a benchmark and new motion models,” in Proc. Associ. Advance. Art. Intell., Feb. 2017, pp. 4140–4146.
  70. M. Kristan, J. Matas, A. Leonardis, M. Felsberg, R. Pflugfelder, J.-K. Kamarainen, L. Čehovin Zajc, O. Drbohlav, A. Lukezic, A. Berg, A. Eldesokey, J. Kapyla, and G. Fernandez, “The seventh visual object tracking VOT2019 challenge results,” 2019.
  71. M. Danelljan, G. Hager, F. Shahbaz Khan, and M. Felsberg, “Learning spatially regularized correlation filters for visual tracking,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4310–4318.
  72. M. Danelljan, F. Shahbaz Khan, M. Felsberg, and J. Van de Weijer, “Adaptive color attributes for real-time visual tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 1090–1097.
  73. C. Ma, X. Yang, C. Zhang, and M.-H. Yang, “Long-term correlation tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 5388–5396.
  74. Y. Li and J. Zhu, “A Scale Adaptive Kernel Correlation Filter Tracker with Feature Integration,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 254–265.
  75. Y. Qi, S. Zhang, L. Qin, H. Yao, Q. Huang, J. Lim, and M.-H. Yang, “Hedged deep tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 4303–4311.
  76. Y. Song, C. Ma, L. Gong, J. Zhang, R. W. Lau, and M.-H. Yang, “Crest: Convolutional residual learning for visual tracking,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2555–2564.
  77. J. Zhang, S. Ma, and S. Sclaroff, “MEEM: robust tracking via multiple experts using entropy minimization,” in Proc. Eur. Conf. Comput. Vis., 2014, pp. 188–203.
  78. M. Danelljan, L. V. Gool, and R. Timofte, “Probabilistic regression for visual tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2020, pp. 7183–7192.
  79. P. Voigtlaender, J. Luiten, P. H. Torr, and B. Leibe, “Siam r-cnn: Visual tracking by re-detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2020, pp. 6578–6588.
  80. Z. Zhang, H. Peng, J. Fu, B. Li, and W. Hu, “Ocean: Object-aware anchor-free tracking,” in Proc. Eur. Conf. Comput. Vis., 2020, pp. 771–787.
  81. G. Wang, C. Luo, Z. Xiong, and W. Zeng, “Spm-tracker: Series-parallel matching for real-time visual object tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 3643–3652.
  82. Q. Wang, L. Zhang, L. Bertinetto, W. Hu, and P. H. Torr, “Fast online object tracking and segmentation: A unifying approach,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 1328–1338.
  83. G. Bhat, M. Danelljan, L. V. Gool, and R. Timofte, “Learning discriminative model prediction for tracking,” in Proc. IEEE Int. Conf. Comput. Vis., 2019, pp. 6182–6191.
  84. Y. Wu, J. Lim, and M.-H. Yang, “Online object tracking: A benchmark,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2013, pp. 2411–2418.
  85. S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “Cbam: Convolutional block attention module,” in Proc. Eur. Conf. Comput. Vis., 2018, pp. 3–19.
  86. E. Perez, F. Strub, H. De Vries, V. Dumoulin, and A. Courville, “Film: Visual reasoning with a general conditioning layer,” in Proc. Associ. Advance. Art. Intell., vol. 32, no. 1, 2018.
  87. J. Choi, J. Kwon, and K. M. Lee, “Visual tracking by tridentalign and context embedding,” in Proc. Asian. Conf. Comput. Vis., 2020.
  88. X. Chen, B. Yan, J. Zhu, D. Wang, X. Yang, and H. Lu, “Transformer tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2021, pp. 8126–8135.
  89. X. Chen, B. Yan, J. Zhu, D. Wang, and H. Lu, “High-performance transformer tracking,” arXiv preprint arXiv:2203.13533, 2022.
  90. B. Yan, H. Peng, J. Fu, D. Wang, and H. Lu, “Learning spatio-temporal transformer for visual tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2021, pp. 10448–10457.
  91. P. Blatter, M. Kanakis, M. Danelljan, and L. Van Gool, “Efficient visual tracking with exemplar transformers,” arXiv preprint arXiv:2112.09686, 2021.
  92. B. Yan, H. Peng, K. Wu, D. Wang, J. Fu, and H. Lu, “Lighttrack: Finding lightweight neural networks for object tracking via one-shot architecture search,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2021, pp. 15180–15189.
  93. V. Borsuk, R. Vei, O. Kupyn, T. Martyniuk, I. Krashenyi, and J. Matas, “Fear: Fast, efficient, accurate and robust visual tracker,” arXiv preprint arXiv:2112.07957, 2021.