In Defense and Revival of Bayesian Filtering for Thermal Infrared Object Tracking (2402.17098v1)
Abstract: Deep learning-based methods dominate recent research on thermal infrared (TIR) object tracking. However, relying solely on deep models for better tracking results demands careful selection of features that represent the target well, together with a sound template update strategy, which complicates model design; as a result, recent TIR trackers still struggle in complex scenarios. This paper introduces a novel Deep Bayesian Filtering (DBF) method to improve TIR tracking in such situations. DBF is distinctive in its dual-model structure: a system model and an observation model. The system model uses motion information, modeled as two-dimensional Brownian motion, to predict candidate positions of the target, yielding a prior probability. Once a TIR frame is captured, the observation model, a classifier driven by infrared appearance, evaluates the likelihood of each candidate position. Combining the prior and the likelihood yields the posterior, from which the target position is estimated and the template is dynamically updated. Experiments on several benchmark datasets show that DBF achieves competitive performance, surpassing most existing TIR tracking methods in complex scenarios.
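The dual-model update described above follows the standard Bayesian filtering recursion: a motion-driven prior is reweighted by an appearance-driven likelihood to obtain a posterior over the target position. Below is a minimal particle-style sketch of that recursion, based only on the abstract's description; `score_fn` is a hypothetical stand-in for the paper's CNN observation model, and `sigma` and `update_thresh` are illustrative parameters, not values from the paper.

```python
# Minimal sketch of one DBF-style filtering step (assumptions noted above).
import numpy as np

def dbf_step(particles, frame, score_fn, sigma=5.0, update_thresh=0.8):
    """One step: Brownian-motion prior, classifier likelihood,
    posterior position estimate, and a template-update cue."""
    # System model: two-dimensional Brownian motion diffuses each candidate
    # (x, y) position, producing the prior over the target's new location.
    particles = particles + np.random.normal(0.0, sigma, size=particles.shape)

    # Observation model: score the TIR patch at each candidate position;
    # the (positive) scores act as unnormalized likelihoods.
    scores = np.array([score_fn(frame, p) for p in particles])
    weights = scores / scores.sum()

    # Posterior estimate: likelihood-weighted mean of the prior samples.
    position = weights @ particles

    # Template update cue: refresh the appearance template only when the
    # observation model is confident about the best candidate.
    should_update = scores.max() > update_thresh

    # Resample so particles concentrate around high-likelihood regions.
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return position, particles[idx], should_update
```

A tracking loop would call `dbf_step` once per frame, feeding the resampled particles back in and refreshing the appearance template whenever `should_update` is true.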