SSF-Net: Spatial-Spectral Fusion Network with Spectral Angle Awareness for Hyperspectral Object Tracking (2403.05852v1)
Abstract: Hyperspectral video (HSV) simultaneously provides valuable spatial, spectral, and temporal information, making it well suited to challenges such as background clutter and visual similarity in object tracking. However, existing methods focus primarily on band regrouping and rely on RGB trackers for feature extraction, leaving spectral information underexplored and making complementary representations of object features difficult to achieve. In this paper, a spatial-spectral fusion network with spectral angle awareness (SSF-Net) is proposed for hyperspectral (HS) object tracking. First, to address insufficient spectral feature extraction in existing networks, a spatial-spectral feature backbone (S$^2$FB) is designed; with its spatial and spectral extraction branches, it obtains a joint representation of texture and spectrum. Second, a spectral attention fusion module (SAFM) captures intra- and inter-modality correlations to fuse features from the HS and RGB modalities, incorporating visual information into the HS spectral context to form a robust representation. Third, to make the tracker respond more accurately to the object position, a spectral angle awareness module (SAAM) measures the region-level spectral similarity between the template and search images during the prediction stage. Furthermore, a novel spectral angle awareness loss (SAAL) is developed to guide the SAAM based on similar regions. Finally, to obtain robust tracking results, a weighted prediction method combines the HS and RGB predicted motions of the object, leveraging the strengths of each modality. Extensive experiments on the HOTC dataset demonstrate the effectiveness of the proposed SSF-Net compared with state-of-the-art trackers.
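The spectral angle awareness described above builds on the classical spectral angle measure (Kruse et al., 1993): two spectra are compared by the angle between them in band space, which is insensitive to overall illumination scaling. A minimal sketch of this idea is shown below; the function names and the similarity mapping `1 - angle/pi` are illustrative assumptions, not the paper's actual SAAM implementation.

```python
import numpy as np

def spectral_angle(a, b, eps=1e-12):
    """Spectral angle (radians) between two spectra.
    Classical spectral angle mapper formulation: arccos of the
    normalized inner product; 0 means identical spectral shape."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def region_similarity_map(template_spec, search_cube):
    """Per-pixel spectral similarity between a template spectrum (B,)
    and a search region cube (H, W, B). Smaller angle = more similar;
    here mapped to [0, 1] via 1 - angle/pi (an illustrative choice)."""
    H, W, B = search_cube.shape
    flat = search_cube.reshape(-1, B)
    angles = np.array([spectral_angle(template_spec, s) for s in flat])
    return (1.0 - angles / np.pi).reshape(H, W)
```

A region-level variant, as in the SAAM, would pool spectra over candidate regions before comparison rather than working per pixel; the angle itself is computed the same way.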