VTR: An Optimized Vision Transformer for SAR ATR Acceleration on FPGA (2404.04527v1)
Abstract: Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR) is a key technique in military applications such as remote-sensing image recognition. Vision Transformers (ViTs) are the current state of the art in various computer vision applications, outperforming their CNN counterparts. However, applying ViTs to SAR ATR is challenging because (1) standard ViTs, owing to their weak locality bias, require extensive training data to generalize well, whereas standard SAR datasets provide only a limited number of labeled training samples, which reduces the learning capability of ViTs; and (2) ViTs have high parameter counts and are computation intensive, which makes their deployment on resource-constrained SAR platforms difficult. In this work, we develop a lightweight ViT model that can be trained directly on small datasets, without any pre-training, by utilizing the Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) modules. We train this model directly on SAR datasets with limited training samples to evaluate its effectiveness for SAR ATR. We evaluate the proposed model, which we call VTR (ViT for SAR ATR), on three widely used SAR datasets: MSTAR, SynthWakeSAR, and GBSAR. Further, we propose a novel FPGA accelerator for VTR to enable deployment in real-time SAR ATR applications.
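For concreteness, below is a minimal PyTorch sketch of the two modules the abstract builds on, following their descriptions in Lee et al. (arXiv:2112.13492): SPT concatenates four half-patch diagonal shifts of the image with the original before patchifying, and LSA replaces the fixed 1/sqrt(d) attention scale with a learnable temperature and masks the diagonal of the attention matrix. The class names, layer sizes, and zero-padded shift implementation are illustrative assumptions, not the authors' released code.

```python
# Illustrative sketch of Shifted Patch Tokenization (SPT) and Locality
# Self-Attention (LSA) as described in Lee et al. (arXiv:2112.13492).
# Dimensions and names are assumptions, not the VTR authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


def _shift(x, dx, dy):
    # Shift image content by (dx, dy) pixels, filling vacated pixels with zeros.
    x = F.pad(x, (max(dx, 0), max(-dx, 0), max(dy, 0), max(-dy, 0)))
    h, w = x.shape[-2:]
    return x[..., max(-dy, 0):h - max(dy, 0), max(-dx, 0):w - max(dx, 0)]


class ShiftedPatchTokenization(nn.Module):
    """Concatenate the image with four half-patch diagonal shifts of itself
    before patchifying, so each token embeds extra local spatial context."""

    def __init__(self, in_ch=1, patch=4, dim=64):
        super().__init__()
        self.patch = patch
        feat = 5 * in_ch * patch * patch        # original + 4 shifted copies
        self.norm = nn.LayerNorm(feat)
        self.proj = nn.Linear(feat, dim)

    def forward(self, x):                       # x: (B, C, H, W)
        s = self.patch // 2
        views = [x] + [_shift(x, dx, dy)
                       for dx, dy in ((-s, -s), (s, -s), (-s, s), (s, s))]
        x = torch.cat(views, dim=1)             # (B, 5C, H, W)
        # Non-overlapping patchify: (B, 5C, H, W) -> (B, N, 5C*p*p).
        x = F.unfold(x, kernel_size=self.patch, stride=self.patch).transpose(1, 2)
        return self.proj(self.norm(x))          # (B, N, dim)


class LocalitySelfAttention(nn.Module):
    """Multi-head self-attention with a learnable temperature in place of the
    fixed 1/sqrt(d) scale, and a masked diagonal so each token attends to the
    other tokens rather than to itself."""

    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim)
        # Learnable temperature, initialized to the usual 1/sqrt(d) value.
        self.temp = nn.Parameter(torch.tensor((dim // heads) ** -0.5))

    def forward(self, x):                       # x: (B, N, dim)
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.heads, D // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)    # each: (B, heads, N, d)
        attn = (q @ k.transpose(-2, -1)) * self.temp
        # Diagonal masking: suppress each token's attention to itself.
        attn = attn.masked_fill(torch.eye(N, dtype=torch.bool,
                                          device=x.device), float('-inf'))
        x = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, D)
        return self.out(x)


# Smoke test on a single-channel SAR-sized input (sizes are assumptions).
tokens = ShiftedPatchTokenization(in_ch=1, patch=8, dim=64)(torch.randn(2, 1, 128, 128))
print(LocalitySelfAttention(dim=64, heads=4)(tokens).shape)  # torch.Size([2, 256, 64])
```

Together, the richer SPT tokens and the diagonal masking in LSA push attention mass onto neighboring tokens, which is what lets such a model train from scratch on small SAR datasets.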
- Tsokas, A., Rysz, M., Pardalos, P. M., and Dipple, K., “SAR data applications in earth observation: An overview,” Expert Systems with Applications 205, 117342 (2022).
- Reigber, A., Scheiber, R., Jäger, M., Prats-Iraola, P., Hajnsek, I., Jagdhuber, T., Papathanassiou, K. P., Nannini, M., Aguilera, E., Baumgartner, S., Horn, R., Nottensteiner, A., and Moreira, A., “Very-high-resolution airborne synthetic aperture radar imaging: Signal processing and applications,” Proceedings of the IEEE 101(3), 759–783 (2013).
- Li, J., Yu, Z., Yu, L., Cheng, P., Chen, J., and Chi, C., “A comprehensive survey on SAR ATR in deep-learning era,” Remote Sensing 15(5) (2023).
- Moreira, A., Prats-Iraola, P., Younis, M., Krieger, G., Hajnsek, I., and Papathanassiou, K. P., “A tutorial on synthetic aperture radar,” IEEE Geoscience and Remote Sensing Magazine 1(1), 6–43 (2013).
- Ding, J., Chen, B., Liu, H., and Huang, M., “Convolutional neural network with data augmentation for SAR target recognition,” IEEE Geoscience and Remote Sensing Letters 13(3), 364–368 (2016).
- Chen, S., Wang, H., Xu, F., and Jin, Y.-Q., “Target classification using the deep convolutional networks for SAR images,” IEEE Transactions on Geoscience and Remote Sensing 54(8), 4806–4817 (2016).
- Zhang, B., Wijeratne, S., Kannan, R., Prasanna, V., and Busart, C., “Graph neural network for accurate and low-complexity SAR ATR,” arXiv preprint arXiv:2305.07119 (2023).
- Wang, R., Wang, L., Wei, X., Chen, J.-W., and Jiao, L., “Dynamic graph-level neural network for SAR image change detection,” IEEE Geoscience and Remote Sensing Letters 19, 1–5 (2022).
- Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., and Houlsby, N., “An image is worth 16x16 words: Transformers for image recognition at scale,” in [International Conference on Learning Representations (ICLR) ] (2021).
- Khan, S., Naseer, M., Hayat, M., Zamir, S. W., Khan, F. S., and Shah, M., “Transformers in vision: A survey,” ACM Computing Surveys 54, 1–41 (Jan. 2022).
- Chen, L., Luo, R., Xing, J., Li, Z., Yuan, Z., and Cai, X., “Geospatial transformer is what you need for aircraft detection in SAR imagery,” IEEE Transactions on Geoscience and Remote Sensing 60, 1–15 (2022).
- Liu, X., Wu, Y., Liang, W., Cao, Y., and Li, M., “High resolution SAR image classification using global-local network structure based on vision transformer and CNN,” IEEE Geoscience and Remote Sensing Letters 19, 1–5 (2022).
- Wang, C., Huang, Y., Liu, X., Pei, J., Zhang, Y., and Yang, J., “Global in local: A convolutional transformer for SAR ATR FSL,” IEEE Geoscience and Remote Sensing Letters 19, 1–5 (2022).
- Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L., “ImageNet: A large-scale hierarchical image database,” in [2009 IEEE Conference on Computer Vision and Pattern Recognition ], 248–255 (2009).
- Dong, H., Zhang, L., and Zou, B., “Exploring vision transformers for polarimetric SAR image classification,” IEEE Transactions on Geoscience and Remote Sensing 60, 1–15 (2022).
- Zhou, Y., Jiang, X., Xu, G., Yang, X., Liu, X., and Li, Z., “PVT-SAR: An arbitrarily oriented SAR ship detector with pyramid vision transformer,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 16, 291–305 (2023).
- Zhang, B., Kannan, R., Prasanna, V., and Busart, C., “Accurate, low-latency, efficient SAR automatic target recognition on FPGA,” in [2022 32nd International Conference on Field-Programmable Logic and Applications (FPL) ], 1–8 (2022).
- Zhang, B., Kannan, R., Prasanna, V., and Busart, C., “Accelerating GNN-based SAR automatic target recognition on HBM-enabled FPGA,” in [2023 IEEE High Performance Extreme Computing Conference (HPEC) ], 1–7 (2023).
- Fein-Ashley, J., Ye, T., Wickramasinghe, S., Zhang, B., Kannan, R., and Prasanna, V., “A single graph convolution is all you need: Efficient grayscale image classification,” arXiv preprint arXiv:2402.00564 (2024).
- Lee, S. H., Lee, S., and Song, B. C., “Vision transformer for small-size datasets,” arXiv preprint arXiv:2112.13492 (2021).
- “MSTAR dataset.” https://www.sdms.afrl.af.mil/index.php?collection=mstar. Accessed: 2024-03-27.
- Rizaev, I. G. and Achim, A., “SynthWakeSAR: A synthetic SAR dataset for deep learning classification of ships at sea,” Remote Sensing 14(16), 3999 (2022).
- Turčinović, F., Kačan, M., Bojanjac, D., and Bosiljevac, M., “Deep learning approach based on GBSAR data for detection of defects in packed objects,” in [2023 17th European Conference on Antennas and Propagation (EuCAP) ], 1–4 (2023).
- Fein-Ashley, J., Ye, T., Kannan, R., Prasanna, V., and Busart, C., “Benchmarking deep learning classifiers for SAR automatic target recognition,” in [2023 IEEE High Performance Extreme Computing Conference (HPEC) ], 1–6, IEEE (2023).
- Morgan, D. A., “Deep convolutional neural networks for ATR from SAR imagery,” in [Algorithms for Synthetic Aperture Radar Imagery XXII ], 9475, 116–128, SPIE (2015).
- Li, S., Lang, P., Fu, X., Jiang, J., Dong, J., and Nie, Z., “Automatic target recognition of SAR images based on transformer,” in [2021 CIE International Conference on Radar (Radar) ], 938–941, IEEE (2021).
- He, Y.-L., Zhang, X.-L., Ao, W., and Huang, J. Z., “Determining the optimal temperature parameter for softmax function in reinforcement learning,” Applied Soft Computing 70, 80–85 (2018).
- Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., and Chintala, S., “PyTorch: An imperative style, high-performance deep learning library,” in [Advances in Neural Information Processing Systems ], Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R., eds., 32, Curran Associates, Inc. (2019).
- Kingma, D. P. and Ba, J., “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).
- He, K., Zhang, X., Ren, S., and Sun, J., “Deep residual learning for image recognition,” in [Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ], 770–778 (2016).