SAM-DA: UAV Tracks Anything at Night with SAM-Powered Domain Adaptation (2307.01024v2)
Abstract: Domain adaptation (DA) has shown great promise for real-time nighttime unmanned aerial vehicle (UAV) tracking. However, state-of-the-art (SOTA) DA methods still lack potential objects with accurate pixel-level locations and boundaries from which to generate high-quality target-domain training samples. This key issue constrains the transfer learning of real-time daytime SOTA trackers to challenging nighttime UAV tracking. Recently, the Segment Anything Model (SAM) has achieved remarkable zero-shot generalization, discovering abundant potential objects thanks to its large-scale data-driven training. To address this issue, this work proposes SAM-DA, a novel SAM-powered DA framework for real-time nighttime UAV tracking. Specifically, an innovative SAM-powered target-domain training-sample swelling is designed to determine an enormous number of high-quality target-domain training samples from every single raw nighttime image. This one-to-many generation significantly expands the pool of high-quality target-domain training samples for DA. Comprehensive experiments on extensive nighttime UAV videos demonstrate the robustness and domain adaptability of SAM-DA for nighttime UAV tracking. In particular, compared with the SOTA DA, SAM-DA achieves better performance with fewer raw nighttime images, i.e., fewer-better training. This economized training approach facilitates quick validation and deployment of algorithms on UAVs. The code is available at https://github.com/vision4robotics/SAM-DA.
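The core idea of the one-to-many sample swelling can be illustrated with a minimal sketch. The helper below is a hypothetical illustration, not the authors' implementation: it assumes SAM (or any segmenter) has already produced a set of binary masks for one raw nighttime image, and converts each sufficiently large mask into one bounding-box training sample, so a single image yields many target-domain samples.

```python
import numpy as np

def masks_to_samples(masks, min_area=64):
    """Convert per-image binary masks (e.g., as SAM would produce)
    into bounding-box training samples: one raw image -> many samples.

    masks: list of 2-D boolean arrays, one per discovered object.
    min_area: skip masks with fewer foreground pixels (likely noise).
    Returns a list of (x, y, w, h) boxes in pixel coordinates.
    """
    samples = []
    for m in masks:
        ys, xs = np.nonzero(m)          # foreground pixel coordinates
        if ys.size < min_area:          # filter tiny, unreliable masks
            continue
        x0, y0 = xs.min(), ys.min()     # tight box around the mask
        x1, y1 = xs.max() + 1, ys.max() + 1
        samples.append((int(x0), int(y0), int(x1 - x0), int(y1 - y0)))
    return samples
```

In practice, the pixel-accurate masks are what distinguishes this from box-level pseudo-labeling: the boxes derived above inherit SAM's precise object boundaries, so each swollen sample localizes its object tightly even in low-light imagery.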