SAM-DA: UAV Tracks Anything at Night with SAM-Powered Domain Adaptation (2307.01024v2)

Published 3 Jul 2023 in cs.CV

Abstract: Domain adaptation (DA) has demonstrated significant promise for real-time nighttime unmanned aerial vehicle (UAV) tracking. However, state-of-the-art (SOTA) DA methods still lack potential objects with accurate pixel-level locations and boundaries, which are needed to generate high-quality target-domain training samples. This key issue constrains the transfer learning of real-time daytime SOTA trackers to challenging nighttime UAV tracking. Recently, the Segment Anything Model (SAM) has achieved remarkable zero-shot generalization in discovering abundant potential objects, owing to its large-scale data-driven training. To solve this issue, this work proposes SAM-DA, a novel SAM-powered DA framework for real-time nighttime UAV tracking. Specifically, an innovative SAM-powered target-domain training sample swelling is designed to extract enormous numbers of high-quality target-domain training samples from every single raw nighttime image. This one-to-many generation significantly expands the pool of high-quality target-domain training samples available for DA. Comprehensive experiments on extensive nighttime UAV videos demonstrate the robustness and domain adaptability of SAM-DA for nighttime UAV tracking. In particular, compared to the SOTA DA, SAM-DA achieves better performance with fewer raw nighttime images, i.e., fewer-better training. This economical training approach facilitates quick validation and deployment of algorithms on UAVs. The code is available at https://github.com/vision4robotics/SAM-DA.
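
The one-to-many "sample swelling" idea can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes the reference `segment-anything` package and its `SamAutomaticMaskGenerator`, and the quality thresholds and the `swell_samples` helper are hypothetical choices for exposition.

```python
# Minimal sketch of SAM-powered one-to-many training-sample "swelling"
# (illustrative only; thresholds and helper names are assumptions).
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load a pretrained SAM backbone (checkpoint path is a placeholder).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

def swell_samples(image_path, min_iou=0.9, min_stability=0.95):
    """Turn one raw nighttime image into many box-annotated samples."""
    # SAM's generator expects an HxWx3 uint8 RGB array.
    image = cv2.cvtColor(cv2.imread(image_path), cv2.COLOR_BGR2RGB)
    samples = []
    for mask in mask_generator.generate(image):  # zero-shot object discovery
        # Keep only high-quality masks as target-domain training samples.
        if (mask["predicted_iou"] >= min_iou
                and mask["stability_score"] >= min_stability):
            x, y, w, h = mask["bbox"]  # XYWH box derived from the pixel mask
            samples.append({"image": image_path, "bbox": (x, y, w, h)})
    return samples
```

Each surviving mask yields one pseudo-labeled sample, so a single raw nighttime frame can contribute many target-domain samples to day-to-night adaptation, which is consistent with the "fewer-better training" effect the abstract describes.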
