Modality-missing RGBT Tracking: Invertible Prompt Learning and High-quality Benchmarks (2312.16244v3)

Published 25 Dec 2023 in cs.CV

Abstract: Current RGBT tracking research assumes complete multi-modal input, but information from one modality may be missing because of factors such as thermal sensor self-calibration and data transmission errors; we call this the modality-missing challenge. To address it, we propose a novel invertible prompt learning approach for robust RGBT tracking, which integrates content-preserving prompts into a well-trained tracking model so that it adapts to various modality-missing scenarios. Given a modality-missing scenario, we use the available modality to generate a prompt for the missing modality, and this prompt adapts the RGBT tracking model. However, the cross-modality gap between the available and missing modalities usually causes semantic distortion and information loss during prompt generation. To handle this issue, we design an invertible prompter that incorporates full reconstruction of the available-modality input from the generated prompt. To provide a comprehensive evaluation platform, we construct several high-quality benchmark datasets that simulate various real-world modality-missing scenarios. Extensive experiments on three modality-missing benchmark datasets show that our method achieves significant performance improvements over state-of-the-art methods. The code and simulation datasets are available at https://github.com/Alexadlu/Modality-missing-RGBT-Tracking.git.
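
The key design point above is that the available-modality input must be fully recoverable from the generated prompt. One standard way to make a mapping exactly invertible is an affine coupling layer, as used in RealNVP and Glow. The sketch below is a minimal, hypothetical illustration of that property only; it is not the authors' prompter. The class `CouplingPrompter`, the toy functions `_scale` and `_shift` (stand-ins for small learned networks), and the 4-dimensional feature vector are all assumptions made for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]


class CouplingPrompter:
    """Hypothetical affine coupling block (illustrative, not the paper's design).

    Maps available-modality features x to a "prompt" y and back exactly.
    x is split into halves (a, b): a passes through unchanged, and b is
    scaled and shifted by functions of a. Because a survives intact in y,
    the same scale/shift can be recomputed and undone in inverse().
    """

    def __init__(self, scale_fn: Callable[[Vector], Vector],
                 shift_fn: Callable[[Vector], Vector]) -> None:
        self.scale_fn = scale_fn  # log-scale s(a); need not be invertible
        self.shift_fn = shift_fn  # shift t(a); need not be invertible

    @staticmethod
    def _split(v: Vector) -> tuple:
        if len(v) % 2:
            raise ValueError("feature length must be even")
        half = len(v) // 2
        return v[:half], v[half:]

    def forward(self, x: Vector) -> Vector:
        """Available-modality features -> prompt for the missing modality."""
        a, b = self._split(x)
        s, t = self.scale_fn(a), self.shift_fn(a)
        return a + [bi * math.exp(si) + ti for bi, si, ti in zip(b, s, t)]

    def inverse(self, y: Vector) -> Vector:
        """Prompt -> reconstructed available-modality features."""
        a, c = self._split(y)
        s, t = self.scale_fn(a), self.shift_fn(a)
        return a + [(ci - ti) * math.exp(-si) for ci, si, ti in zip(c, s, t)]


def _scale(a: Vector) -> Vector:
    # Hypothetical stand-in for a learned network: bounded log-scales.
    return [math.tanh(0.5 * v) for v in a]


def _shift(a: Vector) -> Vector:
    # Hypothetical stand-in for a learned network.
    return [0.3 * v - 0.1 for v in a]


def reconstruction_error(x: Vector, x_hat: Vector) -> float:
    """Largest absolute difference between input and reconstruction."""
    return max(abs(u - v) for u, v in zip(x, x_hat))


prompter = CouplingPrompter(_scale, _shift)
rgb_feat = [0.2, -1.0, 0.5, 0.7]        # toy available-modality features
prompt = prompter.forward(rgb_feat)      # toy prompt for the missing modality
recovered = prompter.inverse(prompt)
print(reconstruction_error(rgb_feat, recovered) < 1e-12)  # True
```

Because the first half of the features passes through unchanged, a single coupling block leaves part of the prompt identical to its input; invertible networks usually stack several blocks with permutations in between. The abstract does not say whether the authors obtain reconstruction by construction like this or through a reconstruction loss, so treat this only as an illustration of lossless prompt generation.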

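The benchmarks simulate modality-missing scenarios, but the abstract does not describe the simulation protocol. The following is a hypothetical sketch of one way to generate per-frame availability flags: missing frames arrive in short contiguous runs (as a sensor self-calibration pause or a transmission dropout might produce), each run drops either the RGB or the thermal (TIR) stream but never both, and the first frame keeps both modalities for tracker initialization. The function `simulate_missing` and its parameters `missing_rate`, `max_run`, and `seed` are illustrative assumptions, not the settings of the released datasets.

```python
import random
from typing import List, Tuple

Flags = List[Tuple[bool, bool]]  # per frame: (rgb_present, tir_present)


def simulate_missing(num_frames: int, missing_rate: float, max_run: int,
                     seed: int = 0) -> Flags:
    """Hypothetical modality-missing simulator (illustrative only)."""
    rng = random.Random(seed)  # private RNG -> reproducible sequences
    flags: Flags = [(True, True)] * num_frames
    # Frame 0 is never dropped, so at most num_frames - 1 frames can be missing.
    target = min(int(missing_rate * num_frames), num_frames - 1)
    missing, attempts = 0, 0
    while missing < target and attempts < 10 * num_frames:
        attempts += 1
        start = rng.randrange(1, num_frames)  # never touch frame 0
        run = rng.randint(1, max_run)         # length of this outage
        drop_rgb = rng.random() < 0.5         # which stream fails
        for i in range(start, min(start + run, num_frames)):
            if flags[i] == (True, True) and missing < target:
                flags[i] = (not drop_rgb, drop_rgb)  # keep the other stream
                missing += 1
    return flags


flags = simulate_missing(num_frames=100, missing_rate=0.3, max_run=5)
n_missing = sum(1 for rgb, tir in flags if not (rgb and tir))
print(n_missing, "of", len(flags), "frames have one modality missing")
```

Seeding a dedicated `random.Random` instance keeps each simulated sequence reproducible, which matters when different trackers are compared on the same missing pattern.
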
Authors (5)
  1. Andong Lu (15 papers)
  2. Chenglong Li (94 papers)
  3. Jin Tang (139 papers)
  4. Bin Luo (209 papers)
  5. Jiacong Zhao (3 papers)
Citations (1)