
CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions (2404.16302v1)

Published 25 Apr 2024 in cs.CV, cs.MM, cs.RO, and eess.IV

Abstract: Cross-modality images that integrate visible-infrared spectra cues can provide richer complementary information for object detection. Despite this, existing visible-infrared object detection methods degrade severely in severe weather conditions. This failure stems from the pronounced sensitivity of visible images to environmental perturbations, such as rain, haze, and snow, which frequently cause false negatives and false positives in detection. To address this issue, we introduce a novel and challenging task, termed visible-infrared object detection under adverse weather conditions. To foster this task, we have constructed a new Severe Weather Visible-Infrared Dataset (SWVID) with diverse severe weather scenes. Furthermore, we introduce the Cross-modality Fusion Mamba with Weather-removal (CFMW) to improve detection accuracy in adverse weather conditions. Thanks to the proposed Weather Removal Diffusion Model (WRDM) and Cross-modality Fusion Mamba (CFM) modules, CFMW is able to mine more essential information from pedestrian features in cross-modality fusion; it can therefore transfer to other, rarer scenarios with high efficiency and remains usable on platforms with low computing power. To the best of our knowledge, this is the first study that targets the improvement and integration of both Diffusion and Mamba modules in cross-modality object detection, expanding the practical application of this type of model with its higher accuracy and more advanced architecture. Extensive experiments on both well-recognized and self-created datasets demonstrate that our CFMW achieves state-of-the-art detection performance, surpassing existing benchmarks. The dataset and source code will be made publicly available at https://github.com/lhy-zjut/CFMW.
