Rethinking Early-Fusion Strategies for Improved Multispectral Object Detection (2405.16038v2)

Published 25 May 2024 in cs.CV

Abstract: Most recent multispectral object detectors employ a two-branch structure to extract features from RGB and thermal images. While the two-branch structure achieves better performance than a single-branch structure, it overlooks inference efficiency. This conflict has become increasingly pronounced, as recent works pursue higher performance alone rather than both performance and efficiency. In this paper, we address this issue by improving the performance of efficient single-branch structures. We revisit the causes of the performance gap between these structures. For the first time, we reveal the information interference problem in the naive early-fusion strategy adopted by previous single-branch structures. We further find that the domain gap between multispectral images and the weak feature representation of the single-branch structure are also key obstacles to performance. Targeting these three problems, we propose corresponding solutions: a novel shape-priority early-fusion strategy, a weakly supervised learning method, and a core knowledge distillation technique. Experiments demonstrate that single-branch networks equipped with these three contributions achieve significant performance gains while retaining high efficiency. Our code is available at https://github.com/XueZ-phd/Efficient-RGB-T-Early-Fusion-Detection.
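To make the two structures the abstract contrasts concrete, below is a minimal PyTorch sketch of a two-branch backbone versus a naive early-fusion single-branch backbone, plus a feature-level distillation loss in the spirit of the paper's third contribution. All module names, channel widths, and the toy backbone stage are assumptions for illustration, not the authors' implementation (in particular, neither the shape-priority fusion nor the "core knowledge" selection is reproduced here).

```python
# Illustrative sketch only: the modules and channel sizes below are
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Tiny stand-in for one stage of a real detection backbone."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class TwoBranchBackbone(nn.Module):
    """Two-branch structure: one feature extractor per modality, fused
    afterwards. Stronger features, but roughly double the backbone compute."""
    def __init__(self):
        super().__init__()
        self.rgb_branch = conv_block(3, 64)
        self.thermal_branch = conv_block(1, 64)
        self.fuse = nn.Conv2d(128, 64, 1)  # simple post-fusion, assumed

    def forward(self, rgb, thermal):
        f = torch.cat([self.rgb_branch(rgb), self.thermal_branch(thermal)], dim=1)
        return self.fuse(f)

class EarlyFusionBackbone(nn.Module):
    """Naive early fusion: concatenate the modalities at the input and run
    a single branch. Efficient, but the abstract argues this pixel-level
    mixing causes information interference between modalities."""
    def __init__(self):
        super().__init__()
        self.branch = conv_block(3 + 1, 64)  # 4-channel RGB-T input

    def forward(self, rgb, thermal):
        return self.branch(torch.cat([rgb, thermal], dim=1))

rgb = torch.randn(2, 3, 256, 256)
thermal = torch.randn(2, 1, 256, 256)
teacher, student = TwoBranchBackbone(), EarlyFusionBackbone()

# Feature-level distillation: the efficient single-branch student mimics
# the two-branch teacher's features (plain MSE here; the paper's "core
# knowledge" distillation is more selective).
with torch.no_grad():
    t_feat = teacher(rgb, thermal)
s_feat = student(rgb, thermal)
distill_loss = nn.functional.mse_loss(s_feat, t_feat)
```

The sketch also makes the efficiency argument visible: the two-branch teacher runs two backbones per image, which is exactly the inference cost the paper avoids by distilling into the single-branch early-fusion student.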
