
Implicit Multi-Spectral Transformer: An Lightweight and Effective Visible to Infrared Image Translation Model (2404.07072v2)

Published 10 Apr 2024 in cs.CV

Abstract: In the field of computer vision, visible light images often exhibit low contrast in low-light conditions, presenting a significant challenge. While infrared imagery provides a potential solution, its utilization entails high costs and practical limitations. Recent advancements in deep learning, particularly the deployment of Generative Adversarial Networks (GANs), have facilitated the transformation of visible light images to infrared images. However, these methods often experience unstable training phases and may produce suboptimal outputs. To address these issues, we propose a novel end-to-end Transformer-based model that efficiently converts visible light images into high-fidelity infrared images. Initially, the Texture Mapping Module and Color Perception Adapter collaborate to extract texture and color features from the visible light image. The Dynamic Fusion Aggregation Module subsequently integrates these features. Finally, the transformation into an infrared image is refined through the synergistic action of the Color Perception Adapter and the Enhanced Perception Attention mechanism. Comprehensive benchmarking experiments confirm that our model outperforms existing methods, producing infrared images of markedly superior quality, both qualitatively and quantitatively. Furthermore, the proposed model enables more effective downstream applications for infrared images than other methods.
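
The abstract names four components (Texture Mapping Module, Color Perception Adapter, Dynamic Fusion Aggregation Module, Enhanced Perception Attention) without implementation detail. The PyTorch sketch below is only an illustration of how such a visible-to-infrared pipeline could be wired together; every module body is a hypothetical stand-in chosen for simplicity, not the authors' actual design.

```python
import torch
import torch.nn as nn

class ColorPerceptionAdapter(nn.Module):
    """Hypothetical stand-in: 1x1 convolution re-weighting color/channel features."""
    def __init__(self, channels):
        super().__init__()
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.proj(x)

class TextureMappingModule(nn.Module):
    """Hypothetical stand-in: depthwise convolution capturing local texture."""
    def __init__(self, channels):
        super().__init__()
        self.dw = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)

    def forward(self, x):
        return self.dw(x)

class DynamicFusionAggregation(nn.Module):
    """Hypothetical stand-in: fuse texture and color features with a learned gate."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * channels, channels, 1), nn.Sigmoid())

    def forward(self, tex, col):
        g = self.gate(torch.cat([tex, col], dim=1))
        return g * tex + (1 - g) * col

class EnhancedPerceptionAttention(nn.Module):
    """Hypothetical stand-in: channel attention refining the fused features."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.attn(x)

class VisibleToInfraredSketch(nn.Module):
    """End-to-end sketch: visible RGB image -> single-channel infrared estimate."""
    def __init__(self, channels=32):
        super().__init__()
        self.embed = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.texture = TextureMappingModule(channels)
        self.color_in = ColorPerceptionAdapter(channels)
        self.fusion = DynamicFusionAggregation(channels)
        self.color_out = ColorPerceptionAdapter(channels)
        self.attention = EnhancedPerceptionAttention(channels)
        self.head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, rgb):
        feat = self.embed(rgb)
        fused = self.fusion(self.texture(feat), self.color_in(feat))
        refined = self.attention(self.color_out(fused))
        return torch.sigmoid(self.head(refined))

if __name__ == "__main__":
    model = VisibleToInfraredSketch()
    ir = model(torch.randn(1, 3, 256, 256))
    print(ir.shape)  # torch.Size([1, 1, 256, 256])
```

The ordering of the calls follows the abstract (texture and color extraction, dynamic fusion, then color adaptation and attention before the infrared output); the specific layer choices inside each module are assumptions.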

