MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training (2404.11016v2)

Published 17 Apr 2024 in cs.CV and cs.AI

Abstract: In this paper, we introduce MaeFuse, a novel autoencoder model designed for Infrared and Visible Image Fusion (IVIF). Existing fusion approaches often rely on joint training with downstream tasks to obtain high-level visual information, which is effective in emphasizing target objects and delivers impressive results in visual quality and task-specific applications. Instead of being driven by downstream tasks, MaeFuse utilizes a pretrained encoder from Masked Autoencoders (MAE), which facilitates the extraction of omni features for both low-level reconstruction and high-level vision tasks, to obtain perception-friendly features at low cost. To eliminate the domain gap between the features of the two modalities and the block effect introduced by the MAE encoder, we further develop a guided training strategy. This strategy is crafted so that the fusion layer adapts smoothly to the encoder's feature space, gradually improving fusion performance. The proposed method enables comprehensive integration of feature vectors from both infrared and visible modalities, preserving the rich details inherent in each modality. MaeFuse not only introduces a novel perspective on fusion techniques but also achieves impressive performance across various public datasets.
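
The abstract describes the pipeline only at a high level, so a short sketch may help fix the moving parts: a frozen, pretrained MAE encoder shared by both modalities, a trainable fusion layer operating in the encoder's feature space, and a guidance term that keeps the fused features close to that space. Everything below is an illustrative assumption, not the paper's implementation: the names MaeFuseSketch and guidance_loss, the cross-attention fusion layer, the toy linear decoder, and the element-wise-max guidance target are all stand-ins.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MaeFuseSketch(nn.Module):
    """Minimal sketch of a MaeFuse-style pipeline (hypothetical names).

    A frozen, pretrained MAE encoder extracts token features from each
    modality; a trainable fusion layer merges them in the encoder's
    feature space; a decoder maps fused tokens back to image patches.
    """

    def __init__(self, mae_encoder: nn.Module, embed_dim: int = 768, patch: int = 16):
        super().__init__()
        # Assumed interface: images (B, 3, H, W) -> token features (B, N, D).
        self.encoder = mae_encoder
        for p in self.encoder.parameters():
            p.requires_grad = False  # the pretrained encoder stays frozen
        # Illustrative fusion layer: cross-attention from IR tokens to VIS tokens.
        self.fusion = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        # Toy decoder: one linear head predicting a grayscale patch per token.
        self.decoder = nn.Linear(embed_dim, patch * patch)

    def forward(self, ir: torch.Tensor, vis: torch.Tensor):
        # Single-channel inputs are replicated to 3 channels for the MAE encoder.
        f_ir = self.encoder(ir.expand(-1, 3, -1, -1))
        f_vis = self.encoder(vis.expand(-1, 3, -1, -1))
        fused, _ = self.fusion(f_ir, f_vis, f_vis)  # merge the two modalities
        return f_ir, f_vis, fused, self.decoder(fused)


def guidance_loss(fused: torch.Tensor, f_ir: torch.Tensor, f_vis: torch.Tensor):
    """Assumed stand-in for the guided-training objective, not the authors' loss.

    The idea: keep the fused representation inside the encoder's feature
    space by pulling it toward a simple combination of the two source
    features, so the fusion layer adapts gradually rather than drifting.
    """
    target = torch.maximum(f_ir, f_vis)  # naive element-wise-max guide
    return F.mse_loss(fused, target)


# Usage sketch: guide the fusion layer toward the encoder's feature space.
# f_ir, f_vis, fused, patches = model(ir_batch, vis_batch)
# loss = guidance_loss(fused, f_ir, f_vis)
```

In the paper, the guided training strategy is specifically what bridges the domain gap between infrared and visible features and suppresses the MAE encoder's block effect; the max-based target above is only a placeholder for that objective.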

