Task-Customized Mixture of Adapters for General Image Fusion (2403.12494v2)

Published 19 Mar 2024 in cs.CV

Abstract: General image fusion aims to integrate important information from multi-source images. However, due to the significant cross-task gap, the fusion mechanism varies considerably across tasks in practice, resulting in limited performance on individual subtasks. To handle this problem, we propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, adaptively prompting various fusion tasks in a unified model. We borrow insight from the mixture of experts (MoE), taking the experts as efficient tuning adapters to prompt a pre-trained foundation model. These adapters are shared across different tasks and constrained by mutual information regularization, ensuring compatibility across tasks while preserving complementarity between multi-source images. Task-specific routing networks customize these adapters to extract task-specific information from different sources with dynamic dominant intensity, performing adaptive visual feature prompt fusion. Notably, TC-MoA controls the dominant intensity bias for different fusion tasks, successfully unifying multiple fusion tasks in a single model. Extensive experiments show that TC-MoA outperforms competing approaches in learning commonalities while retaining compatibility for general image fusion (multi-modal, multi-exposure, and multi-focus), and also demonstrates striking controllability in further generalization experiments. The code is available at https://github.com/YangSun22/TC-MoA.
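To make the routing idea concrete, the sketch below shows one way a task-customized mixture-of-adapters layer could be wired in PyTorch: a bank of bottleneck adapters shared by all tasks, plus one lightweight router per task that mixes the adapters for each source image and emits a per-source dominant-intensity weight. All names and hyperparameters here (BottleneckAdapter, TaskCustomizedMoA, num_adapters, the bottleneck width) are illustrative assumptions, not the authors' implementation; the actual code is in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BottleneckAdapter(nn.Module):
    """Low-rank bottleneck adapter; one 'expert' in the shared bank."""

    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        self.down = nn.Linear(dim, hidden)
        self.up = nn.Linear(hidden, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(F.gelu(self.down(x)))


class TaskCustomizedMoA(nn.Module):
    """Shared adapter bank with one routing network per fusion task."""

    def __init__(self, dim: int, num_adapters: int = 4, num_tasks: int = 3):
        super().__init__()
        self.adapters = nn.ModuleList(
            BottleneckAdapter(dim) for _ in range(num_adapters)
        )
        # One lightweight router per task: adapter mixing weights
        # plus one extra logit for the source's dominant intensity.
        self.routers = nn.ModuleList(
            nn.Linear(dim, num_adapters + 1) for _ in range(num_tasks)
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor,
                task_id: int) -> torch.Tensor:
        # feat_a, feat_b: (batch, tokens, dim) features of the two sources.
        router = self.routers[task_id]
        prompts, intensity_logits = [], []
        for feat in (feat_a, feat_b):
            logits = router(feat.mean(dim=1))          # (batch, A + 1)
            weights = logits[:, :-1].softmax(dim=-1)   # adapter mixture
            intensity_logits.append(logits[:, -1:])    # (batch, 1)
            # Run every adapter, then mix: (batch, tokens, dim, A).
            outs = torch.stack([a(feat) for a in self.adapters], dim=-1)
            prompts.append((outs * weights[:, None, None, :]).sum(dim=-1))
        # Softmax over the two sources -> dominant-intensity weights.
        w = torch.cat(intensity_logits, dim=-1).softmax(dim=-1)  # (batch, 2)
        fused = (w[:, 0, None, None] * prompts[0]
                 + w[:, 1, None, None] * prompts[1])
        return fused  # fused prompt, same shape as each source's features


# Minimal usage: fuse ViT-style token features for task 0 (e.g. IR/visible).
moa = TaskCustomizedMoA(dim=768)
ir, vis = torch.randn(2, 196, 768), torch.randn(2, 196, 768)
prompt = moa(ir, vis, task_id=0)  # (2, 196, 768)
```

Routing on the mean-pooled tokens keeps the router cheap, and the softmax over the two sources' intensity logits is what lets a task (or a user, at inference time) bias the fused prompt toward one source, which is the controllability the abstract refers to.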

Authors (4)
  1. Pengfei Zhu (76 papers)
  2. Yang Sun (145 papers)
  3. Bing Cao (23 papers)
  4. Qinghua Hu (83 papers)
Citations (9)