Task-Customized Mixture of Adapters for General Image Fusion (2403.12494v2)
Abstract: General image fusion aims to integrate important information from multi-source images. However, due to significant cross-task gaps, the fusion mechanism varies considerably across tasks in practice, resulting in limited performance on individual subtasks. To address this problem, we propose a novel task-customized mixture of adapters (TC-MoA) for general image fusion, which adaptively prompts various fusion tasks within a unified model. Borrowing the insight of mixture of experts (MoE), we take the experts as efficient tuning adapters that prompt a pre-trained foundation model. These adapters are shared across tasks and constrained by mutual information regularization, ensuring compatibility across tasks while preserving complementarity across multi-source images. Task-specific routing networks customize these adapters to extract task-specific information from different sources with a dynamic dominant intensity, performing adaptive visual feature prompt fusion. Notably, TC-MoA controls the dominant intensity bias for each fusion task, successfully unifying multiple fusion tasks in a single model. Extensive experiments show that TC-MoA outperforms competing approaches at learning commonalities while retaining task-specific compatibility in general image fusion (multi-modal, multi-exposure, and multi-focus), and demonstrates striking controllability in further generalization experiments. The code is available at https://github.com/YangSun22/TC-MoA .
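To make the architecture described above concrete, here is a minimal PyTorch sketch of a task-customized mixture-of-adapters layer, based only on the abstract: shared bottleneck adapters act as experts, and a per-task routing network emits soft weights over the adapters plus a dominant-intensity score that balances the two source images. All class names, dimensions, and the exact way the router and adapters combine multi-source features are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a task-customized mixture-of-adapters (TC-MoA) layer.
# Everything here is an assumption reconstructed from the abstract, not the
# official TC-MoA code.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Shared bottleneck adapter (one 'expert')."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, bottleneck),
            nn.GELU(),
            nn.Linear(bottleneck, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class TCMoALayer(nn.Module):
    """Mixture of shared adapters with one routing network per fusion task.

    The router looks at the concatenated token features of the two source
    images and emits (a) soft weights over the shared adapters and (b) a
    per-token dominant intensity in [0, 1] balancing the two sources.
    """

    def __init__(self, dim: int, num_adapters: int = 4, num_tasks: int = 3):
        super().__init__()
        self.adapters = nn.ModuleList(Adapter(dim) for _ in range(num_adapters))
        # One lightweight router per task (e.g. multi-modal, multi-exposure,
        # multi-focus); each maps fused features to adapter weights + intensity.
        self.routers = nn.ModuleList(
            nn.Linear(2 * dim, num_adapters + 1) for _ in range(num_tasks)
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor, task_id: int):
        # feat_a, feat_b: (batch, tokens, dim) features from the two sources.
        routing = self.routers[task_id](torch.cat([feat_a, feat_b], dim=-1))
        weights = routing[..., :-1].softmax(dim=-1)  # soft weights over adapters
        intensity = routing[..., -1:].sigmoid()      # dominant-intensity score

        # Source-dominance-weighted mix of the two feature streams.
        mixed = intensity * feat_a + (1.0 - intensity) * feat_b

        # Weighted sum of shared adapter outputs acts as the fusion prompt.
        prompt = sum(
            weights[..., i : i + 1] * adapter(mixed)
            for i, adapter in enumerate(self.adapters)
        )
        return mixed + prompt  # prompt-injected fused features


# Usage: fuse ViT tokens from an infrared / visible pair (task_id = 0).
layer = TCMoALayer(dim=768)
ir, vis = torch.randn(2, 196, 768), torch.randn(2, 196, 768)
fused = layer(ir, vis, task_id=0)
print(fused.shape)  # torch.Size([2, 196, 768])
```

In this sketch the adapters are shared across all routers, so commonalities are learned once, while each task's router customizes how they are combined; the paper's mutual information regularization on the adapters is omitted for brevity.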
Authors: Pengfei Zhu, Yang Sun, Bing Cao, Qinghua Hu