U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration? (2208.04939v2)

Published 7 Aug 2022 in eess.IV and cs.CV

Abstract: Due to their extreme long-range modeling capability, vision transformer-based networks have become increasingly popular in deformable image registration. We believe, however, that the receptive field of a 5-layer convolutional U-Net is sufficient to capture accurate deformations without needing long-range dependencies. The purpose of this study is therefore to investigate whether U-Net-based methods are outdated compared to modern transformer-based approaches when applied to medical image registration. For this, we propose a large kernel U-Net (LKU-Net) by embedding a parallel convolutional block to a vanilla U-Net in order to enhance the effective receptive field. On the public 3D IXI brain dataset for atlas-based registration, we show that the performance of the vanilla U-Net is already comparable with that of state-of-the-art transformer-based networks (such as TransMorph), and that the proposed LKU-Net outperforms TransMorph by using only 1.12% of its parameters and 10.8% of its mult-adds operations. We further evaluate LKU-Net on a MICCAI Learn2Reg 2021 challenge dataset for inter-subject registration; our LKU-Net also outperforms TransMorph on this dataset and ranks first on the public leaderboard as of the submission of this work. With only modest modifications to the vanilla U-Net, we show that U-Net can outperform transformer-based architectures on inter-subject and atlas-based 3D medical image registration. Code is available at https://github.com/xi-jia/LKU-Net.

Citations (51)

Summary

  • The paper presents LKU-Net, a modified U-Net with large kernel convolutions that enhances image registration performance.
  • It demonstrates LKU-Net achieves superior performance with only 1.12% of the parameters and 10.8% of the operations compared to transformer-based models.
  • The research argues that optimized traditional architectures can rival or exceed more complex transformer models in medical imaging tasks.

U-Net vs Transformer: Evaluating Architectural Efficacy in Medical Image Registration

The paper entitled "U-Net vs Transformer: Is U-Net Outdated in Medical Image Registration?" investigates the relative efficacy of U-Net and transformer-based architectures for medical image registration, a critical task in medical image analysis. The primary focus is to determine if the time-tested U-Net architecture remains competitive against contemporary transformer models, specifically within the scope of deformable image registration.
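
To make the task concrete, the sketch below outlines the standard unsupervised deformable-registration setup that both U-Net- and transformer-based methods plug into: a network predicts a dense displacement field, a spatial transformer warps the moving image, and training minimizes an image-similarity term plus a smoothness penalty. The helper names, loss choices, and hyperparameters here are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of unsupervised deformable registration (assumed setup, not the
# authors' code): a network outputs a displacement field, the moving image is
# warped with a spatial transformer, and the loss is similarity + smoothness.
import torch
import torch.nn.functional as F

def warp(moving, flow):
    """Warp a 3D volume `moving` (N, C, D, H, W) by a dense displacement
    field `flow` (N, 3, D, H, W) given in voxel units."""
    n, _, d, h, w = moving.shape
    # Identity sampling grid in voxel coordinates (z, y, x ordering).
    zz, yy, xx = torch.meshgrid(
        torch.arange(d), torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((zz, yy, xx), dim=0).float().to(moving.device)   # (3, D, H, W)
    coords = grid.unsqueeze(0) + flow                                   # (N, 3, D, H, W)
    # Normalize to [-1, 1] and reorder to (x, y, z) as grid_sample expects.
    coords = torch.stack((
        2 * coords[:, 2] / (w - 1) - 1,
        2 * coords[:, 1] / (h - 1) - 1,
        2 * coords[:, 0] / (d - 1) - 1), dim=-1)                        # (N, D, H, W, 3)
    return F.grid_sample(moving, coords, align_corners=True)

def smoothness(flow):
    """Diffusion regularizer: mean squared spatial gradient of the flow."""
    dz = flow[:, :, 1:] - flow[:, :, :-1]
    dy = flow[:, :, :, 1:] - flow[:, :, :, :-1]
    dx = flow[:, :, :, :, 1:] - flow[:, :, :, :, :-1]
    return dz.pow(2).mean() + dy.pow(2).mean() + dx.pow(2).mean()

# Training step (schematic): the registration network, e.g. a U-Net or LKU-Net,
# takes the concatenated fixed/moving pair and outputs the displacement field.
# flow = net(torch.cat([fixed, moving], dim=1))
# loss = F.mse_loss(warp(moving, flow), fixed) + lambda_reg * smoothness(flow)
```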

Overview of Methodology and Findings

The paper introduces a novel adaptation of the U-Net, named the Large Kernel U-Net (LKU-Net). By integrating large kernel convolutional blocks into the traditional U-Net architecture, the authors aim to enhance the effective receptive field without incurring the computational cost associated with long-range dependency modeling typically handled by transformers.
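
The block below is a hedged sketch of the kind of parallel large-kernel convolution the paper describes: a large-kernel 3D convolution run alongside a standard 3x3x3 convolution and an identity shortcut, with the branch outputs summed so the block widens the effective receptive field at modest cost. The kernel size, activation, and exact set of branches are illustrative assumptions, not the authors' exact design.

```python
# Sketch of a parallel large-kernel block (assumed structure, not the paper's
# exact module): sum of a large-kernel branch, a 3x3x3 branch, and identity.
import torch
import torch.nn as nn

class ParallelLargeKernelBlock(nn.Module):
    def __init__(self, channels, large_kernel=5):
        super().__init__()
        pad = large_kernel // 2
        self.large = nn.Conv3d(channels, channels, large_kernel, padding=pad)
        self.small = nn.Conv3d(channels, channels, 3, padding=1)
        self.act = nn.PReLU()

    def forward(self, x):
        # Parallel branches summed before the activation.
        return self.act(self.large(x) + self.small(x) + x)

# Replacing (or augmenting) the usual convolution at each U-Net level with such
# a block is the kind of "modest modification" the paper refers to.
x = torch.randn(1, 8, 32, 32, 32)
block = ParallelLargeKernelBlock(8)
print(block(x).shape)  # torch.Size([1, 8, 32, 32, 32])
```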

The authors begin with a comparative analysis on the 3D IXI brain dataset, showing that the vanilla U-Net already competes closely with state-of-the-art transformer-based models such as TransMorph. LKU-Net then achieves superior performance at a fraction of the cost, using only 1.12% of the parameters and 10.8% of the mult-adds required by TransMorph. LKU-Net also outperforms TransMorph in inter-subject registration on the Learn2Reg 2021 challenge dataset, ranking first on the public leaderboard as of the paper's submission.

Technical Contributions

  1. Design of LKU-Net: The LKU-Net architecture expands upon the vanilla U-Net by embedding large kernel convolutional blocks, which include a parallel convolutional structure to extend the receptive field. This is pivotal in capturing and mapping subtle deformations in medical images without resorting to computationally expensive transformers.
  2. Efficiency in Parameter Utilization: LKU-Net demonstrates that slight architectural adjustments can yield performance gains while maintaining minimal parameter usage, thus highlighting the potential for optimized models in resource-constrained environments.
  3. Diffeomorphic Variant: The paper introduces a diffeomorphic variant of LKU-Net (LKU-Net-diff), which outputs a stationary velocity field that is integrated into a diffeomorphic transformation. This preserves the topology of the deformation, which is crucial for certain medical applications; a brief sketch of how such a velocity field is typically integrated follows this list.
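
The sketch below shows scaling-and-squaring, the standard way a stationary velocity field v is integrated into a diffeomorphic deformation phi = exp(v): scale v down by 2^T, then compose the resulting small deformation with itself T times. It reuses the `warp` helper sketched earlier; the step count of 7 is a common choice, not a value taken from the paper.

```python
# Scaling-and-squaring integration of a stationary velocity field (generic
# technique, assumed rather than copied from LKU-Net-diff).
def integrate_velocity(v, warp_fn, steps=7):
    """v: velocity field (N, 3, D, H, W) in voxel units; warp_fn: the `warp`
    helper above. Returns the displacement field of the integrated deformation."""
    flow = v / (2 ** steps)                  # small initial displacement
    for _ in range(steps):
        # Compose the deformation with itself: u <- u + u o (id + u).
        flow = flow + warp_fn(flow, flow)
    return flow
```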

Implications and Future Directions

The paper highlights several implications for both the theoretical understanding and practical application of medical image registration methods:

  • Theoretical Insight: The findings challenge the necessity of transformers in specific contexts within medical imaging, suggesting that traditional architectures can be adapted to meet or exceed the performance of more complex models.
  • Practical Application: The reduced computational burden of LKU-Net offers significant advantages in clinical settings, enabling faster and more efficient processing that could facilitate real-time medical image analysis.
  • Future Developments: This paper suggests promising directions for future research, including extending LKU-Net for cross-modality registration tasks. Additionally, integrating further architectural innovations from both traditional and novel model paradigms could yield even greater efficiencies and accuracies.

The results prompt a reevaluation of current trends that favor transformer architectures over simpler, established models like U-Net. By emphasizing both efficiency and efficacy, the research advocates a nuanced approach to model selection in medical imaging tasks, where updated classical methods may still hold substantial merit.