Vector Field Attention for Deformable Image Registration (2407.10209v1)
Abstract: Deformable image registration establishes non-linear spatial correspondences between fixed and moving images. Deep learning-based deformable registration methods have been widely studied in recent years because of their speed advantage over traditional algorithms as well as their improved accuracy. Most existing deep learning-based methods require neural networks to encode location information in their feature maps and to predict displacement or deformation fields through convolutional or fully connected layers from these high-dimensional feature maps. In this work, we present Vector Field Attention (VFA), a novel framework that improves the efficiency of existing network designs by enabling direct retrieval of location correspondences. VFA uses neural networks to extract multi-resolution feature maps from the fixed and moving images and then retrieves pixel-level correspondences based on feature similarity. The retrieval is achieved with a novel attention module that requires no learnable parameters. VFA is trained end-to-end in either a supervised or unsupervised manner. We evaluated VFA for intra- and inter-modality registration and for unsupervised and semi-supervised registration on public datasets, and we also evaluated it in the Learn2Reg challenge. Experimental results demonstrate the superior performance of VFA compared to existing methods. The source code of VFA is publicly available at https://github.com/yihao6/vfa/.
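The retrieval idea described in the abstract can be sketched as parameter-free attention: each fixed-image feature vector is compared to moving-image features within a local search window, and the predicted displacement is the softmax-weighted average of the candidate offsets, so no learnable layers are needed to map features to a vector field. The minimal 1D toy below is an illustration under assumed choices (dot-product similarity, a fixed search radius, the name `vfa_1d`), not code from the VFA repository:

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def vfa_1d(fixed_feats, moving_feats, radius=2):
    """Parameter-free attention retrieval (1D sketch).

    For each fixed position i, compare its feature vector to moving
    features at offsets d in [-radius, radius] via dot products, then
    return the softmax-weighted average of the candidate offsets as
    the predicted displacement at i.
    """
    n = len(fixed_feats)
    disp = []
    for i in range(n):
        offsets, scores = [], []
        for d in range(-radius, radius + 1):
            j = i + d
            if 0 <= j < n:
                # similarity between fixed feature i and moving feature j
                sim = sum(a * b for a, b in zip(fixed_feats[i], moving_feats[j]))
                offsets.append(float(d))
                scores.append(sim)
        w = softmax(scores)
        disp.append(sum(wk * dk for wk, dk in zip(w, offsets)))
    return disp
```

With sufficiently distinctive features, the softmax concentrates on the true match and the weighted offset approaches the true shift; in 3D, VFA applies the same idea per voxel over multi-resolution feature maps extracted by the network.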