
Unsupervised 3D End-to-End Medical Image Registration with Volume Tweening Network (1902.05020v3)

Published 13 Feb 2019 in cs.CV

Abstract: 3D medical image registration is of great clinical importance. However, supervised learning methods require a large amount of accurately annotated corresponding control points (or morphing), which are very difficult to obtain. Unsupervised learning methods ease the burden of manual annotation by exploiting unlabeled data without supervision. In this paper, we propose a new unsupervised learning method using convolutional neural networks under an end-to-end framework, Volume Tweening Network (VTN), for 3D medical image registration. We propose three innovative technical components: (1) An end-to-end cascading scheme that resolves large displacement; (2) An efficient integration of affine registration network; and (3) An additional invertibility loss that encourages backward consistency. Experiments demonstrate that our algorithm is 880x faster (or 3.3x faster without GPU acceleration) than traditional optimization-based methods and achieves state-of-the-art performance in medical image registration.

Citations (212)

Summary

  • The paper presents a Volume Tweening Network that refines 3D medical image registration using cascading subnetworks for precise alignment without iterative optimization.
  • It integrates an affine registration subnetwork and employs an invertibility loss to enhance transformation quality and maintain backward consistency.
  • The framework achieves up to 880 times faster performance on GPU while ensuring robust accuracy across diverse datasets like liver CT and brain MRI.

Unsupervised 3D Medical Image Registration with Volume Tweening Network

The paper presents a framework for unsupervised 3D medical image registration, the Volume Tweening Network (VTN), built from convolutional neural networks trained end-to-end. It addresses a key limitation of supervised registration methods: their reliance on accurately annotated correspondences, which are cumbersome and often impractical to obtain for medical images. By learning from unlabeled data, VTN removes the need for manual annotation while maintaining robust registration accuracy and efficiency.

Volume Tweening Network and Methodology

The VTN distinguishes itself through three pivotal innovations:

  1. Cascading Subnetworks: The framework chains multiple registration subnetworks, each of which aligns the fixed image with the moving image as progressively warped by the preceding subnetworks. This hierarchical refinement resolves large displacements and sharpens alignment precision without resorting to classical iterative optimization.
  2. Integrated Affine Registration: Unlike approaches that require an external affine pre-registration step, VTN incorporates an affine registration subnetwork directly into its pipeline. This integration simplifies the overall execution and improves transformation quality through joint optimization of the affine and deformable stages.
  3. Invertibility Loss: VTN introduces an additional invertibility loss that encourages backward consistency of transformations: when any voxel is mapped forward and then backward, it should ideally return to its original position. This constraint yields more coherent alignments between image pairs.
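The cascading warps in (1) and the invertibility loss in (3) can be sketched in one dimension. The code below is an illustrative analogue, not the paper's implementation: the 1-D setting, the function names, and the toy flows are ours; 3-D VTN uses dense vector fields with trilinear sampling.

```python
# Toy 1-D sketch of two VTN ideas: cascading warps and the
# invertibility loss. A "flow" here is a per-position displacement.

def warp(signal, flow):
    """Backward warp: out[x] = signal[x + flow[x]], linearly interpolated."""
    n = len(signal)
    out = []
    for x in range(n):
        src = max(0.0, min(n - 1.0, x + flow[x]))  # clamp to valid range
        lo = int(src)
        hi = min(n - 1, lo + 1)
        frac = src - lo
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out

def invertibility_loss(fwd, bwd):
    """Mean squared residual of composing the two flows: a point pushed
    by `bwd` and pulled back by `fwd` should land where it started."""
    n = len(fwd)
    total = 0.0
    for x in range(n):
        src = max(0.0, min(n - 1.0, x + bwd[x]))
        lo = int(src)
        hi = min(n - 1, lo + 1)
        frac = src - lo
        fwd_at = (1 - frac) * fwd[lo] + frac * fwd[hi]  # sample fwd at x + bwd[x]
        total += (fwd_at + bwd[x]) ** 2
    return total / n

# Cascading: each stage warps the output of the previous one, so small
# per-stage flows accumulate into a large total displacement.
moving = [0.0, 0.0, 1.0, 0.0, 0.0]
stage1 = warp(moving, [1.0] * 5)   # peak moves from index 2 to index 1
stage2 = warp(stage1, [1.0] * 5)   # ... and then to index 0

# Perfectly inverse flows incur zero invertibility penalty.
loss = invertibility_loss([-1.0] * 5, [1.0] * 5)
```

Two unit shifts compose into a displacement of two, which is the intuition behind resolving large displacements with a cascade of modest per-stage deformations.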

Numerical Results and Evaluation

The numerical performance of this methodology is compelling: VTN runs up to 880 times faster than traditional optimization-based methods with GPU acceleration (3.3 times faster without it), without sacrificing accuracy. This holds across datasets including liver CT scans and brain MRI. On the liver CT dataset, VTN achieves better landmark distances and Jaccard coefficients than established methods such as ANTs and Elastix; on brain MRI, it remains competitive, though ANTs retains a slight edge on some metrics.
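As a reminder of what the Jaccard coefficient measures in this evaluation, a minimal sketch follows; the mask values and variable names are invented for illustration, and real evaluations operate on 3-D segmentation volumes.

```python
# Jaccard coefficient (intersection over union) between two binary
# segmentation masks, e.g. a warped organ mask vs. the fixed-image mask.

def jaccard(mask_a, mask_b):
    """|A ∩ B| / |A ∪ B| for binary masks given as flat 0/1 sequences."""
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return inter / union if union else 1.0

warped_liver = [1, 1, 1, 0, 0, 0]  # hypothetical warped mask
fixed_liver  = [0, 1, 1, 1, 0, 0]  # hypothetical fixed mask
score = jaccard(warped_liver, fixed_liver)  # 2 overlapping / 4 covered = 0.5
```

A better registration pushes the warped mask onto the fixed one, driving the score toward 1.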

Critically, VTN's ability to exploit additional unlabeled data demonstrates its scalability, a practical advantage for healthcare systems holding vast repositories of unannotated scans. As the paper shows, enlarging the training set consistently improves VTN's accuracy, especially in the large-displacement scenarios common to certain medical conditions and imaging techniques.

Implications and Future Directions

This research holds significant theoretical and practical implications. Theoretically, it advances the case for unsupervised learning in medical imaging, providing a concrete model that benefits from large amounts of unannotated data. Practically, VTN is a candidate for integration into clinical workflows, particularly in time-sensitive diagnostic settings where traditional methods are impractical due to their computational cost.

Future research may further optimize the VTN framework, for example through hyperparameter tuning or alternative similarity loss functions, to improve accuracy on datasets with minimal displacement. Investigating VTN's adaptability to other imaging modalities, such as PET or ultrasound, could broaden its applicability. More broadly, the model could be applied to other domains where large amounts of unlabeled data are available, reinforcing the idea that unsupervised models like VTN can bridge the gap between data abundance and the need for precise computational models.

In conclusion, the paper presents a robust framework that stands as a testament to the growing capabilities of unsupervised learning paradigms, notably in tackling intricate tasks like 3D medical image registration, with significant advancements in speed and operational integration.