- The paper presents a contrastive learning method that creates dense, rotation-equivariant image representations to simplify multimodal registration.
- It demonstrates that CoMIR achieves higher registration accuracy than GAN-based image translation and traditional methods on biomedical and remote sensing datasets.
- The paper shows robust performance across varied training conditions and reduces multimodal registration to a monomodal process, requiring little hyperparameter tuning.
Analysis of CoMIR: Contrastive Multimodal Image Representation for Registration
The paper "CoMIR: Contrastive Multimodal Image Representation for Registration" presents a novel method for multimodal image registration that leverages contrastive learning to create dense image representations called Contrastive Multimodal Image Representations (CoMIRs). This method aims to simplify the complex task of multimodal image registration by converting it into a monomodal problem, thereby enabling the use of traditional registration techniques that have been successful in monomodal contexts.
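To make this reduction concrete: once both input images are mapped into the shared CoMIR space, any standard monomodal method can align them. The sketch below uses phase correlation, a classic monomodal technique, purely as an illustrative stand-in for the α-AMD and SIFT-based pipelines evaluated in the paper; it recovers a translation between two representation images:

```python
import numpy as np

def phase_correlation_shift(a, b):
    """Estimate the integer translation aligning b to a via phase
    correlation -- a monomodal technique that becomes applicable once
    both inputs live in the same representation space."""
    F = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    r = np.fft.ifft2(F / (np.abs(F) + 1e-9)).real  # correlation surface
    dy, dx = np.unravel_index(np.argmax(r), r.shape)
    H, W = a.shape
    # map the peak coordinates to signed shifts
    return (dy - H if dy > H // 2 else dy,
            dx - W if dx > W // 2 else dx)
```

Applying `np.roll(b, shift, axis=(0, 1))` with the returned shift re-aligns `b` with `a`; on raw multimodal inputs with uncorrelated appearances this simple correlation would fail, which is exactly the gap the learned representations close.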
Methodology Overview
The central challenge in multimodal image registration lies in aligning images from different modalities, which often have poorly correlated appearances. CoMIR addresses this with a supervised contrastive learning approach that learns a shared representation of the multimodal images: one neural network is trained per modality using a novel modification of the InfoNCE loss that enforces rotational equivariance without introducing additional hyperparameters.
Key steps in the proposed method include:
- Training neural networks on aligned images from different modalities.
- Utilizing a contrastive loss function to generate dense, image-like representations.
- Applying a hyperparameter-free modification to the InfoNCE loss that enforces rotational equivariance of the learned representations.
- Evaluating the learned CoMIRs through traditional monomodal registration methods, namely α-AMD and SIFT.
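The steps above can be sketched schematically. The NumPy snippet below is an illustrative simplification, not the authors' exact formulation: it uses a dot-product critic (the paper also studies an MSE-based one) and enforces equivariance to 90-degree rotations by rotating one modality's input and rotating its representation back before the InfoNCE comparison, which indeed adds no new hyperparameters:

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE over a batch of flattened dense representations.
    z1[i] and z2[i] come from an aligned image pair (positive);
    all other pairings in the batch serve as negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                      # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # positives on the diagonal

def equivariant_contrastive_loss(f1, f2, x1, x2, rng):
    """Sketch of the equivariance-enforcing modification: rotate one
    input by a random multiple of 90 degrees, map it, rotate the dense
    representation back, then apply InfoNCE as usual. The loss is only
    minimized if the network commutes with rotation."""
    k = rng.integers(0, 4)                        # random 90-degree rotation
    y1 = f1(x1)                                   # (N, H, W, C) dense reps
    y2 = np.rot90(f2(np.rot90(x2, k, axes=(1, 2))), -k, axes=(1, 2))
    N = y1.shape[0]
    return info_nce(y1.reshape(N, -1), y2.reshape(N, -1))
```

Here `f1` and `f2` stand in for the per-modality networks; in the paper these are trained end-to-end, whereas this sketch only shows the loss computation.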
Experimental Evaluation
The authors tested CoMIRs on challenging datasets, including a remote sensing dataset containing RGB and near-infrared images, and a biomedical dataset of bright-field (BF) and second-harmonic generation (SHG) microscopy images. The experiments yielded the following insights:
- Numerical Performance: CoMIR-based registration considerably outperformed both generative adversarial network (GAN)-based image-to-image translation approaches and application-specific methods. For instance, registration with α-AMD achieved a significantly higher percentage of registrations with sub-9 px and sub-42 px errors than alternatives such as mutual information (MI) and the CurveAlign method.
- Generalizability and Stability: The method demonstrated robustness across varying training conditions, showing stable performance irrespective of network initialization and dataset size. Notably, the authors show that even a single aligned image pair can suffice to train effective CoMIRs.
- Rotational Equivariance: The enforced rotational equivariance of CoMIRs was shown to improve registration accuracy by keeping features consistent under rotation, a frequent requirement in practical applications.
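Rotational equivariance can be checked directly: for an equivariant map, rotating the input and then applying the map should match applying the map and then rotating the output. The small hypothetical check below (where `f` stands in for a trained CoMIR network) measures this discrepancy:

```python
import numpy as np

def equivariance_error(f, x, k):
    """Relative discrepancy between rotate-then-map and map-then-rotate
    for a rotation by k * 90 degrees; approximately zero when f is
    equivariant to 90-degree rotations."""
    rotated_first = f(np.rot90(x, k))
    mapped_first = np.rot90(f(x), k)
    return np.linalg.norm(rotated_first - mapped_first) / np.linalg.norm(mapped_first)
```

A purely pointwise map such as `np.tanh` yields zero error, while a position-dependent map does not; the paper's modified loss drives the trained networks toward the former behaviour.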
Implications and Future Directions
The proposed method holds significant practical implications, particularly in fields requiring multimodal image analysis like medical imaging, remote sensing, and material science. The innovative use of contrastive learning aligns well with contemporary trends to develop end-to-end trainable systems in vision tasks. Additionally, this work stands out by demonstrating computational efficiency and reduced training overhead, potentially encouraging broader adoption in domains where computational resources are constrained.
Looking forward, the authors identify several future research avenues, such as extending CoMIRs to volumetric data, enhancing modality diversity, and exploring applications in tasks beyond registration, like segmentation and pixel-wise regression. The potential introduction of uncertainty quantification to CoMIRs also presents a meaningful direction for increasing prediction reliability in critical applications.
Conclusion
The research compellingly showcases the potential of contrastive learning for advancing multimodal image registration. By transforming traditionally complex multimodal registration into a feasible monomodal process, CoMIR paves the way for more effective and applicable solutions in a range of scientific and real-world domains. The method’s capability to produce dense, rotation-equivariant representations without the need for extensive hyperparameter tuning is indicative of its practicality and adaptability, making it a noteworthy contribution to the field of image processing and machine learning.