
CoMIR: Contrastive Multimodal Image Representation for Registration

Published 11 Jun 2020 in cs.CV, cs.LG, and eess.IV | (2006.06325v2)

Abstract: We propose contrastive coding to learn shared, dense image representations, referred to as CoMIRs (Contrastive Multimodal Image Representations). CoMIRs enable the registration of multimodal images where existing registration methods often fail due to a lack of sufficiently similar image structures. CoMIRs reduce the multimodal registration problem to a monomodal one, in which general intensity-based, as well as feature-based, registration algorithms can be applied. The method involves training one neural network per modality on aligned images, using a contrastive loss based on noise-contrastive estimation (InfoNCE). Unlike other contrastive coding methods, used for, e.g., classification, our approach generates image-like representations that contain the information shared between modalities. We introduce a novel, hyperparameter-free modification to InfoNCE, to enforce rotational equivariance of the learnt representations, a property essential to the registration task. We assess the extent of achieved rotational equivariance and the stability of the representations with respect to weight initialization, training set, and hyperparameter settings, on a remote sensing dataset of RGB and near-infrared images. We evaluate the learnt representations through registration of a biomedical dataset of bright-field and second-harmonic generation microscopy images; two modalities with very little apparent correlation. The proposed approach based on CoMIRs significantly outperforms registration of representations created by GAN-based image-to-image translation, as well as a state-of-the-art, application-specific method which takes additional knowledge about the data into account. Code is available at: https://github.com/MIDA-group/CoMIR.

Citations (73)

Summary

  • The paper introduces CoMIR, a contrastive learning approach that transforms multimodal image registration into simpler monomodal tasks.
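  • It enforces rotational equivariance through a hyperparameter-free modification of the InfoNCE loss, so no changes to the network architecture are needed.
  • Rotational equivariance and stability of the representations are assessed on aerial RGB and near-infrared images; on bright-field and second-harmonic generation microscopy images, registration based on CoMIRs significantly outperforms GAN-based image-to-image translation and a state-of-the-art, application-specific method.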

The paper "CoMIR: Contrastive Multimodal Image Representation for Registration" addresses multimodal image registration by reducing it to a monomodal registration problem using contrastive learning. The key idea is to generate image-like representations, called CoMIRs, that retain the information shared between the imaging modalities, so that general intensity-based and feature-based registration algorithms can be applied to them.

Methodology

Theoretical Framework

The approach uses a contrastive loss based on noise-contrastive estimation (InfoNCE), which maximizes a lower bound on the mutual information (MI) between the representations of the two modalities. The paper introduces a hyperparameter-free modification of InfoNCE that enforces rotational equivariance in the learned representations, a property essential for image registration.
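As a rough illustration of the loss described above, the following NumPy sketch computes a one-directional InfoNCE loss over a batch of paired feature vectors. The function name `info_nce`, the temperature `tau`, and the cosine-similarity critic are assumptions made for this example and are not taken from the paper; flat vectors stand in for the paper's dense, image-like representations.

```python
import numpy as np

def info_nce(y1, y2, tau=0.5):
    """One-directional InfoNCE over N paired vectors.

    y1, y2: arrays of shape (N, D); row i of y1 is the positive for row i of y2,
    and all other rows of y2 act as negatives. The critic (cosine similarity
    divided by a temperature tau) is an assumption of this sketch.
    """
    y1 = y1 / np.linalg.norm(y1, axis=1, keepdims=True)
    y2 = y2 / np.linalg.norm(y2, axis=1, keepdims=True)
    logits = y1 @ y2.T / tau                      # (N, N); diagonal holds positives
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))
b = a + 0.01 * rng.normal(size=(8, 16))           # nearly identical "other modality"

loss_aligned = info_nce(a, b)                     # correct pairing
loss_shuffled = info_nce(a, np.roll(b, 1, axis=0))  # every pair mismatched
```

In this toy setting the correctly paired batch gives a lower loss than the mismatched one, which is the behaviour the contrastive objective rewards. For a batch of N samples, log N minus the loss is the quantity usually cited as the lower bound on the mutual information.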

A central contribution is the demonstration that contrastive learning can produce dense representations shared across very different imaging modalities, and that monomodal registration methods can operate on them directly. This contrasts with prior approaches that translate one modality into the other, such as GAN-based image-to-image translation, which the paper reports to be significantly outperformed by CoMIRs in registration.
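To make the "reduce to monomodal" idea concrete, here is a minimal sketch of one classic monomodal technique, phase correlation, recovering an integer translation between two single-channel arrays. It is not the registration algorithm used in the paper; the function `estimate_shift` is hypothetical, and random arrays stand in for two CoMIRs that already look alike.

```python
import numpy as np

def estimate_shift(fixed, moving):
    """Estimate the circular shift (dy, dx) with np.roll(fixed, (dy, dx)) == moving.

    Uses phase correlation: the normalised cross-power spectrum of the two
    images is a pure phase ramp whose inverse FFT peaks at the shift.
    """
    cross = np.fft.fft2(moving) * np.conj(np.fft.fft2(fixed))
    cross /= np.abs(cross) + 1e-12                # keep only the phase
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    return int(dy), int(dx)

rng = np.random.default_rng(0)
fixed = rng.random((32, 32))                      # stand-in for a CoMIR of modality A
moving = np.roll(fixed, (3, 5), axis=(0, 1))      # stand-in for a shifted CoMIR of modality B

shift = estimate_shift(fixed, moving)
```

Such a method relies on both inputs having similar intensity structure, which raw images from different modalities usually lack; shared representations are what make it applicable.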

Implementation Details

The CoMIRs are generated by one neural network per modality, trained on aligned image pairs. Rotational equivariance is built directly into the loss function using the symmetry group $\mathcal{C}_4$, i.e., rotations by multiples of 90 degrees. Because the modification is hyperparameter-free, it avoids both the additional tuning and the architecture changes typically needed to obtain equivariant networks.
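The following sketch shows what C4 equivariance means operationally: rotating the input, applying the model, and rotating the output back should give the same result as applying the model directly. The helper names and the two toy "models" are hypothetical and are not the paper's networks. One plausible way to use this in training, stated here as an assumption, is to feed such rotated-back outputs to the critic so that the loss is low only when the network commutes with rotations.

```python
import numpy as np

def rotate_apply_unrotate(model, image, k):
    """Apply `model` to the image rotated by k * 90 degrees, then undo the rotation."""
    return np.rot90(model(np.rot90(image, k)), -k)

def equivariance_error(model, image):
    """Largest absolute deviation from C4 equivariance over the four rotations."""
    reference = model(image)
    return max(
        float(np.abs(rotate_apply_unrotate(model, image, k) - reference).max())
        for k in range(4)
    )

def pointwise(img):
    # Toy model acting on each pixel independently: exactly C4-equivariant.
    return np.tanh(img)

def horizontal_diff(img):
    # Toy model with a preferred direction: not C4-equivariant.
    return img - np.roll(img, 1, axis=1)

rng = np.random.default_rng(0)
image = rng.random((16, 16))                      # square, so rotations keep the shape

err_pointwise = equivariance_error(pointwise, image)
err_directional = equivariance_error(horizontal_diff, image)
```

The pointwise model has zero error, while the directional filter does not. A measurement of this kind is one way to quantify how equivariant a trained network actually is.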

The sampling scheme for negative samples also matters: InfoNCE contrasts each aligned pair against non-matching samples, so the negatives must be varied enough for the learned representations to generalize.

Critic Function Choice

The critic function scores the similarity of two representations inside the contrastive loss. Critics based on mean squared error (MSE) and on cosine similarity were compared. Empirical analysis suggested that MSE-based critics yield more favorable results for registration, because they push the generated CoMIRs to agree in their intensity values.
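The difference between the two critic families can be seen in a small sketch. The function names and the exact forms (negative MSE, plain cosine similarity) are assumptions for illustration; the paper's critics may include scaling or temperature terms not shown here.

```python
import numpy as np

def mse_critic(a, b):
    """Negative mean squared error, so that higher means more similar."""
    return float(-np.mean((a - b) ** 2))

def cosine_critic(a, b):
    """Cosine similarity of the flattened arrays; ignores overall intensity scale."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
patch = rng.random((32, 32))
scaled = 3.0 * patch                              # same structure, different intensities

cos_same_structure = cosine_critic(patch, scaled)  # blind to the rescaling
mse_same_structure = mse_critic(patch, scaled)     # penalises the rescaling
mse_identical = mse_critic(patch, patch)
```

The cosine critic treats the rescaled patch as a perfect match, whereas the MSE critic penalizes the intensity mismatch. This is consistent with the observation that MSE-trained representations align better in intensity, which intensity-based registration depends on.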

Experiments and Evaluation

Datasets

Experiments were conducted on two distinct multimodal datasets:

  1. Zurich Dataset: Comprising RGB and near-infrared (NIR) aerial images.
  2. Biomedical Dataset: Consisting of bright-field (BF) and second-harmonic generation (SHG) imaging of breast tissue microarray cores.

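The Zurich dataset was used to assess the achieved rotational equivariance and the stability of the representations with respect to weight initialization, training set, and hyperparameter settings. On the biomedical dataset, registration based on CoMIRs outperformed both registration based on GAN-based image-to-image translation and a state-of-the-art, application-specific method.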

Figure 1: Registration of images of different modalities (BF and SHG) using CoMIR to enable successful registration by monomodal approaches.

Performance Metrics

The paper reports substantially higher registration success rates and accuracy for CoMIRs than for conventional MI-based registration and for CurveAlign. It also reports lower computational cost, with faster training and inference and a shorter registration time than traditional methods.

Figure 2: eCDF of the successful registrations comparing different methods over the biomedical test set.
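For readers unfamiliar with this kind of plot, the sketch below computes an empirical CDF of registration errors and a success rate at a fixed threshold. The error values and the 5-pixel threshold are made up for illustration and are not results or settings from the paper.

```python
import numpy as np

def ecdf(errors):
    """Return sorted errors x and the fraction y of registrations with error <= x."""
    x = np.sort(np.asarray(errors, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def success_rate(errors, threshold):
    """Fraction of registrations whose error does not exceed `threshold`."""
    return float(np.mean(np.asarray(errors, dtype=float) <= threshold))

# Hypothetical per-image registration errors in pixels (not data from the paper).
errors_px = [1.2, 0.4, 35.0, 2.8, 90.5, 0.9, 4.1, 3.3]

x, y = ecdf(errors_px)
rate = success_rate(errors_px, threshold=5.0)     # 6 of the 8 made-up errors are <= 5
```

Plotting `y` against `x` for each method gives curves like those in Figure 2; a curve that rises earlier and higher indicates more registrations succeeding at small error.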

Implications and Future Research

Practical Applications

The CoMIR method is applicable to multimodal image registration in fields such as remote sensing, biomedical imaging, and material science. Because it reduces the problem to a monomodal one, existing general-purpose registration algorithms can be reused without modification.

Theoretical Insights

The paper paves the way for future studies on the application of CoMIRs to other tasks such as classification and segmentation, suggesting extensions to other symmetry groups beyond rotations.

Future work may involve integrating aleatoric uncertainty into CoMIR representations, exploring applications across higher-dimensional datasets, and improving generalized equivariance properties.

Conclusion

The paper shows that contrastive learning can turn multimodal image registration into a monomodal problem, with registration results that surpass GAN-based image-to-image translation and a state-of-the-art, application-specific method on the biomedical dataset. Its treatment of mutual information estimation and rotational equivariance is relevant to multimodal image analysis more broadly, with potential uses in clinical, remote sensing, and industrial applications.
