Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors (2305.18274v2)

Published 29 May 2023 in cs.CV, cs.AI, and q-bio.NC

Abstract: We present MindEye, a novel fMRI-to-image approach to retrieve and reconstruct viewed images from brain activity. Our model comprises two parallel submodules that are specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior). MindEye can map fMRI brain activity to any high dimensional multimodal latent space, like CLIP image space, enabling image reconstruction using generative models that accept embeddings from this latent space. We comprehensively compare our approach with other existing methods, using both qualitative side-by-side comparisons and quantitative evaluations, and show that MindEye achieves state-of-the-art performance in both reconstruction and retrieval tasks. In particular, MindEye can retrieve the exact original image even among highly similar candidates indicating that its brain embeddings retain fine-grained image-specific information. This allows us to accurately retrieve images even from large-scale databases like LAION-5B. We demonstrate through ablations that MindEye's performance improvements over previous methods result from specialized submodules for retrieval and reconstruction, improved training techniques, and training models with orders of magnitude more parameters. Furthermore, we show that MindEye can better preserve low-level image features in the reconstructions by using img2img, with outputs from a separate autoencoder. All code is available on GitHub.

Authors (12)

Paul S. Scotti
Atmadeep Banerjee
Jimmie Goode
Stepan Shabalin
Alex Nguyen
Ethan Cohen
Aidan J. Dempster
Nathalie Verlinde
Elad Yundler
David Weisberg
Kenneth A. Norman
Tanishq Mathew Abraham

Summary

fMRI-to-Image with Contrastive Learning and Diffusion Priors: The MindEye Approach

Introduction

MindEye is an approach for retrieving and reconstructing the images a subject viewed, based on brain activity recorded with functional magnetic resonance imaging (fMRI). It sits at the intersection of neuroscience and artificial intelligence, specifically neural decoding and image reconstruction. Its core is a pair of parallel submodules: one specialized for retrieval, trained with contrastive learning, and one specialized for reconstruction, built on a diffusion prior. This summary covers how MindEye works, how it compares with existing methods, and what its results imply.

MindEye Architecture

MindEye distinguishes itself through a dual-pipeline structure, facilitating both high-level (semantic content) and low-level (perceptual features) processing of brain activity for image reconstruction:

High-Level (Semantic) Pipeline: This pipeline maps fMRI voxels to a multimodal latent space, notably the CLIP image space, so that images can be reconstructed with generative models. Its key component is a multilayer perceptron (MLP) that transforms the spatial patterns of fMRI activity into image embeddings. These embeddings are trained with a contrastive objective, which on its own yields CLIP fMRI embeddings that are disjoint from true CLIP image embeddings. A diffusion prior then maps them into the CLIP image embedding space, making them suitable as input to any pretrained image generation model that accepts embeddings from this space.
Low-Level (Perceptual) Pipeline: The companion pipeline focuses on the preservation of low-level image features such as texture and color by mapping fMRI voxels to the latent space of a variational autoencoder (VAE), specifically that used by Stable Diffusion. The resultant reconstructions from this pipeline, while lacking in high-level semantic detail, exhibit high fidelity in low-level image characteristics.
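The contrastive step in the high-level pipeline can be illustrated with a plain CLIP-style symmetric InfoNCE loss. This is a minimal sketch in pure Python, not the authors' implementation: the function names, the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration, and the loss actually used to train MindEye may differ.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def clip_style_loss(brain_embs, image_embs, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    brain_embs[i] and image_embs[i] are assumed to come from the same trial.
    The temperature of 0.1 is an illustrative choice, not a value from the paper.
    """
    brain = [l2_normalize(v) for v in brain_embs]
    image = [l2_normalize(v) for v in image_embs]
    n = len(brain)
    # logits[i][j]: temperature-scaled cosine similarity of brain i and image j
    logits = [[sum(a * b for a, b in zip(brain[i], image[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Brain -> image direction: row i should assign most mass to column i
    loss_b2i = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Image -> brain direction: column j should assign most mass to row j
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_i2b = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (loss_b2i + loss_i2b)

# Toy batch: correctly paired embeddings versus the same embeddings mispaired
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when each brain embedding is closest to its own image embedding and large when the pairing is wrong, which is the property that makes the resulting embeddings useful for retrieval.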

Performance and Findings

MindEye can retrieve the exact viewed image from brain activity, even among highly similar candidates and from large-scale databases such as LAION-5B, and it reconstructs images that match the originals in both semantic and perceptual content. It achieves state-of-the-art top-1 accuracy in image retrieval, outperforming previous methods. The authors' ablations attribute the gains to the specialized submodules for retrieval and reconstruction, improved training techniques, and models with orders of magnitude more parameters.
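Top-1 retrieval accuracy can be made concrete as follows: for each brain embedding, rank all candidate image embeddings by cosine similarity and count a hit when the best match is the image that was actually viewed. The sketch below is an illustration under assumptions, not the paper's evaluation code; the function names and the three-dimensional toy embeddings are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top1_retrieval_accuracy(brain_embs, image_embs):
    """Fraction of brain embeddings whose nearest candidate is the paired image.

    brain_embs[i] is assumed to correspond to image_embs[i].
    """
    hits = 0
    for i, brain in enumerate(brain_embs):
        sims = [cosine(brain, img) for img in image_embs]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(brain_embs)

# Toy candidate pool of three images and three predicted brain embeddings.
# The third brain embedding is deliberately closer to the first image.
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
brains = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.6, 0.1, 0.5]]
accuracy = top1_retrieval_accuracy(brains, images)
```

In this toy pool two of the three brain embeddings retrieve their own image, so the accuracy is 2/3. Chance level is one over the number of candidates, which is why retrieval among many similar images is a demanding test.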

Implications

The dual-pipeline architecture of MindEye, combining contrastive learning for retrieval and a diffusion prior for reconstruction, not only sets a new benchmark in fMRI-based image reconstruction but also broadens the scope of neural decoding. The ability of MindEye to accurately map brain activity to high-dimensional image spaces opens new avenues in the study of cognitive states and processes. Moreover, because MindEye outputs embeddings in a standard latent space, it can be paired with new generative models that accept those embeddings as image generation technology improves.

Future Directions

While MindEye represents a significant leap forward, several avenues remain to be explored for further refinement. Expanding the framework to decode and reconstruct mental imagery, along with enabling cross-subject generalization, are critical areas for future research. Additionally, deploying MindEye in real-time clinical or interactive settings presents both a substantial challenge and an opportunity for the broader application of this technology.

Conclusion

MindEye's approach to fMRI-based image retrieval and reconstruction is a notable step in combining neuroscience and artificial intelligence. By using contrastive learning and a diffusion prior within a dual-pipeline architecture, MindEye achieves state-of-the-art performance and shows how far AI can go in decoding and reconstructing visual experience from brain activity. It is likely to serve as a reference point for future work on decoding human cognition and perception.
