NeuralDiffuser: Controllable fMRI Reconstruction with Primary Visual Feature Guided Diffusion (2402.13809v2)

Published 21 Feb 2024 in cs.NE, cs.AI, and cs.CV

Abstract: Reconstructing visual stimuli from functional Magnetic Resonance Imaging (fMRI) based on Latent Diffusion Models (LDM) provides a fine-grained retrieval of the brain. A challenge persists in reconstructing a cohesive alignment of details (such as structure, background, texture, and color). Moreover, LDMs generate different image results even under the same conditions. To address these issues, we first uncover the neuroscientific perspective of LDM-based methods: they perform top-down creation based on pre-trained knowledge from massive images but lack detail-driven bottom-up perception, resulting in unfaithful details. We propose NeuralDiffuser, which introduces primary visual feature guidance to provide detail cues in the form of gradients, extending the bottom-up process for LDM-based methods to achieve faithful semantics and details. We also develop a novel guidance strategy to ensure the consistency of repeated reconstructions rather than a variety of results. NeuralDiffuser obtains state-of-the-art performance on the Natural Scenes Dataset (NSD), offering more faithful details and more consistent results.

Authors: Haoyu Li, Hao Wu, Badong Chen

Summary

NeuralDiffuser: Enhancing fMRI Reconstruction through Primary Visual Feature-Guided Diffusion

Introduction

Decoding and reconstructing visual stimuli from brain activity, in particular from functional Magnetic Resonance Imaging (fMRI) signals, is an active research problem in neuroscience and brain-computer interfaces. The main obstacle is the modality gap between high-dimensional fMRI data and the images that evoked it. Traditional methods often yield reconstructions that are ambiguous and lack coherent semantic detail. Latent Diffusion Models (LDMs) have improved on this, yet they produce different outcomes under identical conditions and struggle to render details faithfully.

Methodology

This paper introduces NeuralDiffuser, a method that adds primary visual feature guidance to an LDM in order to improve the fidelity of images reconstructed from fMRI data. The design draws on the distinction between top-down and bottom-up processes in visual cognition. The top-down side is the LDM's generative capability, which relies on knowledge pretrained from large image collections. The bottom-up side is fine-grained, detail-driven perception, which the authors argue existing LDM-based methods lack and which is needed for faithful reconstruction of details.

fMRI Embeddings Decoding

NeuralDiffuser uses neural networks to decode fMRI embeddings into the semantic conditions and initial latents that drive the LDM, and to map them to visual features that include, but are not limited to, structure, texture, and color. The method also introduces a guidance strategy intended to keep repeated reconstructions consistent, which matters because generative models otherwise produce different outputs for the same input.
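
The decoding step described above can be illustrated with a minimal sketch. The summary does not specify the decoder architecture beyond "neural networks", so this example swaps in closed-form ridge regression as a simple stand-in and runs on synthetic data rather than NSD. The function name `fit_ridge`, the array sizes, and the value of `alpha` are all assumptions made for illustration.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    n_inputs = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_inputs), X.T @ Y)

# Synthetic stand-in data (hypothetical sizes, not NSD):
# rows are trials, X holds voxel responses, Y holds target image features.
rng = np.random.default_rng(0)
n_trials, n_voxels, n_features = 200, 50, 16
W_true = rng.normal(size=(n_voxels, n_features))
X = rng.normal(size=(n_trials, n_voxels))
Y = X @ W_true + 0.1 * rng.normal(size=(n_trials, n_features))

# Fit on the first 150 trials, predict features for the held-out 50.
W_hat = fit_ridge(X[:150], Y[:150], alpha=1.0)
Y_test = Y[150:]
Y_pred = X[150:] @ W_hat

# Held-out coefficient of determination, pooled over all features.
ss_res = ((Y_test - Y_pred) ** 2).sum()
ss_tot = ((Y_test - Y_test.mean(axis=0)) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot
```

In a pipeline of the kind the summary describes, the predicted features would play the role of the decoded conditions or guidance targets; a nonlinear network could replace `fit_ridge` without changing the surrounding logic.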

Primary Visual Feature Guidance

The core of NeuralDiffuser is primary visual feature guidance, which draws features from multiple layers of a CLIP visual encoder. These features act as detail cues for the LDM and are supplied in the form of gradients during generation, with the aim of improving both the semantic fidelity and the detail accuracy of the reconstructed images. The accompanying guidance strategy is designed to reduce the variability of generative outcomes so that repeated reconstructions stay consistent.
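
To make the idea of gradient-based feature guidance concrete, here is a toy sketch, not the paper's implementation. It assumes each encoder layer can be replaced by a fixed random linear map so that the gradient of the feature-matching loss is available in closed form, and it uses an identity function as a placeholder for the denoiser. All names (`feature_loss_and_grad`, `guided_sampling`, `layers`, `targets`, `scale`) are hypothetical.

```python
import numpy as np

def feature_loss_and_grad(x, layers, targets):
    """Sum of squared feature errors over layers, and its gradient w.r.t. x."""
    loss, grad = 0.0, np.zeros_like(x)
    for W, t in zip(layers, targets):
        residual = W @ x - t          # feature mismatch at this layer
        loss += float(residual @ residual)
        grad += 2.0 * W.T @ residual
    return loss, grad

def guided_sampling(x, denoise, layers, targets, scale, steps):
    """Alternate a denoising step with a step down the feature-loss gradient."""
    for _ in range(steps):
        x = denoise(x)
        _, grad = feature_loss_and_grad(x, layers, targets)
        x = x - scale * grad
    return x

rng = np.random.default_rng(0)
dim = 8
# Toy "multi-layer" feature extractor: two random linear maps (an assumption).
layers = [rng.normal(size=(6, dim)), rng.normal(size=(10, dim))]
x_true = rng.normal(size=dim)
targets = [W @ x_true for W in layers]   # features decoded from fMRI, idealized

# Step size 1 / (largest curvature of the quadratic loss) keeps updates stable.
hessian = 2.0 * sum(W.T @ W for W in layers)
scale = 1.0 / np.linalg.eigvalsh(hessian).max()

def identity(x):
    """Placeholder for a real denoiser."""
    return x

# Two runs from different random starts, standing in for repeated sampling.
x_a0, x_b0 = rng.normal(size=dim), rng.normal(size=dim)
x_a = guided_sampling(x_a0, identity, layers, targets, scale, steps=50)
x_b = guided_sampling(x_b0, identity, layers, targets, scale, steps=50)

loss_before, _ = feature_loss_and_grad(x_a0, layers, targets)
loss_after, _ = feature_loss_and_grad(x_a, layers, targets)
gap_before = np.linalg.norm(x_a0 - x_b0)
gap_after = np.linalg.norm(x_a - x_b)
```

In this toy setting the guidance both lowers the feature loss and pulls the two runs toward each other, which mirrors the two goals named above: faithful details and consistent repeated reconstructions. A real system would obtain the gradient by backpropagating through the encoder rather than from a closed-form expression.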

Results and Evaluation

According to the paper, NeuralDiffuser achieves state-of-the-art performance on the Natural Scenes Dataset (NSD), with more faithful details and more consistent results than existing methods. The evaluation uses metrics that cover both low-level and high-level image features, including structural similarity, pixel correlation, and semantic relevance.
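
The two low-level metrics named above can be sketched as follows. This is a simplified illustration on synthetic arrays: pixel correlation is taken as the Pearson correlation of flattened pixels, and the structural similarity is computed once over the whole image, whereas the standard SSIM averages over local windows. The function names and test images are assumptions, not the paper's evaluation code.

```python
import numpy as np

def pixel_correlation(a, b):
    """Pearson correlation between the flattened pixel values of two images."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

def global_ssim(a, b, data_range=1.0):
    """SSIM computed once over the whole image (no sliding window)."""
    c1 = (0.01 * data_range) ** 2    # stabilizer for the luminance term
    c2 = (0.03 * data_range) ** 2    # stabilizer for the contrast/structure term
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return float(numerator / denominator)

# Synthetic grayscale images in [0, 1] standing in for stimulus and outputs.
rng = np.random.default_rng(0)
stimulus = rng.random((32, 32))
close = np.clip(stimulus + 0.05 * rng.normal(size=stimulus.shape), 0.0, 1.0)
far = rng.random((32, 32))           # unrelated image

scores = {
    "close": (pixel_correlation(stimulus, close), global_ssim(stimulus, close)),
    "far": (pixel_correlation(stimulus, far), global_ssim(stimulus, far)),
}
```

Both scores are higher for the lightly perturbed image than for the unrelated one. Semantic relevance, the high-level counterpart, would require comparing features from a pretrained network and is not covered by this sketch.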

Implications and Future Directions

NeuralDiffuser narrows the gap between neural activations and perceived visual stimuli, which is relevant to brain-computer interfaces. More accurate and consistent reconstruction of visual experience from fMRI data could also support research into how the brain represents what it sees.

Looking forward, the principle of primary visual feature guidance could be extended to reconstruction tasks beyond fMRI, wherever biological data must be mapped to generated visual content. The guidance strategy is also a promising direction for research on controlling generative models, so that diversity of outcomes does not compromise the accuracy and reliability of reconstructions.
