MinD-3D: Reconstruct High-quality 3D objects in Human Brain (2312.07485v3)

Published 12 Dec 2023 in cs.CV

Abstract: In this paper, we introduce Recon3DMind, an innovative task aimed at reconstructing 3D visuals from Functional Magnetic Resonance Imaging (fMRI) signals, marking a significant advancement in the fields of cognitive neuroscience and computer vision. To support this pioneering task, we present the fMRI-Shape dataset, which includes data from 14 participants and features 360-degree videos of 3D objects to enable comprehensive fMRI signal capture across various settings, thereby laying a foundation for future research. Furthermore, we propose MinD-3D, a novel and effective three-stage framework specifically designed to decode the brain's 3D visual information from fMRI signals, demonstrating the feasibility of this challenging task. The framework begins by extracting and aggregating features from fMRI frames through a neuro-fusion encoder, subsequently employs a feature bridge diffusion model to generate visual features, and ultimately recovers the 3D object via a generative transformer decoder. We assess the performance of MinD-3D using a suite of semantic and structural metrics and analyze the correlation between the features extracted by our model and the visual regions of interest (ROIs) in fMRI signals. Our findings indicate that MinD-3D not only reconstructs 3D objects with high semantic relevance and spatial similarity but also significantly enhances our understanding of the human brain's capabilities in processing 3D visual information. Project page at: https://jianxgao.github.io/MinD-3D.

Summary

The paper introduces MinD-3D, a three-stage framework that reconstructs 3D objects from fMRI signals.
It employs an encoder to aggregate fMRI frames, a diffusion model to bridge features, and a generative transformer to decode 3D shapes.
Experiments on the new fMRI-Shape dataset show reconstructions that are semantically and structurally coherent with the viewed objects.

Introduction

Functional Magnetic Resonance Imaging (fMRI) captures changes in blood flow related to neural activity in the human brain, allowing researchers to visualize brain activity in response to various stimuli. Recent work has tried to reconstruct visual experiences from these signals. Earlier efforts focused on reconstructing 2D images; reconstructing 3D objects from fMRI data poses a new set of challenges, because it depends on how the human visual system processes 3D structure as well as appearance.

MinD-3D Framework

The paper presents MinD-3D, a three-stage framework designed to decode fMRI signals into 3D object reconstructions. The first stage is a neuro-fusion encoder, inspired by recent image models, that extracts features from multiple fMRI frames and aggregates them into a combined representation. These features are meant to capture both the semantic content and the spatial structure of the visual stimuli.
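
To make the idea of aggregating several fMRI frames into one feature concrete, here is a minimal sketch using softmax-weighted (attention-style) pooling. It is an illustration only: this summary does not describe the internals of the neuro-fusion encoder, and the names `aggregate_frames`, `frames`, and `query` are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_frames(frames: List[List[float]], query: List[float]) -> List[float]:
    """Pool per-frame feature vectors into one vector.

    Each frame is scored by a scaled dot product with `query`; the scores
    become softmax weights for a weighted average. This pooling rule is an
    assumption for illustration, not the paper's encoder.
    """
    dim = len(query)
    scores = [
        sum(frame[i] * query[i] for i in range(dim)) / math.sqrt(dim)
        for frame in frames
    ]
    weights = softmax(scores)
    return [sum(w * frame[i] for w, frame in zip(weights, frames)) for i in range(dim)]


# Three toy "fMRI frame" features of dimension 2 (made-up numbers).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# A zero query scores every frame equally, so pooling reduces to the mean.
uniform_query = [0.0, 0.0]
pooled = aggregate_frames(frames, uniform_query)
```

With a non-zero query, frames whose features align with the query receive larger weights, which is one simple way a model could emphasize the most informative frames.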

Next, a feature bridge diffusion model translates the aggregated fMRI features into a visual feature space. Conditional diffusion models are widely used for generative tasks; here the model is conditioned on the fMRI features and acts as the transition from neural representations to visual ones.
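
For readers unfamiliar with diffusion models, the sketch below shows the standard DDPM-style forward noising step and its algebraic inversion on a toy feature vector. It is a generic illustration under stated assumptions, not the paper's feature bridge model: the linear beta schedule, its default values, and all function names are hypothetical choices. In a conditional model, a learned network would predict the noise from the noisy features and the fMRI condition; here the true noise stands in for that prediction.

```python
import math
import random
from typing import List


def alpha_bar_schedule(num_steps: int, beta_start: float = 1e-4,
                       beta_end: float = 0.02) -> List[float]:
    """Cumulative product of (1 - beta_t) for a linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0: List[float], noise: List[float], alpha_bar: float) -> List[float]:
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * noise."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, noise)]


def predict_x0(xt: List[float], predicted_noise: List[float],
               alpha_bar: float) -> List[float]:
    """Invert the forward step given a noise estimate."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - b * e) / a for x, e in zip(xt, predicted_noise)]


rng = random.Random(0)
alpha_bars = alpha_bar_schedule(1000)
visual_feature = [0.5, -1.0, 2.0]             # toy clean feature (made up)
noise = [rng.gauss(0.0, 1.0) for _ in visual_feature]
noisy = add_noise(visual_feature, noise, alpha_bars[500])
# With a perfect noise estimate the clean feature is recovered exactly.
recovered = predict_x0(noisy, noise, alpha_bars[500])
```

Training such a model amounts to teaching the noise predictor to approximate `noise` from `noisy` plus the conditioning signal, so that sampling can start from pure noise and step back toward a clean visual feature.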

The final stage is a generative transformer decoder, built on an existing 3D shape generator, that produces the actual 3D object. It takes the visual features generated by the previous stage as input and reconstructs the 3D object that the participant viewed.

Dataset and Experimental Design

The task is supported by the new fMRI-Shape dataset, which includes data from 14 participants. Subjects were shown 360-degree videos of stationary 3D objects, so that each object was seen from all sides while fMRI signals were recorded. The summary describes the dataset as the first to pair fMRI data with corresponding 3D visuals, which makes it a starting point for future research on this task.

Conclusions and Contribution

The experiments show that the model can produce semantically and structurally coherent 3D reconstructions from fMRI signals. The results are evaluated with both semantic and structural metrics, and the authors also analyze how the features extracted by the model correlate with visual regions of interest (ROIs) in the fMRI signals. The work makes three contributions: it defines a new task at the intersection of cognitive neuroscience and computer vision, it proposes a decoding framework for that task, and it introduces the fMRI-Shape dataset. Together these offer a platform for further study of how the human brain processes 3D visual information.
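
As an example of what a structural metric can look like, the sketch below computes a symmetric Chamfer distance between two point clouds. This summary does not name the metrics the paper uses, so Chamfer distance is an assumption chosen because it is a common measure of 3D shape similarity; the function names and toy points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two 3D points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(cloud_a: List[Point], cloud_b: List[Point]) -> float:
    """Symmetric Chamfer distance using squared nearest-neighbor distances.

    For each point, find its nearest neighbor in the other cloud, average
    those squared distances per direction, and add the two directions.
    Brute force, O(len(a) * len(b)); fine for small illustrative clouds.
    """
    a_to_b = sum(min(squared_distance(p, q) for q in cloud_b) for p in cloud_a) / len(cloud_a)
    b_to_a = sum(min(squared_distance(q, p) for p in cloud_a) for q in cloud_b) / len(cloud_b)
    return a_to_b + b_to_a


# Toy "ground truth" and "reconstruction": the same two points shifted by 1 in z.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstruction = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
same_shape_score = chamfer_distance(target, target)
shifted_score = chamfer_distance(target, reconstruction)
```

Lower values mean the reconstructed geometry lies closer to the reference shape; a semantic metric would instead compare learned feature embeddings of the two objects.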
