MinD-3D: Reconstruct High-quality 3D objects in Human Brain (2312.07485v3)

Published 12 Dec 2023 in cs.CV

Abstract: In this paper, we introduce Recon3DMind, an innovative task aimed at reconstructing 3D visuals from Functional Magnetic Resonance Imaging (fMRI) signals, marking a significant advancement in the fields of cognitive neuroscience and computer vision. To support this pioneering task, we present the fMRI-Shape dataset, which includes data from 14 participants and features 360-degree videos of 3D objects to enable comprehensive fMRI signal capture across various settings, thereby laying a foundation for future research. Furthermore, we propose MinD-3D, a novel and effective three-stage framework specifically designed to decode the brain's 3D visual information from fMRI signals, demonstrating the feasibility of this challenging task. The framework begins by extracting and aggregating features from fMRI frames through a neuro-fusion encoder, subsequently employs a feature bridge diffusion model to generate visual features, and ultimately recovers the 3D object via a generative transformer decoder. We assess the performance of MinD-3D using a suite of semantic and structural metrics and analyze the correlation between the features extracted by our model and the visual regions of interest (ROIs) in fMRI signals. Our findings indicate that MinD-3D not only reconstructs 3D objects with high semantic relevance and spatial similarity but also significantly enhances our understanding of the human brain's capabilities in processing 3D visual information. Project page at: https://jianxgao.github.io/MinD-3D.


Summary

  • The paper introduces a novel three-stage MinD-3D framework that accurately reconstructs 3D objects from fMRI signals.
  • It employs an encoder to aggregate fMRI frames, a diffusion model to bridge features, and a generative transformer to decode 3D shapes.
  • Experiments with the new fMRI-Shape dataset demonstrate semantically and structurally coherent 3D reconstructions.

Introduction

Functional Magnetic Resonance Imaging (fMRI) measures changes in blood flow related to neural activity, allowing researchers to visualize brain responses to various stimuli. Recent work has sought to reconstruct visual experiences directly from these signals. While prior efforts focused on reconstructing 2D images, reconstructing 3D objects from fMRI data poses a new set of challenges and taps into deeper aspects of how the human visual system processes spatial structure.

MinD-3D Framework

The paper presents MinD-3D, a three-stage framework designed to decode fMRI signals into 3D object reconstructions. In the first stage, a neuro-fusion encoder, inspired by recent vision models, extracts and aggregates features across multiple fMRI frames. The resulting features capture both the semantic content and the spatial structure of the visual stimuli.
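As a rough illustration of this stage, the following PyTorch sketch aggregates several fMRI frames into a single fused feature vector with a small transformer; the class name, dimensions, and mean-pooling choice are assumptions for exposition, not the paper's exact architecture.

```python
# Hypothetical sketch of a frame-aggregating fMRI encoder (names and
# dimensions are illustrative assumptions, not the paper's exact design).
import torch
import torch.nn as nn

class NeuroFusionEncoder(nn.Module):
    def __init__(self, voxel_dim=4096, embed_dim=768, n_frames=6, depth=4, n_heads=8):
        super().__init__()
        # Project each flattened fMRI frame to a token embedding.
        self.frame_proj = nn.Linear(voxel_dim, embed_dim)
        # Learnable positional embedding over the frame sequence.
        self.pos_embed = nn.Parameter(torch.zeros(1, n_frames, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, fmri_frames):
        # fmri_frames: (batch, n_frames, voxel_dim)
        tokens = self.frame_proj(fmri_frames) + self.pos_embed
        fused = self.transformer(tokens)   # (batch, n_frames, embed_dim)
        return fused.mean(dim=1)           # aggregated feature per sample
```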

In the second stage, a feature-bridge diffusion model translates the aggregated fMRI features into a visual feature space. Conditioned diffusion models have proven to be powerful generative tools; here, one serves as the transition from neural representations to visual features.
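The sketch below illustrates the general idea of conditional diffusion sampling, generating visual features from noise while conditioning on fMRI features; the noise schedule, step count, and the `denoiser` interface are illustrative assumptions rather than the paper's implementation.

```python
# Illustrative sketch of conditioned DDPM-style sampling that maps fMRI
# features to visual features; schedule, network, and shapes are assumptions.
import torch

@torch.no_grad()
def bridge_fmri_to_visual(denoiser, fmri_feat, feat_dim=768, steps=50):
    """denoiser(x_t, t, cond) is assumed to predict the noise added at step t."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(fmri_feat.shape[0], feat_dim)   # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.full((x.shape[0],), t), fmri_feat)
        # Standard DDPM posterior mean, conditioned on the fMRI features.
        x = (x - (betas[t] / torch.sqrt(1 - alpha_bars[t])) * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x   # predicted visual features
```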

The final stage employs a generative transformer decoder, adapted from a 3D shape generator, to produce the actual 3D objects. Conditioned on the visual features from the earlier stages, it reconstructs the objects as they are perceived and processed in the human brain.
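A minimal sketch of such a conditioned autoregressive decoder over discrete shape tokens is shown below; the vocabulary size, sequence length, and cross-attention conditioning are assumptions chosen to illustrate the mechanism, not the paper's exact decoder.

```python
# Minimal sketch of an autoregressive 3D-token decoder conditioned on the
# bridged visual features (vocabulary size, token length, and architecture
# are illustrative assumptions).
import torch
import torch.nn as nn

class ShapeTokenDecoder(nn.Module):
    def __init__(self, vocab_size=8192, embed_dim=768, depth=6, n_heads=8, max_len=1024):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, embed_dim)
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, embed_dim))
        layer = nn.TransformerDecoderLayer(
            d_model=embed_dim, nhead=n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, shape_tokens, visual_feat):
        # shape_tokens: (batch, seq) discrete shape codes
        # visual_feat: (batch, embed_dim) conditioning feature from the bridge
        seq_len = shape_tokens.shape[1]
        x = self.token_embed(shape_tokens) + self.pos_embed[:, :seq_len]
        # Causal mask so each position attends only to earlier shape tokens.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        x = self.decoder(x, memory=visual_feat.unsqueeze(1), tgt_mask=mask)
        return self.head(x)   # logits over the next shape token
```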

Dataset and Experimental Design

The task is underpinned by the new fMRI-Shape dataset. Fourteen participants viewed 360-degree videos of stationary 3D objects while their fMRI signals were recorded, so that each object's full appearance is reflected in the captured responses. The dataset is presented as the first to pair fMRI data with corresponding 3D visuals, laying a foundation for future research in this area.
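For orientation, a hypothetical data-pairing sketch is shown below; the tensor layout and field names are assumptions, not the released fMRI-Shape format.

```python
# Hypothetical pairing of preprocessed fMRI frames with 3D objects;
# this is not the released fMRI-Shape file format.
from torch.utils.data import Dataset

class FMRIShapePairs(Dataset):
    def __init__(self, fmri_frames, shape_points):
        # fmri_frames: (N, n_frames, voxel_dim) preprocessed fMRI frame features
        # shape_points: (N, n_points, 3) point clouds of the viewed 3D objects
        assert len(fmri_frames) == len(shape_points)
        self.fmri_frames = fmri_frames
        self.shape_points = shape_points

    def __len__(self):
        return len(self.fmri_frames)

    def __getitem__(self, idx):
        return self.fmri_frames[idx], self.shape_points[idx]
```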

Conclusions and Contribution

The experiments show that the model produces semantically and structurally coherent 3D representations from fMRI signals. Evaluated with both semantic and structural metrics, MinD-3D performs consistently well, remaining faithful both to the original objects and to the brain's processing patterns. The contribution of this work is threefold: a novel task at the intersection of cognitive neuroscience and computer vision, an effective decoding framework, and the pioneering fMRI-Shape dataset. MinD-3D thereby advances our understanding of the human brain's 3D visual processing and offers a platform for further interdisciplinary exploration.
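As an example of a structural metric commonly used for 3D reconstruction, the snippet below computes the Chamfer distance between two point clouds; the paper's exact metric suite may differ from this illustration.

```python
# Chamfer distance between predicted and ground-truth point clouds;
# a common structural metric, shown here only as an illustration.
import torch

def chamfer_distance(pred, gt):
    # pred, gt: (n_points, 3) point clouds of the reconstructed and true object.
    d = torch.cdist(pred, gt)   # pairwise distances, shape (n_pred, n_gt)
    return d.min(dim=1).values.mean() + d.min(dim=0).values.mean()
```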
