Improving visual image reconstruction from human brain activity using latent diffusion models via multiple decoded inputs (2306.11536v1)
Abstract: The integration of deep learning and neuroscience has been advancing rapidly, leading to improvements in the analysis of brain activity and in the understanding of deep learning models from a neuroscientific perspective. The reconstruction of visual experience from human brain activity has particularly benefited: deep learning models trained on large collections of natural images have greatly improved reconstruction quality, and approaches that combine the diverse information contained in visual experiences have proliferated in recent years. In this technical paper, building on the simple and generic framework we proposed earlier (Takagi and Nishimoto, CVPR 2023), we examine the extent to which various additional decoding techniques affect the performance of visual experience reconstruction. Specifically, we combined our earlier work with three techniques: using text decoded from brain activity, nonlinear optimization for structural image reconstruction, and using depth information decoded from brain activity. We confirmed that these techniques improved accuracy over the baseline. We also discuss what researchers should consider when performing visual reconstruction with deep generative models trained on large datasets. Please see our webpage at https://sites.google.com/view/stablediffusion-with-brain/. Code is available at https://github.com/yu-takagi/StableDiffusionReconstruction.
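The framework the abstract builds on decodes the inputs of a latent diffusion model (image latents and conditioning embeddings such as text) from fMRI activity, typically with linear regression. The sketch below is a minimal, illustrative version of that decoding step using closed-form ridge regression on synthetic data; the array shapes, the regularization strength, and the random data are assumptions for illustration, not the authors' actual pipeline or dataset.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha*I)^-1 X^T Y."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ Y)

def decode(X, W):
    """Predict latent vectors (e.g. diffusion-model inputs) from fMRI patterns."""
    return X @ W

# Synthetic stand-in for (fMRI voxels -> latent embedding) training data.
rng = np.random.default_rng(0)
n_train, n_voxels, n_latent = 200, 50, 8
W_true = rng.normal(size=(n_voxels, n_latent))          # hypothetical ground-truth map
X_train = rng.normal(size=(n_train, n_voxels))          # "voxel responses"
Y_train = X_train @ W_true + 0.1 * rng.normal(size=(n_train, n_latent))

W = fit_ridge(X_train, Y_train, alpha=10.0)
X_test = rng.normal(size=(10, n_voxels))
Y_pred = decode(X_test, W)  # decoded latents that would condition the diffusion model
```

In the actual method, separate regressors of this kind map voxel responses to the different conditioning signals (image latents, text embeddings, and, in this paper's extensions, depth maps), which are then fed to the pretrained latent diffusion model.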
- A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25:116–126, 2022.
- Brains and algorithms partially converge in natural language processing. Communications Biology, 5(1):1–10, 2022.
- Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22710–22720, 2023.
- Cinematic mindscapes: High-quality video reconstruction from brain activity. arXiv preprint arXiv:2305.11675, 2023.
- BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- Brain captioning: Decoding human brain activity into images and text. arXiv preprint arXiv:2305.11560, 2023.
- Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3):369–380, 2022.
- Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior. eLife, 7, 2018.
- Decoding natural image stimuli from fMRI data with a surface-based convolutional network. arXiv preprint arXiv:2212.02409, 2022.
- Umut Güçlü and Marcel AJ van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27):10005–10014, 2015.
- Generic decoding of seen and imagined objects using hierarchical visual features. Nature Communications, 8(1):1–15, 2017.
- A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3):630–644, 2018.
- Recurrence is required to capture the representational dynamics of the human visual system. Proceedings of the National Academy of Sciences, 116(43):21854–21863, 2019.
- Mental image reconstruction from human brain activity. bioRxiv, 2023.
- Cascaded tuning to amplitude modulation for natural sound recognition. Journal of Neuroscience, 39(28):5517–5533, 2019.
- Feature-space selection with banded ridge regression. NeuroImage, page 119728, 2022.
- Human scene-selective areas represent 3D configurations of surfaces. Neuron, 101(1):178–192, 2019.
- BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. arXiv preprint arXiv:2201.12086, 2022.
- Mind reader: Reconstructing complex images from brain activities. arXiv preprint arXiv:2210.01769, 2022.
- BrainCLIP: Bridging brain and visual-linguistic representation via CLIP for generic natural visual stimulus decoding from fMRI. arXiv preprint arXiv:2302.12971, 2023.
- MindDiffuser: Controlled image reconstruction from human brain activity with semantic and structural diffusion. arXiv preprint arXiv:2303.14139, 2023.
- Brain-Diffuser: Natural scene reconstruction from fMRI signals using generative latent diffusion. arXiv preprint arXiv:2303.05334, 2023.
- Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.
- High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- Predicting speech from a cortical hierarchy of event-based time scales. Science Advances, 7(49):eabi6070, 2021.
- The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45):e2105646118, 2021.
- Reconstructing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors. arXiv preprint arXiv:2305.18274, 2023.
- Deep image reconstruction from human brain activity. PLoS Computational Biology, 15, 2019.
- Contrast, attend and diffuse to decode high-resolution images from brain activities. arXiv preprint arXiv:2305.17214, 2023.
- High-resolution image reconstruction with latent diffusion models from human brain activity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14453–14463, 2023.
- Incorporating natural language into vision models improves prediction and understanding of higher visual cortex.
- Neural encoding and decoding with deep learning for dynamic natural vision. Cerebral Cortex, 28(12):4136–4160, 2018.
- Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014.
- Controllable mind visual diffusion model. arXiv preprint arXiv:2305.10135, 2023.
- Yu Takagi
- Shinji Nishimoto