Improving visual image reconstruction from human brain activity using latent diffusion models via multiple decoded inputs (2306.11536v1)

Published 20 Jun 2023 in q-bio.NC, cs.AI, and cs.CV

Abstract: The integration of deep learning and neuroscience has been advancing rapidly, which has led to improvements in the analysis of brain activity and the understanding of deep learning models from a neuroscientific perspective. The reconstruction of visual experience from human brain activity is an area that has particularly benefited: the use of deep learning models trained on large amounts of natural images has greatly improved its quality, and approaches that combine the diverse information contained in visual experiences have proliferated rapidly in recent years. In this technical paper, by taking advantage of the simple and generic framework that we proposed (Takagi and Nishimoto, CVPR 2023), we examine the extent to which various additional decoding techniques affect the performance of visual experience reconstruction. Specifically, we combined our earlier work with the following three techniques: using decoded text from brain activity, nonlinear optimization for structural image reconstruction, and using decoded depth information from brain activity. We confirmed that these techniques contributed to improving accuracy over the baseline. We also discuss what researchers should consider when performing visual reconstruction using deep generative models trained on large datasets. Please check our webpage at https://sites.google.com/view/stablediffusion-with-brain/. Code is also available at https://github.com/yu-takagi/StableDiffusionReconstruction.

References (34)
  1. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence. Nature Neuroscience, 25:116–126, 2022.
  2. Brains and algorithms partially converge in natural language processing. Communications Biology, 5(1):1–10, 2022.
  3. Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 22710–22720, 2023.
  4. Cinematic mindscapes: High-quality video reconstruction from brain activity. arXiv preprint arXiv:2305.11675, 2023.
  5. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
  6. Brain captioning: Decoding human brain activity into images and text. arXiv preprint arXiv:2305.11560, 2023.
  7. Shared computational principles for language processing in humans and deep language models. Nature Neuroscience, 25(3):369–380, 2022.
  8. Distinct contributions of functional and deep neural network features to representational similarity of scenes in human brain and behavior. eLife, 7, 2018.
  9. Decoding natural image stimuli from fMRI data with a surface-based convolutional network. arXiv preprint arXiv:2212.02409, 2022.
  10. Umut Güçlü and Marcel AJ van Gerven. Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience, 35(27):10005–10014, 2015.
  11. Generic decoding of seen and imagined objects using hierarchical visual features. Nature Communications, 8(1):1–15, 2017.
  12. A task-optimized neural network replicates human auditory behavior, predicts brain responses, and reveals a cortical processing hierarchy. Neuron, 98(3):630–644, 2018.
  13. Recurrence is required to capture the representational dynamics of the human visual system. Proceedings of the National Academy of Sciences, 116(43):21854–21863, 2019.
  14. Mental image reconstruction from human brain activity. bioRxiv, 2023.
  15. Cascaded tuning to amplitude modulation for natural sound recognition. Journal of Neuroscience, 39(28):5517–5533, 2019.
  16. Feature-space selection with banded ridge regression. NeuroImage, page 119728, 2022.
  17. Human scene-selective areas represent 3D configurations of surfaces. Neuron, 101(1):178–192, 2019.
  18. BLIP: Bootstrapping language-image pre-training for unified vision-language understanding and generation. arXiv preprint arXiv:2201.12086, 2022.
  19. Mind reader: Reconstructing complex images from brain activities. arXiv preprint arXiv:2210.01769, 2022.
  20. BrainCLIP: Bridging brain and visual-linguistic representation via CLIP for generic natural visual stimulus decoding from fMRI. arXiv preprint arXiv:2302.12971, 2023.
  21. MindDiffuser: Controlled image reconstruction from human brain activity with semantic and structural diffusion. arXiv preprint arXiv:2303.14139, 2023.
  22. Brain-Diffuser: Natural scene reconstruction from fMRI signals using generative latent diffusion. arXiv preprint arXiv:2303.05334, 2023.
  23. Vision transformers for dense prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021.
  24. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
  25. Predicting speech from a cortical hierarchy of event-based time scales. Science Advances, 7(49):eabi6070, 2021.
  26. The neural architecture of language: Integrative modeling converges on predictive processing. Proceedings of the National Academy of Sciences, 118(45):e2105646118, 2021.
  27. Reconstructing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors. arXiv preprint arXiv:2305.18274, 2023.
  28. Deep image reconstruction from human brain activity. PLoS Computational Biology, 15, 2019.
  29. Contrast, attend and diffuse to decode high-resolution images from brain activities. arXiv preprint arXiv:2305.17214, 2023.
  30. High-resolution image reconstruction with latent diffusion models from human brain activity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14453–14463, 2023.
  31. Incorporating natural language into vision models improves prediction and understanding of higher visual cortex.
  32. Neural encoding and decoding with deep learning for dynamic natural vision. Cerebral Cortex, 28(12):4136–4160, 2018.
  33. Performance-optimized hierarchical models predict neural responses in higher visual cortex. Proceedings of the National Academy of Sciences, 111(23):8619–8624, 2014.
  34. Controllable mind visual diffusion model. arXiv preprint arXiv:2305.10135, 2023.
Authors (2)
  1. Yu Takagi (11 papers)
  2. Shinji Nishimoto (9 papers)
Citations (18)

Summary

Improving Visual Image Reconstruction from Human Brain Activity Using Latent Diffusion Models

The paper examines how latent diffusion models can improve visual image reconstruction from human brain activity, with specific gains achieved by integrating multiple decoded inputs. Building on the authors' earlier method (Takagi and Nishimoto, CVPR 2023), the researchers assess how various additional decoding techniques influence reconstruction performance within a simple, generic framework.

Methodology

The paper adds three techniques to the existing framework: decoded text from brain activity, nonlinear optimization for structural image reconstruction using GANs, and decoded depth information. Each aims to improve the fidelity of visual images reconstructed from functional magnetic resonance imaging (fMRI) data.

  • Decoded Captions: Latent representations of image captions are decoded from brain activity and used with the BLIP model to generate captions, preserving the semantic content of the visual experience. This method relies on Vision Transformer (ViT) features for semantic integrity.
  • Nonlinear Optimization with GANs: A GAN-based algorithm reconstructs low-level latent visual representations from brain signals through nonlinear optimization, targeting the bottleneck layer of the Variational Autoencoder used in Stable Diffusion.
  • Depth Information: Depth, which is also represented in the human visual cortex, captures another dimension of the visual experience. Decoded depth maps are supplied to Stable Diffusion 2.0's depth-conditioned model, combining semantic content and depth geometry for more consistent and accurate reconstructions.

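The common step behind all three inputs is decoding some target latent space (caption features, VAE latents, depth maps) from fMRI voxel patterns, typically with regularized linear regression. As a minimal sketch (not the authors' actual code; the data here is synthetic and `fit_ridge` is a hypothetical helper), closed-form ridge regression from voxels to latent features looks like this:

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression mapping fMRI voxels X to latent features Y.

    X: (n_samples, n_voxels) training fMRI responses
    Y: (n_samples, n_features) target latents (e.g. caption/depth/image latents)
    Returns a weight matrix W of shape (n_voxels, n_features).
    """
    n_voxels = X.shape[1]
    # Solve (X'X + alpha*I) W = X'Y; in practice alpha is tuned per feature set.
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_voxels), X.T @ Y)

# Toy usage with synthetic data standing in for real fMRI recordings.
rng = np.random.default_rng(0)
X_train = rng.standard_normal((100, 50))   # 100 trials, 50 voxels
W_true = rng.standard_normal((50, 8))      # ground-truth linear mapping
Y_train = X_train @ W_true                 # 8-dim "latent" targets

W = fit_ridge(X_train, Y_train, alpha=0.1)
Y_pred = X_train @ W                       # decoded latents, fed to the generator
```

At test time, the decoded latents `Y_pred` would be passed to the corresponding generative component (BLIP, the GAN optimizer, or the depth-conditioned diffusion model) rather than used directly.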
Results

The proposed techniques showed improvement over baseline methodologies, although not uniformly across all subjects and metrics. The paper offers a quantitative evaluation with a two-way identification accuracy table, demonstrating specific gains attributable to each method. The enhancement was particularly notable when utilizing decoded captions and GAN-based optimizations.
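Two-way identification accuracy, the metric reported in that table, asks for each reconstruction whether it is more similar to its own ground-truth image than to a distractor, so chance is 50%. A hedged sketch of this evaluation (using feature-vector correlation on synthetic data; the real evaluation compares features extracted from images):

```python
import numpy as np

def two_way_identification(recon_feats, true_feats):
    """Fraction of (target, distractor) pairs in which a reconstruction
    correlates more strongly with its own ground truth than with a distractor.

    recon_feats, true_feats: (n_images, n_features) feature arrays.
    """
    n = len(recon_feats)
    correct, trials = 0, 0
    for i in range(n):
        r_true = np.corrcoef(recon_feats[i], true_feats[i])[0, 1]
        for j in range(n):
            if i == j:
                continue
            r_dist = np.corrcoef(recon_feats[i], true_feats[j])[0, 1]
            correct += r_true > r_dist
            trials += 1
    return correct / trials

# Toy check: reconstructions that are noisy copies of the ground truth
# should score well above the 0.5 chance level.
rng = np.random.default_rng(1)
truth = rng.standard_normal((20, 64))
recon = truth + 0.3 * rng.standard_normal((20, 64))
acc = two_way_identification(recon, truth)
```

Averaging over all distractors, as above, gives a stable estimate without sampling; the paper's per-subject numbers are computed analogously from image features.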

Implications and Future Developments

This research opens avenues for refining brain decoding models and enhancing the interaction between deep learning algorithms and neuroscientific data. It suggests potential applications in fields requiring precise brain-computer interfaces, such as neuroprosthetics and virtual reality systems for patient rehabilitation.

Future developments could focus on further optimization of these techniques across a more diverse array of subjects and exploring the integration of additional sensory information within the generative models. Moreover, given the flexibility of the framework, it could incorporate emerging advancements in both neuroscience and AI model architectures.

Conclusion

The paper advances the domain of visual image reconstruction from brain activity by integrating multi-faceted decoding inputs. Its findings suggest a promising trajectory for the use of deep generative models to reveal and interpret complex visual experiences encoded in neural signals, providing a robust foundation for future exploratory research and practical implementations.
